[Bacula-users] Network errors during backups

Hi all,

I have a setup where the Bacula Director is hosted on AWS while one of the Bacula clients is hosted elsewhere, and I quite often see errors like this (for backup jobs):

2014-03-17 07:25:04 XXX-sd JobId 1179: Recycled volume "XXX_pool_0255" on device "FileStorage5" (/mnt/backups), all previous data lost.

2014-03-17 07:25:04 XXX-dir JobId 1179: Volume used once. Marking Volume "XXX_pool_0255" as Used.

2014-03-17 07:41:06 XXX-sd JobId 1179: Fatal error: append.c:161 Error reading data header from FD. ERR=Connection timed out

2014-03-17 07:41:06 XXX-sd JobId 1179: Job write elapsed time = 00:16:02, Transfer rate = 0 Bytes/second

2014-03-17 08:41:36 XXX-dir JobId 1179: Fatal error: Network error with FD during Backup: ERR=Connection timed out

or like this (for verify jobs):

2014-03-16 13:10:50 XXX-dir JobId 1154: Start Verify JobId=1154 Level=VolumeToCatalog Job=XXX_verify.2014-03-16_07.00.00_18

2014-03-16 13:10:50 XXX-dir JobId 1154: Using Device "FileStorage5"

2014-03-16 13:38:52 XXX-sd JobId 1154: Ready to read from volume "XXX_pool_0248" on device "FileStorage5" (/mnt/backups).

2014-03-16 15:49:18 XXX-sd JobId 1154: End of Volume at file 12 on device "FileStorage5" (/mnt/backups), Volume "hondaextranet.ru_pool_0248"

2014-03-16 16:01:42 XXX-sd JobId 1154: Ready to read from volume "XXX_pool_0252" on device "FileStorage5" (/mnt/backups).

2014-03-16 16:10:36 XXX-dir JobId 1154: Fatal error: verify.c:758 bdird

2014-03-16 16:10:36 XXX-dir JobId 1154: Fatal error: Network error with FD during Verify: ERR=Connection reset by peer

2014-03-16 16:10:36 XXX-dir JobId 1154: Fatal error: No Job status returned from FD.

or like this (also for verify jobs):

2014-03-16 16:27:14 XXX-sd JobId 1155: Ready to read from volume "XXX_pool_0248" on device "FileStorage5" (/mnt/backups).

2014-03-17 03:10:31 XXX-dir JobId 1155: Fatal error: verify.c:758 bdird

2014-03-17 03:10:31 XXX-dir JobId 1155: Fatal error: Network error with FD during Verify: ERR=Connection timed out

The backups for this client are quite big (~70 GB, split into 2 volumes). The transfer rate is about 3-4 MB/s, so a full backup job takes about 6-7 hours to complete.

Sometimes both jobs complete OK, but quite often we hit errors like the ones above, which I think are caused by some kind of network outage. Heartbeat Interval is set to 60 on all of Dir, SD and FD.
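
For reference, the heartbeat setting would look roughly like the sketch below in each daemon's configuration file. This is an illustration, not a copy of the real files: the resource name XXX-fd is a placeholder (only XXX-sd and XXX-dir appear in the logs above), all other directives are omitted, and it assumes the default file names and that a bare number is read as seconds.

```conf
# bacula-fd.conf on the remote client (name is a placeholder)
FileDaemon {
  Name = XXX-fd
  Heartbeat Interval = 60
  # other directives omitted
}

# bacula-sd.conf
Storage {
  Name = XXX-sd
  Heartbeat Interval = 60
  # other directives omitted
}

# bacula-dir.conf
Director {
  Name = XXX-dir
  Heartbeat Interval = 60
  # other directives omitted
}
```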

Is there a way to deal with this kind of problem?

Thanks,

Timur

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users