Re: [Bacula-users] Network errors during backups
2014-03-17 07:28:25
On 3/17/2014 6:04 AM, Timur Batyrshin wrote:
> Hi all,
>
> I have a setup where the Bacula Director is hosted on AWS while one of
> the Bacula clients is hosted elsewhere, and I quite often see errors
> like this (for backup jobs):
> 2014-03-17 07:25:04 XXX-sd JobId 1179: Recycled volume
> "XXX_pool_0255" on device "FileStorage5" (/mnt/backups), all previous
> data lost.
> 2014-03-17 07:25:04 XXX-dir JobId 1179: Volume used once. Marking
> Volume "XXX_pool_0255" as Used.
> 2014-03-17 07:41:06 XXX-sd JobId 1179: Fatal error: append.c:161
> Error reading data header from FD. ERR=Connection timed out
> 2014-03-17 07:41:06 XXX-sd JobId 1179: Job write elapsed time =
> 00:16:02, Transfer rate = 0 Bytes/second
> 2014-03-17 08:41:36 XXX-dir JobId 1179: Fatal error: Network error
> with FD during Backup: ERR=Connection timed out
>
> or like this (for verify jobs):
> 2014-03-16 13:10:50 XXX-dir JobId 1154: Start Verify JobId=1154
> Level=VolumeToCatalog Job=XXX_verify.2014-03-16_07.00.00_18
> 2014-03-16 13:10:50 XXX-dir JobId 1154: Using Device "FileStorage5"
> 2014-03-16 13:38:52 XXX-sd JobId 1154: Ready to read from volume
> "XXX_pool_0248" on device "FileStorage5" (/mnt/backups).
> 2014-03-16 15:49:18 XXX-sd JobId 1154: End of Volume at file 12 on
> device "FileStorage5" (/mnt/backups), Volume "hondaextranet.ru_pool_0248"
> 2014-03-16 16:01:42 XXX-sd JobId 1154: Ready to read from volume
> "XXX_pool_0252" on device "FileStorage5" (/mnt/backups).
> 2014-03-16 16:10:36 XXX-dir JobId 1154: Fatal error: verify.c:758 bdird
> 2014-03-16 16:10:36 XXX-dir JobId 1154: Fatal error: Network error
> with FD during Verify: ERR=Connection reset by peer
> 2014-03-16 16:10:36 XXX-dir JobId 1154: Fatal error: No Job status
> returned from FD.
>
> or like this (also for verify jobs):
> 2014-03-16 16:27:14 XXX-sd JobId 1155: Ready to read from volume
> "XXX_pool_0248" on device "FileStorage5" (/mnt/backups).
> 2014-03-17 03:10:31 XXX-dir JobId 1155: Fatal error: verify.c:758 bdird
> 2014-03-17 03:10:31 XXX-dir JobId 1155: Fatal error: Network error
> with FD during Verify: ERR=Connection timed out
>
> The backups for this client are quite big (~70 GB, split across
> 2 volumes), the transfer rate is about 3-4 MB/s, and a full backup job
> takes 6-7 hours to complete.
>
> Sometimes both jobs complete OK, but quite often we hit errors like
> the ones above, which I think are caused by some kind of network outage.
> Heartbeat intervals are set to 60 seconds on all of Dir, SD and FD.
Bacula expects the TCP connection from Dir to FD to remain up during the
entire job. Even with the heartbeat, it is possible that some router
between the two is dropping the connection or there is an intermittent
disconnect somewhere along the route.
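
For reference, here is a rough sketch of where the heartbeat directive sits in each daemon's configuration. The names XXX-dir and XXX-sd come from the log lines above; XXX-fd is an assumed name, and every other required directive is left out, so treat this as an illustration rather than a working config:

```
# bacula-dir.conf (Director)
Director {
  Name = XXX-dir
  # value is in seconds
  Heartbeat Interval = 60
  # remaining directives omitted
}

# bacula-sd.conf (Storage daemon)
Storage {
  Name = XXX-sd
  Heartbeat Interval = 60
  # remaining directives omitted
}

# bacula-fd.conf (File daemon on the client; the name is an assumption)
FileDaemon {
  Name = XXX-fd
  Heartbeat Interval = 60
  # remaining directives omitted
}
```

The heartbeat only keeps an otherwise idle connection from being expired by a firewall or NAT device. It does not help when the path itself goes away for a while.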
>
> Is there a way to deal with such kind of problems?
>
Use OpenVPN to create a VPN tunnel to the client. Bacula will only see
the virtual TUN/TAP interfaces created by OpenVPN, and they will stay up
even when the physical interface is going up and down. OpenVPN will
re-establish its own connection over the physical interface as needed,
so a short outage looks to Bacula like a delay rather than a dropped
connection.
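
A minimal client-side sketch of such a tunnel, assuming a routed (TUN) setup over UDP. The server name vpn.example.com and the port are placeholders, not values from this thread, and the certificate/key directives are left out:

```
# OpenVPN client config (hypothetical host and port)
client
dev tun
proto udp
remote vpn.example.com 1194
# keep retrying name resolution instead of giving up
resolv-retry infinite
nobind
# keep the key material and the tun device across tunnel restarts,
# so the virtual interface does not disappear during a reconnect
persist-key
persist-tun
# ping the peer every 10 s, restart the tunnel after 60 s of silence
keepalive 10 60
# ca/cert/key directives omitted
```

The Address entries that Bacula uses for the client and the storage daemon would then need to point at the tunnel addresses rather than the public ones, otherwise the traffic never enters the tunnel.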