Bacula-users

Re: [Bacula-users] 1 second Network glitch kills Windows backups

2016-07-02 05:21:25
Subject: Re: [Bacula-users] 1 second Network glitch kills Windows backups
From: Kern Sibbald <kern AT sibbald DOT com>
To: Craig Shiroma <shiroma.craig.2 AT gmail DOT com>
Date: Sat, 2 Jul 2016 11:20:10 +0200
Hello Craig,

I forgot to mention that quite a long time ago I implemented a feature that compensates for most poor-quality (or defensive) switches that disconnect idle lines without honoring the standard Internet 2-hour keepalive delay.  The feature is a directive called:

   Heartbeat Interval = 300

(300 is a good value -- 5 mins).  However, you must set this in several places:  the Director's Director, Client, and Storage resources, the File Daemon's Client (or FileDaemon) resource, and the Storage daemon's Storage resource.  I.e. 3 places in the Director, 1 in the FD and 1 in the SD.

If I am not mistaken, this may be the default in recent versions, but you need to check your .conf files.
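
As a rough illustration of the five places listed above, the fragment below is a minimal sketch rather than a complete configuration. The file names follow the usual defaults, and the resource names (bacula-dir, winserver-fd, File1, bacula-sd) are hypothetical placeholders; all other required directives are omitted.

```
# bacula-dir.conf (Director): three resources
Director {
  Name = bacula-dir              # hypothetical name
  Heartbeat Interval = 300
  # other required directives omitted
}
Client {
  Name = winserver-fd            # hypothetical name
  Heartbeat Interval = 300
  # other required directives omitted
}
Storage {
  Name = File1                   # hypothetical name
  Heartbeat Interval = 300
  # other required directives omitted
}

# bacula-fd.conf (File Daemon): one resource
FileDaemon {                     # may also be written as Client
  Name = winserver-fd            # hypothetical name
  Heartbeat Interval = 300
  # other required directives omitted
}

# bacula-sd.conf (Storage daemon): one resource
Storage {
  Name = bacula-sd               # hypothetical name
  Heartbeat Interval = 300
  # other required directives omitted
}
```

Each daemon reads its configuration only at startup or reload, so a change like this would presumably need the Director, FD and SD to be restarted or reloaded before it takes effect.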

Best regards,

Kern



On 07/01/2016 10:51 PM, Craig Shiroma wrote:
Thank you Wanderlei, Josh and Kern!  Judging from Kern's and Josh's replies, the solution is to try to find a fix on Windows and possibly the switches.  I guess it's best just to live with the canceled jobs and re-run them.  I'd rather have good backups than incomplete ones since, as Kern indicated, if the connection drops Bacula has no idea what reached and what did not reach the other side.

The reason for the drops appears to be related to our firewall.  For some reason, the secure tunnel goes down around the same time every day for a second or two.

On Fri, Jul 1, 2016 at 10:30 AM, Kern Sibbald <kern AT sibbald DOT com> wrote:
Hello,

In general, the TCP/IP protocol that Bacula uses is extremely tolerant,
and should retry sending packets quite a number of times before finally
giving up.  It is designed to tolerate a significant number of dropped
packets.  However, two things get in the way and screw this
up: 1. network switches which do not follow Internet rules at all (i.e.
they are very fast to drop idle connections, even if you have explicitly
set the network to survive idle periods as Bacula does); 2. Windows,
which does not seem to follow quite a few Internet rules.  The two put
together mean that, especially on Windows machines, network
disruptions are annoyingly frequent.

Bacula was designed with the concept that the Internet never loses
packets and that it is highly tolerant -- given the above two problems,
maybe this was a bad choice.  However, the result of that decision is
that if the line drops, Bacula has no idea what reached and what did not
reach the other side, and so it is not currently possible for it to
reconnect and resume where it left off.

Best regards,
Kern

On 07/01/2016 03:43 PM, Josh Fisher wrote:
> On 7/1/2016 1:26 AM, Craig Shiroma wrote:
>> Hello All,
>>
>> Is there a way in Bacula to prevent something like a 1 or 2 second
>> network glitch from cancelling Windows Server backups? RHEL backups
>> seem to survive these episodes with no problems.
>>
> Bacula expects DIR-to-FD and FD-to-SD TCP connections to persist for the
> duration of a job. If either is dropped it will cause the job to cancel.
> As a result, Bacula is not very tolerant of network problems. Since it
> appears to be the Windows Server machines dropping the connection due to
> the glitch, if there is a solution it will be in the Windows networking
> config, possibly in the NIC's driver settings. Perhaps RHEL is more
> tolerant of the glitch or perhaps it is a hardware difference in the
> NICs. Nevertheless, in general such network glitches cause problems for
> Bacula regardless of OS.
>
>
>
>> Respectfully,
>> Craig
>>
>
> ------------------------------------------------------------------------------
> Attend Shape: An AT&T Tech Expo July 15-16. Meet us at AT&T Park in San
> Francisco, CA to explore cutting-edge tech and listen to tech luminaries
> present their vision of the future. This family event has something for
> everyone, including kids. Get more information and register today.
> http://sdm.link/attshape
> _______________________________________________
> Bacula-users mailing list
> Bacula-users AT lists.sourceforge DOT net
> https://lists.sourceforge.net/lists/listinfo/bacula-users
>

