Bacula-users

Re: [Bacula-users] Win32 FD / Write error sending N bytes to Storage daemon

2011-06-15 14:20:25
Subject: Re: [Bacula-users] Win32 FD / Write error sending N bytes to Storage daemon
From: Mike Seda <maseda AT stanford DOT edu>
To: bacula-users AT lists.sourceforge DOT net
Date: Wed, 15 Jun 2011 11:17:15 -0700
I just wanted to add that my similar problem was also related to network gear (hardware firewall). I resolved it by following the document below:
http://wiki.bacula.org/doku.php?id=faq#my_backup_starts_but_dies_after_a_while_with_connection_reset_by_peer_error

I did have Heartbeat Interval set several days back, but I did not complete (or know about) step 2 at the aforementioned link. I went ahead and added the change from step 2, and then added Heartbeat Interval back into my configs (FD, SD, DIR). It seems to have worked. :-)
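
For reference, a minimal sketch of where Heartbeat Interval would sit in each daemon's config. The resource names (example-dir, example-fd, example-sd) are hypothetical placeholders, and the value 10 is simply the one mentioned earlier in this thread, not a recommendation. Step 2 from the FAQ link is not reproduced here, so follow the linked page for that part.

```
# bacula-dir.conf (Director); the name is a placeholder
Director {
  Name = example-dir
  # other existing directives stay unchanged
  Heartbeat Interval = 10
}

# bacula-fd.conf (File daemon on the client); the name is a placeholder
FileDaemon {
  Name = example-fd
  # other existing directives stay unchanged
  Heartbeat Interval = 10
}

# bacula-sd.conf (Storage daemon); the name is a placeholder
Storage {
  Name = example-sd
  # other existing directives stay unchanged
  Heartbeat Interval = 10
}
```

Each daemon would need a restart (or the Director a reload) to pick up the change; the right interval depends on the idle timeout of the firewall in the path.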

BTW, the FD that was having issues (connection reset after 2 hours) is in front of the hardware firewall and the Bacula DIR is behind it. Plus, the SD lives in a totally different VLAN (behind *another* firewall), but will be moved to the same VLAN as the DIR in the next week or so.


On 06/15/2011 02:43 AM, Yann Cézard wrote:
On 13/06/2011 14:32, Josh Fisher wrote:
On 6/13/2011 2:15 AM, Mike Seda wrote:
I forgot to mention that during my debugging, I did have "Heartbeat Interval" set to 10 on the Client, Storage, and Director resources. The same error still occurred... Very odd.


I have encountered similar situations with clients. Everything but Bacula would appear to work over the network, but Bacula would fail. In one case it was a bad switch, and 2 or 3 other times it was a bad NIC in the client. My conclusion is that Bacula is very sensitive to network problems, and since it is network heavy during a backup, it tends to reveal network problems when nothing else does. If the client worked in the past and then suddenly began failing jobs, the config is unlikely to be the cause. The procedure I now go through to diagnose client problems is something like:

1) If a win32 client, then disable OS power management (can turn off NIC's PHY inappropriately)
2) Swap connections with an existing, known working client (if possible)
3) Replace Ethernet patch cable
4) Connect client to a different switch (if possible)
5) Replace client's NIC
6) Try different plenum cabling or bypass plenum cabling if possible
7) Physically move client and directly connect to the switch SD is connected to

For me, this error has always thus far ended up being a hardware problem.

I totally second that.
This is exactly what we are observing here, even if the clues were saying something else:
- Bacula is the only application that has the problem
- More precisely, Windows clients are the only ones to have problems.
=> But the real problem is the network!

After some more tests (the day after my last tests, the network team
told me they had rebooted one of the network devices, which made the
problem disappear for a day or two...), I can now say without doubt that
the problem is on the network side of our infrastructure!
Running a DIR/SD in a VM on one side or the other of the problematic device
makes the problem appear or disappear, so it is now obvious that the problem is on
our network path, not in Bacula.

My 2 cents.
-- 
Yann Cézard  -  infrastructures - administrateur systèmes serveurs
Centre de ressources informatiques    -     http://cri.univ-pau.fr
Université de Pau et des pays de l'Adour -  http://www.univ-pau.fr
bâtiment d'Alembert (anciennement IFR), rue Jules Ferry, 64000 Pau
Téléphone : +33 (0)5 59 40 77 94
------------------------------------------------------------------------------ EditLive Enterprise is the world's most technically advanced content authoring tool. Experience the power of Track Changes, Inline Image Editing and ensure content is compliant with Accessibility Checking. http://p.sf.net/sfu/ephox-dev2dev
_______________________________________________ Bacula-users mailing list Bacula-users AT lists.sourceforge DOT net https://lists.sourceforge.net/lists/listinfo/bacula-users