Bacula-users

Re: [Bacula-users] Win32 FD / Write error sending N bytes to Storage daemon

2011-06-15 05:46:57
Subject: Re: [Bacula-users] Win32 FD / Write error sending N bytes to Storage daemon
From: Yann Cézard <yann.cezard AT univ-pau DOT fr>
Date: Wed, 15 Jun 2011 11:43:46 +0200
On 13/06/2011 14:32, Josh Fisher wrote:
On 6/13/2011 2:15 AM, Mike Seda wrote:
I forgot to mention that during my debugging, I did have "Heartbeat Interval" set to 10 on the Client, Storage, and Director resources. The same error still occurred... Very odd.


I have encountered similar situations with clients. Everything but Bacula would appear to work over the network, but Bacula would fail. In one case it was a bad switch, and 2 or 3 other times it was a bad NIC in the client. My conclusion is that Bacula is very sensitive to network problems, and since it is network heavy during a backup, it tends to reveal network problems when nothing else does. If the client worked in the past and then suddenly began failing jobs, the problem is not likely the config. The procedure I now go through to diagnose client problems is something like:

1) If a win32 client, then disable OS power management (can turn off NIC's PHY inappropriately)
2) Swap connections with an existing, known working client (if possible)
3) Replace Ethernet patch cable
4) Connect client to a different switch (if possible)
5) Replace client's NIC
6) Try different plenum cabling or bypass plenum cabling if possible
7) Physically move client and directly connect to the switch SD is connected to

For me, this error has always thus far ended up being a hardware problem.

I totally second that.
This is exactly what we are observing here, even if the clues were saying something else:
- Bacula is the only application that has the problem
- More precisely, Windows clients are the only ones to have problems.
=> But the real problem is the network!

After some more tests (the day after my last tests, the network team
told me they had rebooted one of the network devices, which made the
problem disappear for a day or two), I can now say without doubt that
the problem is on the network side of our infrastructure.
Running a DIR/SD in a VM on one side or the other of the problematic device
makes the problem appear or disappear, so it is now obvious that the problem
is on our network path, not in Bacula.
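
For anyone who wants to exercise a suspect network path without involving
Bacula at all, a sustained TCP stream between two hosts is one rough way to
do it. The sketch below is only an illustration and is not part of Bacula:
the function names start_sink and send_stream, the 64 KiB chunk size and the
30 second timeout are all my own assumptions. It pushes a fixed number of
bytes over one TCP connection and reports how far it got before any socket
error, which is roughly the kind of load a backup job puts on the link.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send; arbitrary choice for this sketch


def start_sink(host="127.0.0.1", port=0):
    """Start a TCP sink that reads and discards data from one connection.

    Returns (port, result, thread); result["received"] holds the number of
    bytes received once the thread has finished.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    bound_port = srv.getsockname()[1]
    result = {"received": 0}

    def serve():
        conn, _ = srv.accept()
        with conn:
            while True:
                data = conn.recv(CHUNK)
                if not data:  # peer closed the connection
                    break
                result["received"] += len(data)
        srv.close()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return bound_port, result, thread


def send_stream(host, port, total_bytes, timeout=30.0):
    """Send total_bytes of zeros to host:port.

    Returns (bytes_sent, elapsed_seconds, error), where error is None on
    success or the OSError that interrupted the transfer.
    """
    payload = bytes(CHUNK)
    sent = 0
    error = None
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            while sent < total_bytes:
                n = min(CHUNK, total_bytes - sent)
                sock.sendall(payload[:n])
                sent += n
    except OSError as exc:  # includes timeouts and connection resets
        error = exc
    return sent, time.monotonic() - started, error


if __name__ == "__main__":
    # Loopback demonstration only; on a real network the sink would run on
    # one host and send_stream on another.
    port, result, thread = start_sink()
    sent, secs, err = send_stream("127.0.0.1", port, 8 * 1024 * 1024)
    thread.join(timeout=10)
    print(f"sent={sent} received={result['received']} secs={secs:.2f} error={err}")
```

In practice one would run the sink on a host next to the SD and the sender on
the failing client, with a much larger byte count, then repeat from a host on
the other side of the suspect device and compare the results. A clean run
does not prove the path is healthy, but an error in the middle of a transfer
points away from Bacula.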

My 2 cents.
-- 
Yann Cézard  -  infrastructures - administrateur systèmes serveurs
Centre de ressources informatiques    -     http://cri.univ-pau.fr
Université de Pau et des pays de l'Adour -  http://www.univ-pau.fr
bâtiment d'Alembert (anciennement IFR), rue Jules Ferry, 64000 Pau
Téléphone : +33 (0)5 59 40 77 94