Bacula-users

[Bacula-users] SLES9 x64 ONLY intermittent bsock error and fails job. 50% of the time the jobs work all other clients work fine

2011-01-04 07:11:10
Subject: [Bacula-users] SLES9 x64 ONLY intermittent bsock error and fails job. 50% of the time the jobs work all other clients work fine
From: Paul Hanson <paul.hanson AT espida.co DOT uk>
To: "bacula-users AT lists.sourceforge DOT net" <bacula-users AT lists.sourceforge DOT net>
Date: Tue, 4 Jan 2011 11:33:45 +0000

Hi,

I have an installation that was previously running version 2.4.4 and was upgraded to 5.0.3 successfully. However, under 2.4.4 the SLES9 x64 clients already had a problem: their jobs intermittently failed with bsock errors. The error has carried forward from 2.4.4 to 5.0.3, so upgrading to the latest version has not fixed the fault. The errors would occur after as little as a few MB or as much as a couple of hundred MB. The jobs could be full or incremental, using either the tape or the disk pool. More often the backup fails after only 40-50 MB: the job starts and breaks quite quickly, but the failure is not reported, and the job is not cancelled, until the network timeout expires.

The clients are all at version 5.0.3, as are the SD and DIR services. We have approximately 80 clients: 90% are SLES10 (x64, plus a couple of 32-bit installs) and work without error, 8% are Windows 2003 and also work without error, but the two SLES9 x64 builds have BOTH exhibited this intermittent network timeout issue. If I run a manual backup (instead of a scheduled one) outside the maintenance window, the same client backs up correctly without error (to both tape and the disk pool).

It has ONLY been the two SLES9 x64 platforms that have failed this way; all other clients do NOT error at all. I have checked the default TCP timeout values, which are all 7200 seconds, but my feeling is that this is NOT the fault. This may be a threading or concurrency issue specific to Bacula (either 2.4.4 or 5.0.3) on SLES9 x64, or it could even be an MTU issue, but that would not explain why it works 50% of the time. The majority of the machines are virtual and share the same physical hosts, so networking shouldn't be an issue per se.
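
For context on the 7200-second figure: on Linux this is normally `net.ipv4.tcp_keepalive_time`, the idle time before TCP keepalive probes start, and it only applies to sockets that have `SO_KEEPALIVE` enabled. The sketch below is not Bacula code. It assumes the stock Linux values for the probe interval (75 s) and probe count (9), which were not checked in this report, and uses hypothetical per-socket values (60 s idle, 10 s interval, 6 probes) to show how an application can detect a silently dead idle peer sooner without changing the system-wide setting.

```python
import socket

# Assumed stock Linux values for net.ipv4.tcp_keepalive_time / _intvl / _probes.
# Only the 7200 s idle time was actually checked on the hosts in question.
LINUX_DEFAULT_IDLE = 7200
LINUX_DEFAULT_INTERVAL = 75
LINUX_DEFAULT_PROBES = 9


def keepalive_detection_seconds(idle: int, interval: int, probes: int) -> int:
    """Worst-case seconds before the kernel declares an *idle* peer dead:
    wait `idle` seconds, then send `probes` unanswered probes `interval` apart."""
    return idle + interval * probes


def read_sysctl(name: str):
    """Return the integer in /proc/sys/net/ipv4/<name>, or None if unavailable."""
    try:
        with open(f"/proc/sys/net/ipv4/{name}") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def enable_keepalive(sock: socket.socket, idle: int = 60, interval: int = 10, probes: int = 6) -> None:
    """Enable SO_KEEPALIVE and, where the platform exposes the options,
    override the system-wide timers for this one socket (hypothetical values)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for opt_name, value in (("TCP_KEEPIDLE", idle),
                            ("TCP_KEEPINTVL", interval),
                            ("TCP_KEEPCNT", probes)):
        if hasattr(socket, opt_name):  # TCP_KEEPIDLE etc. are Linux-specific
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, opt_name), value)


if __name__ == "__main__":
    # Fall back to the assumed defaults where /proc is not available.
    idle = read_sysctl("tcp_keepalive_time")
    interval = read_sysctl("tcp_keepalive_intvl")
    probes = read_sysctl("tcp_keepalive_probes")
    idle = LINUX_DEFAULT_IDLE if idle is None else idle
    interval = LINUX_DEFAULT_INTERVAL if interval is None else interval
    probes = LINUX_DEFAULT_PROBES if probes is None else probes
    print("system-wide detection time:",
          keepalive_detection_seconds(idle, interval, probes), "s")
    print("per-socket detection time:", keepalive_detection_seconds(60, 10, 6), "s")
```

With the assumed defaults, `keepalive_detection_seconds(7200, 75, 9)` is 7875 s, about 2 h 11 min. Keepalive only probes idle connections; a connection stuck with unacknowledged data in flight is governed by the retransmission limit (`net.ipv4.tcp_retries2`) instead.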

I would appreciate any opinions on, or experience with, this type of error, as it is proving difficult to reproduce manually.

Errors...

Error: bsock.c:393 Write error sending 65536 bytes to Storage daemon:mybaculaserver.mylocaldomain:9103: ERR=Broken pipe

Fatal error: backup.c:1024 Network send error to SD. ERR=Broken pipe
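
`ERR=Broken pipe` is the message text for `EPIPE`: the FD's write failed because the connection to the SD on port 9103 had typically already been closed or reset from the far side (or by something in between). The sketch below is only an illustration, not Bacula's `bsock.c`; it reproduces that errno with a local socket pair standing in for the FD and SD ends. It shows what the error means, not why the SD end went away.

```python
import errno
import socket


def send_after_peer_closed(payload: bytes = b"x" * 65536) -> int:
    """Return the errno raised when writing to a stream socket whose peer has closed.

    A connected AF_UNIX pair is used so the result is deterministic; over TCP the
    first write after the peer closes may still succeed and only a later one fails.
    """
    fd_side, sd_side = socket.socketpair()  # stand-ins for the FD and SD ends
    sd_side.close()                          # the "SD" end disappears mid-job
    try:
        fd_side.sendall(payload)             # same size as the failed 65536-byte write
    except BrokenPipeError as exc:           # Python ignores SIGPIPE, so EPIPE surfaces here
        return exc.errno
    finally:
        fd_side.close()
    return 0                                 # write unexpectedly succeeded


if __name__ == "__main__":
    code = send_after_peer_closed()
    print(code, errno.errorcode.get(code))
```

A C program doing the same write would also receive `SIGPIPE` unless it ignores that signal or passes `MSG_NOSIGNAL`.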

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users