[Bacula-users] Issue with Network error on channel and speed...

2008-05-17 21:45:25
Subject: [Bacula-users] Issue with Network error on channel and speed...
From: Javier Gomez <gomez AT dynamicquest DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Sat, 17 May 2008 21:19:26 -0400
    We have a Bacula 2.2.8 server environment running on a Fedora 8 
server.  It has 2 core processors and 2 gigs of memory.  We are currently 
backing up about 200 servers ranging in size from 10 gigs up to about 
600 gigs of used space.  We do not use tapes at our facility.  All 
backups are performed to File devices.  In general Bacula has proven to 
be much more stable than any other solution we have used (thank you).  
We allow about 35 to 45 backups to run concurrently each night.  We have 
an offsite backup location running the Bacula environment with a 150 Meg 
point to point connection to our main facility where all of the 
production servers are located.  We have tested our lines and we do not 
seem to be maxing out the 150 meg fiber connection.  The connection 
seems fairly stable (losing a single ping packet every once in a while, 
otherwise it's within an average of 5 ms from point to point).  We use a 
Cisco ASA 5520 and a few Cisco switches between the two points for 
communication (all new equipment).  My issue is that we have what seem 
like very slow backups (averaging from about 200 K bytes/second up to a 
maximum of around 2.5 M bytes/second), with most of the servers sitting 
at around 500 K bytes/second.  I get this same speed whether I am 
running 40 backups concurrently or just one, so the speed does not seem 
to depend on the volume of traffic across the WAN connection.
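
For reference, the figures above can be sanity-checked against the link 
size with a few lines of Python.  This is only a rough sketch: it assumes 
the "150 Meg" link means 150 megabits/second and that "K bytes" means 
1000 bytes, neither of which is confirmed above, and the function names 
are made up for illustration.

```python
# Rough link-utilisation check for the figures quoted above.
# Assumptions (not confirmed in the post): the "150 Meg" link is
# 150 megabits/second, and "K bytes" means 1000 bytes.
LINK_MBIT_PER_S = 150
TYPICAL_JOB_KBYTES_PER_S = 500


def aggregate_mbit_per_s(jobs, kbytes_per_s_per_job):
    """Total traffic in megabits/second if every job sustained the given rate."""
    bits_per_s = jobs * kbytes_per_s_per_job * 1000 * 8
    return bits_per_s / 1_000_000


def link_share(jobs, kbytes_per_s_per_job, link_mbit_per_s=LINK_MBIT_PER_S):
    """Fraction of the link the traffic would occupy (1.0 = fully used)."""
    return aggregate_mbit_per_s(jobs, kbytes_per_s_per_job) / link_mbit_per_s


if __name__ == "__main__":
    # 1 job, then the low and high end of the nightly concurrency range.
    for jobs in (1, 35, 45):
        mbit = aggregate_mbit_per_s(jobs, TYPICAL_JOB_KBYTES_PER_S)
        share = link_share(jobs, TYPICAL_JOB_KBYTES_PER_S)
        print(f"{jobs:2d} job(s): {mbit:6.1f} Mbit/s, {share:.0%} of the link")
```

Under those assumptions a single job at 500 K bytes/second is about 
4 Mbit/second, a small fraction of the link.  The concurrent rows are 
only a rough estimate, since the per-job rates are averages and not 
every job is transferring at the same moment.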

    Then to make matters worse we get a number of the network errors 
shown below each night.  We have network monitoring 
software watching the data lines and the Cisco equipment on both ends 
and we don't see any network issues (none that are obvious).  We have 
had many situations where a number of backups fail with this same 
error within the same 3 seconds, which would make me think there was a 
network connection issue to the backup server.  But at the same time 
that those backups failed, another 15 were still actively running and 
completed just fine.  That made me think it was something with the 
Bacula SD locking it from time to time, but I have not seen any 
references to any issues.  The failed backup will work if we rerun the 
backup, so it's not a basic configuration issue.  I have set the 
Heartbeat Interval in the SD and the FD configurations to 300 (that 
helped to deal with the 2 hour timeout issues with most routers), but 
nothing seems to clean up the nightly errors we get like the one below.

------------------------
17-May 14:56 bacula001 JobId 8883: Fatal error: Network error with FD 
during Backup: ERR=Connection reset by peer
17-May 14:56 bacula001 JobId 8883: Job ServerA 7.2008-05-16_21.05.34 
marked to be canceled.
17-May 14:56 bacula001 JobId 8883: Fatal error: append.c:259 Network 
error on data channel. ERR=Connection reset by peer
17-May 14:56 bacula001 JobId 8883: Job write elapsed time = 17:45:38, 
Transfer rate = 374.9 K bytes/second
17-May 14:56 bacula001 JobId 8883: Fatal error: No Job status returned 
from FD.
------------------------
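
To put the log above in perspective, the elapsed time and transfer rate 
it reports can be turned into an approximate amount of data written 
before the reset.  A minimal sketch, assuming "K bytes" means 1000 bytes 
(not confirmed by the log); the helper names are hypothetical.

```python
# Estimate how much data JobId 8883 had written when the connection reset,
# using only the two figures printed in the log above.
# Assumption: "K bytes" is 1000 bytes (pass k=1024 to try the other reading).

def elapsed_seconds(hms):
    """Convert an 'H:MM:SS' elapsed-time string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def bytes_written(hms, kbytes_per_s, k=1000):
    """Approximate bytes transferred at the given average rate."""
    return elapsed_seconds(hms) * kbytes_per_s * k


if __name__ == "__main__":
    total = bytes_written("17:45:38", 374.9)
    print(f"elapsed: {elapsed_seconds('17:45:38')} s")
    print(f"written: about {total / 1e9:.2f} GB")
```

So this job had been running for nearly 18 hours and had moved roughly 
24 GB when it was cut off, which is why a reset this late in the job is 
so costly to rerun.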

    Does anyone have any ideas on what I can do to help prevent these 
types of network errors as well as improve speed?  Or are there any more 
debugging settings that I can enable to help me track 
these issues down?
             Thanks for any type of help that can be given...
                      Javier

