[Bacula-users] Issue with Network error on channel and speed...
2008-05-17 21:45:25
We have a Bacula 2.2.8 server environment running on a Fedora 8
server. It has 2 core processors and 2 GB of memory. We are currently
backing up about 200 servers ranging in size from 10 GB up to about
600 GB of used space. We do not use tapes at our facility. All
backups are performed to File devices. In general Bacula has proven to
be much more stable than any other solution we have used (thank you).
We allow about 35 to 45 backups to run concurrently each night. We have
an offsite backup location running the Bacula environment with a 150 Meg
point to point connection to our main facility where all of the
production servers are located. We have tested our lines and we do not
seem to be maxing out the 150 meg fiber connection. The connection
seems fairly stable (losing a single ping packet every once in a while;
otherwise it's within an average of 5 ms from point to point). We use a
Cisco ASA 5520 and a few Cisco switches between the two points for
communication (all new equipment). My issue is that we have what seems
like very slow backups (ranging from 200 K bytes/second to a maximum of
around 2.5 M bytes/second), with most of the servers sitting at
around 500 K bytes/second. I seem to get this same speed if I am
running 40 backups concurrently or just one, so the speed does not seem
to be based on the volume across the WAN connection.
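As a rough sanity check on the figures above, the per-job rate can be compared against the link capacity. This is only a sketch under two assumptions that the post does not confirm: that "150 Meg" means 150 Mbit/s, and that "K bytes" means 1000 bytes. The function name and constants are illustrative.

```python
def link_utilisation(jobs: int, per_job_bytes_per_sec: float,
                     link_bits_per_sec: float) -> float:
    """Fraction of the link consumed by `jobs` streams running at the same rate."""
    return jobs * per_job_bytes_per_sec * 8 / link_bits_per_sec


LINK_BPS = 150_000_000  # assumption: "150 Meg" means 150 Mbit/s
PER_JOB = 500_000       # assumption: 500 K bytes/second with K = 1000

if __name__ == "__main__":
    for jobs in (1, 40):
        share = link_utilisation(jobs, PER_JOB, LINK_BPS)
        print(f"{jobs:2d} job(s): {share:.1%} of the link")
```

Under those assumptions a single job at 500 K bytes/second uses under 3% of the link, while 40 jobs at that rate would add up to roughly the whole link, so it may be worth measuring the aggregate rate during the nightly window separately from the single-job rate.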
Then, to make matters worse, we get a number of the network errors
shown below each night. We have network monitoring
software watching the data lines and the Cisco equipment on both ends
and we don't see any network issues (none that are obvious). We have
had many situations where a number of backups fail with this same
error within the same 3 seconds, which would make me think there was a
network connection issue to the backup server, but at the same time
that those backups failed, another 15 were still actively running and
completed just fine. That made me think it was something with the
Bacula SD locking up from time to time, but I have not seen any
references to any such issue. The failed backup will work if we rerun
it, so it's not a basic configuration issue. I have set up the
Heartbeat in the SD and the FD configurations to 300 (that helped to
deal with the 2-hour timeout issues with most routers), but nothing
seems to clean up the nightly errors we get like the one below.
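One way to check whether the failures really cluster in time is to group the "Fatal error" lines of the job log by timestamp. The sketch below assumes each log entry is on a single line shaped like the excerpt in this post (`DD-Mon HH:MM host JobId N: Fatal error: ...`); the function names are made up for illustration. Because that timestamp format only has minute resolution, it can show that several jobs failed in the same minute, not within the same 3 seconds.

```python
import re
from collections import defaultdict

# Matches lines shaped like the excerpt in this post, e.g.
# "17-May 14:56 bacula001 JobId 8883: Fatal error: Network error ..."
FATAL = re.compile(r"^(\d{2}-\w{3} \d{2}:\d{2}) \S+ JobId (\d+): Fatal error: (.*)$")


def failures_by_minute(lines):
    """Map each timestamp to the set of JobIds that logged a fatal error then."""
    grouped = defaultdict(set)
    for line in lines:
        match = FATAL.match(line.strip())
        if match:
            stamp, job_id, _message = match.groups()
            grouped[stamp].add(int(job_id))
    return dict(grouped)


def clustered(grouped, min_jobs=2):
    """Keep only the timestamps at which at least `min_jobs` distinct jobs failed."""
    return {stamp: sorted(jobs)
            for stamp, jobs in grouped.items()
            if len(jobs) >= min_jobs}
```

Passing an open log file to `failures_by_minute` and the result to `clustered` lists the minutes in which more than one job died, which can then be lined up against the network monitoring data for the same minutes.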
------------------------
17-May 14:56 bacula001 JobId 8883: Fatal error: Network error with FD
during Backup: ERR=Connection reset by peer
17-May 14:56 bacula001 JobId 8883: Job ServerA 7.2008-05-16_21.05.34
marked to be canceled.
17-May 14:56 bacula001 JobId 8883: Fatal error: append.c:259 Network
error on data channel. ERR=Connection reset by peer
17-May 14:56 bacula001 JobId 8883: Job write elapsed time = 17:45:38,
Transfer rate = 374.9 K bytes/second
17-May 14:56 bacula001 JobId 8883: Fatal error: No Job status returned
from FD.
------------------------
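To put the failed job above in perspective, the elapsed time and transfer rate from the log can be turned into an approximate amount of data written before the connection was reset. This is a sketch only: it handles just the "K bytes/second" form shown in the excerpt, and it assumes K = 1000, which the post does not state. The function name is illustrative.

```python
import re

# Extracts "elapsed time = H:MM:SS, Transfer rate = N K bytes/second".
RATE_LINE = re.compile(
    r"elapsed time = (\d+):(\d{2}):(\d{2}), Transfer rate = ([\d.]+) K bytes/second"
)


def bytes_written(line: str, k: int = 1000) -> float:
    """Approximate bytes written, from the elapsed time and average rate in `line`."""
    match = RATE_LINE.search(line)
    if match is None:
        raise ValueError("no elapsed time / transfer rate found")
    hours, minutes, seconds = (int(part) for part in match.groups()[:3])
    elapsed_seconds = hours * 3600 + minutes * 60 + seconds
    return elapsed_seconds * float(match.group(4)) * k  # k=1000 is an assumption
```

For JobId 8883 (17:45:38 at 374.9 K bytes/second) this comes to roughly 24 GB, all of which was lost when the job failed after nearly 18 hours.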
Does anyone have any ideas on what I can do to help prevent these
types of network errors as well as improve speed? Or are there any more
debugging settings that I can enable which would help me track
these issues down?
Thanks for any type of help that can be given...
Javier