Hi.
Recently we changed the network connection for our backup server, which runs Bacula 5.2.3 on FreeBSD 9.0.
Since then, many jobs running across the WAN have been failing with various "broken pipe" errors. Some examples:
21-Feb 22:42 fbsd1-fd JobId 57779: Error: bsock.c:398 Wrote 32151 bytes to Storage daemon:backupsrv.url:9103, but only 16384 accepted.
21-Feb 22:42 fbsd1-fd JobId 57779: Fatal error: backup.c:1024 Network send error to SD. ERR=Broken pipe
21-Feb 22:42 fbsd1-fd JobId 57779: Error: bsock.c:339 Socket has errors=1 on call to Storage daemon:backupsrv.url:9103
(this one runs from the same backup-network to another SD)
22-Feb 00:14 backupsrv-dir JobId 57852: Fatal error: Network error with FD during Backup: ERR=Broken pipe
22-Feb 00:14 backupsrv-sd2 JobId 57852: JobId=57852 Job="linux1-userdata.2012-02-21_23.05.02_02" marked to be canceled.
22-Feb 00:14 backupsrv-sd2 JobId 57852: Job write elapsed time = 00:33:20, Transfer rate = 591.5 K Bytes/second
22-Feb 00:14 backupsrv-sd2 JobId 57852: Error: bsock.c:529 Read expected 65568 got 1448 from client:123.45.67.81:36643
22-Feb 00:14 backupsrv-dir JobId 57852: Fatal error: No Job status returned from FD.
22-Feb 00:16 backupsrv-dir JobId 57821: Fatal error: Network error with FD during Backup: ERR=Broken pipe
22-Feb 00:16 backupsrv-sd JobId 57821: Job write elapsed time = 00:57:00, Transfer rate = 26.69 K Bytes/second
22-Feb 00:16 backupsrv-dir JobId 57821: Fatal error: No Job status returned from FD.
22-Feb 00:24 winsrv1-fd JobId 57784: Error: /home/kern/bacula/k/bacula/src/lib/bsock.c:393 Write error sending 9363 bytes to Storage daemon:backupsrv.url:9103: ERR=Input/output error
22-Feb 00:24 winsrv1-fd JobId 57784: Fatal error: /home/kern/bacula/k/bacula/src/filed/backup.c:1024 Network send error to SD. ERR=Input/output error
22-Feb 00:26 winsrv1-fd JobId 57784: Error: /home/kern/bacula/k/bacula/src/lib/bsock.c:339 Socket has errors=1 on call to Storage daemon:backupsrv.url:9103
(this one runs from the same backup-network to another SD)
22-Feb 01:33 backupsrv-dir JobId 57872: Fatal error: Socket error on ClientRunBeforeJob command: ERR=Broken pipe
22-Feb 01:33 backupsrv-dir JobId 57872: Fatal error: Client "winsrv2-fd" RunScript failed.
22-Feb 01:33 backupsrv-dir JobId 57872: Fatal error: Network error with FD during Backup: ERR=Broken pipe
22-Feb 01:33 backupsrv-sd2 JobId 57872: JobId=57872 Job="winsrv2.2012-02-22_01.00.00_27" marked to be canceled.
22-Feb 01:33 backupsrv-dir JobId 57872: Fatal error: No Job status returned from FD.
22-Feb 01:51 fbsd2-fd JobId 57806: Error: bsock.c:398 Wrote 61750 bytes to Storage daemon:backupsrv.url:9103, but only 16384 accepted.
22-Feb 01:51 fbsd2-fd JobId 57806: Fatal error: backup.c:1024 Network send error to SD. ERR=Broken pipe
22-Feb 01:51 fbsd2-fd JobId 57806: Error: bsock.c:339 Socket has errors=1 on call to Storage daemon:backupsrv.url:9103
22-Feb 02:15 backupsrv-dir JobId 57819: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
22-Feb 02:15 backupsrv-dir JobId 57819: Fatal error: No Job status returned from FD.
These jobs have been failing every day for a week now. Meanwhile, other jobs complete just fine, and the failures do not seem to be related to job size, to scripts run on the clients before the jobs, etc.
Any idea what could be wrong?
--
Silver