Re: [Bacula-users] jobs fail with various "broken pipe" errors

2012-02-22 08:22:10
From: Silver Salonen <silver AT serverock DOT ee>
To: Hugo Letemplier <hugo.let.35 AT gmail DOT com>
Date: Wed, 22 Feb 2012 15:20:10 +0200
On Wed, 22 Feb 2012 12:33:49 +0100, Hugo Letemplier wrote:
> I think you can try to configure the Heartbeat Interval directive on
> your various daemons.

Hi.

My SD already had Heartbeat Interval set to 60. I have now set it on one
FD as well, but the job still failed with the same error.
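
For reference, this is roughly where the directive sits in each daemon's
configuration. It is only a sketch: the resource names are taken from the
job log quoted below and are used here for illustration (the thread does not
say which FD was changed), and all other required directives are omitted.

```conf
# bacula-sd.conf (Storage daemon); other required directives omitted
Storage {
  # Name as it appears in the job log; adjust to your setup
  Name = backupsrv-sd
  # Interval in seconds
  Heartbeat Interval = 60
}

# bacula-fd.conf (File daemon on a client); other required directives omitted
FileDaemon {
  # Illustrative client name; use the FD being tested
  Name = fbsd1-fd
  Heartbeat Interval = 60
}
```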

Other FDs, on both FreeBSD and Linux, are able to run jobs for hours and
complete them successfully.

--
Silver

> 2012/2/22 Silver Salonen <silver AT serverock DOT ee>:
>> Hi.
>>
>> Recently we changed the network connection for our backup server, which
>> runs Bacula 5.2.3 on FreeBSD 9.0.
>>
>> After that, many jobs running across the WAN started failing with various
>> "broken pipe" errors. Some examples:
>>
>> 21-Feb 22:42 fbsd1-fd JobId 57779: Error: bsock.c:398 Wrote 32151 bytes to Storage daemon:backupsrv.url:9103, but only 16384 accepted.
>> 21-Feb 22:42 fbsd1-fd JobId 57779: Fatal error: backup.c:1024 Network send error to SD. ERR=Broken pipe
>> 21-Feb 22:42 fbsd1-fd JobId 57779: Error: bsock.c:339 Socket has errors=1 on call to Storage daemon:backupsrv.url:9103
>>
>> (this one runs from the same backup-network to another SD)
>>
>> 22-Feb 00:14 backupsrv-dir JobId 57852: Fatal error: Network error with FD during Backup: ERR=Broken pipe
>> 22-Feb 00:14 backupsrv-sd2 JobId 57852: JobId=57852 Job="linux1-userdata.2012-02-21_23.05.02_02" marked to be canceled.
>> 22-Feb 00:14 backupsrv-sd2 JobId 57852: Job write elapsed time = 00:33:20, Transfer rate = 591.5 K Bytes/second
>> 22-Feb 00:14 backupsrv-sd2 JobId 57852: Error: bsock.c:529 Read expected 65568 got 1448 from client:123.45.67.81:36643
>> 22-Feb 00:14 backupsrv-dir JobId 57852: Fatal error: No Job status returned from FD.
>>
>> 22-Feb 00:16 backupsrv-dir JobId 57821: Fatal error: Network error with FD during Backup: ERR=Broken pipe
>> 22-Feb 00:16 backupsrv-sd JobId 57821: Job write elapsed time = 00:57:00, Transfer rate = 26.69 K Bytes/second
>> 22-Feb 00:16 backupsrv-dir JobId 57821: Fatal error: No Job status returned from FD.
>>
>> 22-Feb 00:24 winsrv1-fd JobId 57784: Error: /home/kern/bacula/k/bacula/src/lib/bsock.c:393 Write error sending 9363 bytes to Storage daemon:backupsrv.url:9103: ERR=Input/output error
>> 22-Feb 00:24 winsrv1-fd JobId 57784: Fatal error: /home/kern/bacula/k/bacula/src/filed/backup.c:1024 Network send error to SD. ERR=Input/output error
>> 22-Feb 00:26 winsrv1-fd JobId 57784: Error: /home/kern/bacula/k/bacula/src/lib/bsock.c:339 Socket has errors=1 on call to Storage daemon:backupsrv.url:9103
>>
>> (this one runs from the same backup-network to another SD)
>>
>> 22-Feb 01:33 backupsrv-dir JobId 57872: Fatal error: Socket error on ClientRunBeforeJob command: ERR=Broken pipe
>> 22-Feb 01:33 backupsrv-dir JobId 57872: Fatal error: Client "winsrv2-fd" RunScript failed.
>> 22-Feb 01:33 backupsrv-dir JobId 57872: Fatal error: Network error with FD during Backup: ERR=Broken pipe
>> 22-Feb 01:33 backupsrv-sd2 JobId 57872: JobId=57872 Job="winsrv2.2012-02-22_01.00.00_27" marked to be canceled.
>> 22-Feb 01:33 backupsrv-dir JobId 57872: Fatal error: No Job status returned from FD.
>>
>> 22-Feb 01:51 fbsd2-fd JobId 57806: Error: bsock.c:398 Wrote 61750 bytes to Storage daemon:backupsrv.url:9103, but only 16384 accepted.
>> 22-Feb 01:51 fbsd2-fd JobId 57806: Fatal error: backup.c:1024 Network send error to SD. ERR=Broken pipe
>> 22-Feb 01:51 fbsd2-fd JobId 57806: Error: bsock.c:339 Socket has errors=1 on call to Storage daemon:backupsrv.url:9103
>>
>> 22-Feb 02:15 backupsrv-dir JobId 57819: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
>> 22-Feb 02:15 backupsrv-dir JobId 57819: Fatal error: No Job status returned from FD.
>>
>> These jobs have been failing every day for a week now. Meanwhile, other
>> jobs complete just fine, and the failures do not seem to depend on job
>> size, on scripts run before the jobs on the clients, etc.
>>
>> Any idea what could be wrong?
>>
>> --
>>
>> Silver
>>
>>
>> 
>> _______________________________________________
>> Bacula-users mailing list
>> Bacula-users AT lists.sourceforge DOT net
>> https://lists.sourceforge.net/lists/listinfo/bacula-users
>>

