[Bacula-users] broken pipe & connection reset by peer problems

2009-03-03 09:31:00
Subject: [Bacula-users] broken pipe & connection reset by peer problems
From: Wolfgang Jaede <w_jaede AT vcc DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Mon, 16 Feb 2009 18:20:09 +0100
hello

I have built a new Bacula server to back up various
Linux servers.
29 servers in 5 subnets have to be backed up;
on 26 of them everything works fine.

the other 3 have a strange problem
(there are many servers with nearly exactly the same hardware and software,
and most of them cause no problems,
so the cause cannot be in the setup, I think ;-) )
the fd hangs on one file
(it's not always the same file, but often)
and then, after some time, the job gets canceled

sometimes with 'connection reset by peer'
sometimes with 'broken pipe'

I have tried playing with the heartbeat directive
I have tried downgrading the clients
as well as the dir and the sd
but nothing helps

I've been digging through the mailing list and found
lots of entries about the same issue
but no solution

the fd configurations
are all generated by a shell script
so the only difference is the Name
and they work very well on 26 clients

here is one example:

FileDaemon {                          # this is me
  Name = "mds-srv1.tec.vcc.de"
  FDport = 9102                  # where we listen for the director
  WorkingDirectory = /etc/bacula/working
  Pid Directory = /var/run
  Maximum Concurrent Jobs = 20
}

Director {
  Name = bacula.tec.vcc.de
  Password = "xxxxxxxx"
}

Director {
  Name = bacula.tec.vcc.de-mon
  Password = "xxxxxxxx"
  Monitor = yes
}

Messages {
  Name = Standard
  director = bacula.tec.vcc.de = all, !skipped, !restored
}

here is some debug-output of the fd:
mds-srv1.tec.vcc.de: backup.c:147 FT_REG saving: /usr/lib/locale/ko_KR.utf8/LC_COLLATE
mds-srv1.tec.vcc.de: backup.c:225 bfiled: sending /usr/lib/locale/ko_KR.utf8/LC_COLLATE to stored
mds-srv1.tec.vcc.de: backup.c:506 Send data to SD len=65536
mds-srv1.tec.vcc.de: backup.c:506 Send data to SD len=65536
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
[... 183 further identical "heartbeat.c:82 wait_intr=0 stop=0" lines omitted ...]
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:77 Got BNET_SIG 0 from SD
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=1 stop=1
mds-srv1.tec.vcc.de: backup.c:111 end blast_data ok=0
mds-srv1.tec.vcc.de: job.c:1266 Error in blast_data.
mds-srv1.tec.vcc.de: job.c:1334 End FD msg: 2800 End Job TermCode=102 JobFiles=100347 ReadBytes=4068720695 JobBytes=4069737071 Errors=0

mds-srv1.tec.vcc.de: job.c:208 Quit command loop. Canceled=1
mds-srv1.tec.vcc.de: job.c:289 Calling term_find_files
mds-srv1.tec.vcc.de: job.c:292 Done with term_find_files
mds-srv1.tec.vcc.de: mem_pool.c:363 garbage collect memory pool
mds-srv1.tec.vcc.de: job.c:294 Done with free_jcr
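
Debug traces like this one contain long runs of identical heartbeat lines, which makes the interesting transitions hard to spot. As a reading aid only, here is a minimal Python sketch that collapses consecutive duplicate lines into counts. The helper name `collapse_runs` and the `sample` list are illustrative assumptions; the sample is a shortened stand-in with the hostname prefix dropped, not the full trace.

```python
from itertools import groupby


def collapse_runs(lines):
    """Collapse consecutive identical lines into (line, count) pairs."""
    return [(line, sum(1 for _ in group)) for line, group in groupby(lines)]


# Shortened, illustrative sample in the shape of the trace above.
sample = [
    "backup.c:506 Send data to SD len=65536",
    "backup.c:506 Send data to SD len=65536",
    "heartbeat.c:82 wait_intr=0 stop=0",
    "heartbeat.c:82 wait_intr=0 stop=0",
    "heartbeat.c:82 wait_intr=0 stop=0",
    "heartbeat.c:77 Got BNET_SIG 0 from SD",
]

for line, count in collapse_runs(sample):
    print(f"{count:4d} x {line}")
```

Run against a saved copy of the real trace (one log line per list element), this shows at a glance how many heartbeat cycles passed between the last data block sent and the BNET_SIG from the SD.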


here is one job-email:
16-Feb 18:08 mds-srv1.tec.vcc.de: Linux-SystemFS_mds-srv1.tec.vcc.de.2009-02-16_17.30.06 Fatal error: backup.c:500 Network send error to SD. ERR=Die Wartezeit für die Verbindung ist abgelaufen (connection timeout)
16-Feb 18:12 bacula.tec.vcc.de JobId 116: Job Linux-SystemFS_mds-srv1.tec.vcc.de.2009-02-16_17.30.06 marked to be canceled.
16-Feb 18:12 bacula.tec.vcc.de JobId 116: Fatal error: append.c:259 Network error on data channel. ERR=Die Verbindung wurde vom Kommunikationspartner zurückgesetzt (connection reset by peer)
16-Feb 18:12 bacula.tec.vcc.de JobId 116: Job write elapsed time = 00:41:57, Transfer rate = 1.599 M bytes/second
16-Feb 18:12 bacula.tec.vcc.de JobId 116: Fatal error: append.c:304 Fatal append error on device "FileStorage" (/Backup2Disk): ERR=
16-Feb 18:12 bacula.tec.vcc.de JobId 116: Error: bsock.c:444 Read error from client:192.168.100.51:36643: ERR=Die Verbindung wurde vom Kommunikationspartner zurückgesetzt (connection reset by peer)
16-Feb 18:12 bacula.tec.vcc.de JobId 116: Error: Bacula bacula.tec.vcc.de 2.2.8 (26Jan08): 16-Feb-2009 18:12:24
  Build OS:               i686-pc-linux-gnu redhat
  JobId:                  116
  Job:                    Linux-SystemFS_mds-srv1.tec.vcc.de.2009-02-16_17.30.06
  Backup Level:           Full (upgraded from Incremental)
  Client:                 "mds-srv1.tec.vcc.de" x86_64-unknown-linux-gnu,suse,10
  FileSet:                "Linux-SystemFS" 2009-02-15 11:40:14
  Pool:                   "ServerBackup" (From Job resource)
  Storage:                "File" (From Job resource)
  Scheduled time:         16-Feb-2009 17:30:22
  Start time:             16-Feb-2009 17:30:26
  End time:               16-Feb-2009 18:12:24
  Elapsed time:           41 mins 58 secs
  Priority:               10
  FD Files Written:       100,347
  SD Files Written:       100,334
  FD Bytes Written:       4,069,737,071 (4.069 GB)
  SD Bytes Written:       4,024,995,720 (4.024 GB)
  Rate:                   1616.3 KB/s
  Software Compression:   None
  VSS:                    no
  Storage Encryption:     no
  Volume name(s):         B2D-File-0017
  Volume Session Id:      3
  Volume Session Time:    1234801299
  Last Volume Bytes:      21,671,875,000 (21.67 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Canceled
  Termination:            *** Backup Error ***
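
To make the numbers in the report above easier to compare, here is a small Python sketch of the arithmetic. All figures are copied from the report; the variable names are illustrative, and treating "KB" as 1000 bytes is an assumption that happens to reproduce the reported rate.

```python
# Figures copied from the job report above.
fd_bytes = 4_069_737_071    # FD Bytes Written
sd_bytes = 4_024_995_720    # SD Bytes Written
fd_files = 100_347          # FD Files Written
sd_files = 100_334          # SD Files Written
elapsed_s = 41 * 60 + 58    # Elapsed time: 41 mins 58 secs

# What the FD counted as sent but the SD did not record.
missing_bytes = fd_bytes - sd_bytes
missing_files = fd_files - sd_files

# Assumption: the report's "KB/s" means 1000 bytes per second.
rate_kb_s = fd_bytes / elapsed_s / 1000

print(f"files missing on SD: {missing_files}")
print(f"bytes missing on SD: {missing_bytes}")
print(f"average FD rate:     {rate_kb_s:.1f} KB/s")
```

So about 44.7 MB and 13 files that the FD counted never appear in the SD's totals, and the average rate matches the reported 1616.3 KB/s. This only restates the report; it does not by itself show where the connection was dropped.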

because of the German localization of this server
I translated the error messages


any hints?
or which information is needed to debug this better?

-- 

Wolfgang Jaede
Doormannsweg 43
20259 Hamburg


