Hello,
I have built a new Bacula server to back up various
Linux servers.
So ...
29 servers in 5 subnets have to be backed up.
On 26 of them everything works fine;
the other 3 have a strange problem.
(Many of the servers have nearly identical hardware and software,
and most of them cause no problems,
so the cause can't be the setup itself, I think ;-) )
The FD hangs on one file
(it is often, but not always, the same file),
and then, after some time, the job gets canceled,
sometimes with 'connection reset by peer',
sometimes with 'broken pipe'.
I have tried playing with the heartbeat directive.
I have tried downgrading the clients
as well as the Director and the SD,
but nothing helps.
I have been digging through the mailing list and found
lots of entries with the same issue,
but no solution.
The FD configurations
are all generated by a shell script,
so the only difference is the Name,
and they work very well on 26 clients.
Here is one example:
FileDaemon {                          # this is me
  Name = "mds-srv1.tec.vcc.de"
  FDport = 9102                       # where we listen for the director
  WorkingDirectory = /etc/bacula/working
  Pid Directory = /var/run
  Maximum Concurrent Jobs = 20
}

Director {
  Name = bacula.tec.vcc.de
  Password = "xxxxxxxx"
}

Director {
  Name = bacula.tec.vcc.de-mon
  Password = "xxxxxxxx"
  Monitor = yes
}

Messages {
  Name = Standard
  director = bacula.tec.vcc.de = all, !skipped, !restored
}
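
For reference, the heartbeat directive mentioned above is presumably Heartbeat Interval. A minimal sketch of where it would go in the FD configuration is shown here; the value of 60 seconds is purely an illustrative assumption and not necessarily a value that was tried in these tests. The Storage resource in the SD configuration accepts the same directive.

```
FileDaemon {
  Name = "mds-srv1.tec.vcc.de"
  FDport = 9102
  WorkingDirectory = /etc/bacula/working
  Pid Directory = /var/run
  Maximum Concurrent Jobs = 20
  Heartbeat Interval = 60             # assumed value, for illustration only
}
```

The idea behind the directive is to send periodic keepalive traffic so that a firewall or router between the subnets does not drop a connection it considers idle; whether that applies to the three failing clients is not established here.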
Here is some debug output from the FD:
mds-srv1.tec.vcc.de: backup.c:147 FT_REG saving: /usr/lib/locale/ko_KR.utf8/LC_COLLATE
mds-srv1.tec.vcc.de: backup.c:225 bfiled: sending /usr/lib/locale/ko_KR.utf8/LC_COLLATE to stored
mds-srv1.tec.vcc.de: backup.c:506 Send data to SD len=65536
mds-srv1.tec.vcc.de: backup.c:506 Send data to SD len=65536
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=0 stop=0
mds-srv1.tec.vcc.de: heartbeat.c:77 Got BNET_SIG 0 from SD
mds-srv1.tec.vcc.de: heartbeat.c:82 wait_intr=1 stop=1
mds-srv1.tec.vcc.de: backup.c:111 end blast_data ok=0
mds-srv1.tec.vcc.de: job.c:1266 Error in blast_data.
mds-srv1.tec.vcc.de: job.c:1334 End FD msg: 2800 End Job TermCode=102 JobFiles=100347 ReadBytes=4068720695 JobBytes=4069737071 Errors=0
mds-srv1.tec.vcc.de: job.c:208 Quit command loop. Canceled=1
mds-srv1.tec.vcc.de: job.c:289 Calling term_find_files
mds-srv1.tec.vcc.de: job.c:292 Done with term_find_files
mds-srv1.tec.vcc.de: mem_pool.c:363 garbage collect memory pool
mds-srv1.tec.vcc.de: job.c:294 Done with free_jcr
Here is one job report email:
16-Feb 18:08 mds-srv1.tec.vcc.de:
Linux-SystemFS_mds-srv1.tec.vcc.de.2009-02-16_17.30.06 Fatal error:
backup.c:500 Network send error to SD. ERR=Die Wartezeit für die
Verbindung ist abgelaufen (connection timeout)
16-Feb 18:12 bacula.tec.vcc.de JobId 116: Job
Linux-SystemFS_mds-srv1.tec.vcc.de.2009-02-16_17.30.06 marked to be
canceled.
16-Feb 18:12 bacula.tec.vcc.de JobId 116: Fatal error: append.c:259
Network error on data channel. ERR=Die Verbindung wurde vom
Kommunikationspartner zurückgesetzt (connection reset by peer)
16-Feb 18:12 bacula.tec.vcc.de JobId 116: Job write elapsed time =
00:41:57, Transfer rate = 1.599 M bytes/second
16-Feb 18:12 bacula.tec.vcc.de JobId 116: Fatal error: append.c:304
Fatal append error on device "FileStorage" (/Backup2Disk): ERR=
16-Feb 18:12 bacula.tec.vcc.de JobId 116: Error: bsock.c:444 Read error
from client:192.168.100.51:36643: ERR=Die Verbindung wurde vom
Kommunikationspartner zurückgesetzt (connection reset by peer)
16-Feb 18:12 bacula.tec.vcc.de JobId 116: Error: Bacula
bacula.tec.vcc.de 2.2.8 (26Jan08): 16-Feb-2009 18:12:24
  Build OS:               i686-pc-linux-gnu redhat
  JobId:                  116
  Job:                    Linux-SystemFS_mds-srv1.tec.vcc.de.2009-02-16_17.30.06
  Backup Level:           Full (upgraded from Incremental)
  Client:                 "mds-srv1.tec.vcc.de" x86_64-unknown-linux-gnu,suse,10
  FileSet:                "Linux-SystemFS" 2009-02-15 11:40:14
  Pool:                   "ServerBackup" (From Job resource)
  Storage:                "File" (From Job resource)
  Scheduled time:         16-Feb-2009 17:30:22
  Start time:             16-Feb-2009 17:30:26
  End time:               16-Feb-2009 18:12:24
  Elapsed time:           41 mins 58 secs
  Priority:               10
  FD Files Written:       100,347
  SD Files Written:       100,334
  FD Bytes Written:       4,069,737,071 (4.069 GB)
  SD Bytes Written:       4,024,995,720 (4.024 GB)
  Rate:                   1616.3 KB/s
  Software Compression:   None
  VSS:                    no
  Storage Encryption:     no
  Volume name(s):         B2D-File-0017
  Volume Session Id:      3
  Volume Session Time:    1234801299
  Last Volume Bytes:      21,671,875,000 (21.67 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Canceled
  Termination:            *** Backup Error ***
Because of the German localization of this server,
I have translated the error messages (in parentheses).
Any hints?
Or which information is needed to debug this better?
--
Wolfgang Jaede
Doormannsweg 43
20259 Hamburg
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users