[Bacula-users] 5.0.2 on Debian Lenny: DIR, FD, SD all on the same host, TCP keepalive enabled, nevertheless "Connection reset by peer" error
2010-05-11 11:34:21
Hi,
I'm running Bacula 5.0.2 in conjunction with PostgreSQL 8.4.3 (both
compiled from the "official" sources) on a Debian Lenny system. The
system runs the director, storage daemon and file daemon. The actual
storage media are barcode-labeled LTO3 tapes contained in an HP
StorageWorks 1/8 G2 autoloader equipped with an HP Ultrium 920 drive.
I've already followed the instructions in
http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/
and the corresponding FAQ entry
http://wiki.bacula.org/doku.php?id=faq#my_backup_starts_but_dies_after_a_while_with_connection_reset_by_peer_error
since I came across this thread, which was started before I subscribed
to the Bacula users mailing list:
http://adsm.org/lists/html/Bacula-users/2010-04/msg00172.html
I decided to compile libkeepalive from SourceForge and use the
LD_PRELOAD mechanism in order to make sure libkeepalive.so is always
loaded. Furthermore, I changed the sysctl and Heartbeat Interval
settings as described in the referenced FAQ entry.
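
For readers unfamiliar with the mechanism: as far as I understand it, a
preloaded keepalive library does little more than switch on SO_KEEPALIVE
(and optionally the per-socket timers) for every TCP socket a program
opens. A rough Python sketch of the equivalent calls follows. The helper
name enable_keepalive and the timer values (60/10/5) are only
illustrative assumptions; they are taken neither from the FAQ nor from
my configuration.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive for one socket (illustrative values only)."""
    # Ask the kernel to send keepalive probes on this connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The per-socket timers are platform specific (present on Linux),
    # so each option is set only if this Python build exposes it.
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # failed probes before giving up
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    enabled = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
    print("keepalive enabled:", enabled)
    s.close()
```

Sockets that do not set the per-socket timers fall back to the
system-wide net.ipv4.tcp_keepalive_* sysctl values, which is why the
sysctl settings matter as well.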
Disabling accurate backups is not an option for me, since I want
deleted files to be taken into account, and I have to run quite a few
long-running shell scripts that gather data via SCP from other hosts
before the actual backup (writing to tape via the SD) starts. So a
workaround for the TCP keepalive problem is absolutely necessary for me.
A status inquiry on the client from within bconsole works without a
problem, but even with TCP keepalive enabled my backup stops after a
very short period of time, as the log from the Bacula director shows:
===
11-Mai 16:33 nathan-sd JobId 13: Job write elapsed time = 00:00:36, Transfer rate = 20.30 M Bytes/second
11-Mai 16:34 nathan-fd JobId 13: Fatal error: backup.c:1019 Network send error to SD. ERR=Connection reset by peer
11-Mai 16:34 nathan-dir JobId 13: Error: Bacula nathan-dir 5.0.2 (28Apr10): 11-Mai-2010 16:34:27
Build OS: x86_64-unknown-linux-gnu debian 5.0.4
JobId: 13
Job: nathan_backup.2010-05-11_16.32.43_03
Backup Level: Full (upgraded from Incremental)
Client: "nathan" 5.0.2 (28Apr10) x86_64-unknown-linux-gnu,debian,5.0.4
FileSet: "nathan fileset" 2010-05-10 23:05:00
Pool: "WeeklyBackups" (From Job FullPool override)
Catalog: "MyCatalog" (From Client resource)
Storage: "nathan-sd" (From Pool resource)
Scheduled time: 11-Mai-2010 16:32:41
Start time: 11-Mai-2010 16:32:45
End time: 11-Mai-2010 16:34:27
Elapsed time: 1 min 42 secs
Priority: 10
FD Files Written: 4,072
SD Files Written: 4,059
FD Bytes Written: 734,948,632 (734.9 MB)
SD Bytes Written: 731,138,860 (731.1 MB)
Rate: 7205.4 KB/s
Software Compression: None
VSS: no
Encryption: no
Accurate: yes
Volume name(s):
Volume Session Id: 1
Volume Session Time: 1273588341
Last Volume Bytes: 2,194,827,264 (2.194 GB)
Non-fatal FD errors: 0
SD Errors: 0
FD termination status: Error
SD termination status: Error
Termination: *** Backup Error ***
===
Any help will be greatly appreciated.
Thanks in advance & kind regards,
Holger