[BackupPC-users] Child Exited Prematurely

2008-11-18 14:14:16
Subject: [BackupPC-users] Child Exited Prematurely
From: James Sefton <james AT phase5.co DOT uk>
To: "backuppc-users AT lists.sourceforge DOT net" <backuppc-users AT lists.sourceforge DOT net>
Date: Tue, 18 Nov 2008 19:02:09 +0000

Hi,

 

Please excuse me if I am using this wrong; in all my years in IT, this seems to be the first time I have used a mailing list for support.  (I’m usually pretty good at the whole RTFM thing.)

 

We have a backup box (FC6) that runs backups of a lot of Windows servers using rsync.

We are not running the latest version of BackupPC.  I am reluctant to update it unless I have a good idea that it’s going to help, since we have a lot of automated scripts that manage the BackupPC config files and we would need to review them all.  If this is the route we need to go then no problem, but as I understand it, my problem is specifically related to rsync.  (Please correct me if I am wrong.)

 

We have been running this for well over a year now (maybe a few years, my memory fails me) and iirc this problem has only started showing up over the past 6-12 months.  We do not have any automatic updates on the BackupPC box so nothing there should have really changed.

 

Our backup box resides on what we call our CORE network.  The servers it backs up are all on remote networks, which are connected to the CORE network with VPNs.  (The VPNs run over DSL.)  Backups do take a long time to run (~10 hours or so) due to the amount of data.

 

The problem we are seeing is that backups are randomly failing.

The log file on BackupPC shows something like this:

 

Connected to xxx.xxx.xxx.xxx:873, remote version 29
Negotiated protocol version 26
Connected to module kale-susl
Sending args: --server --sender --numeric-ids --perms --owner --group -D --links --times --block-size=2048 --recursive --ignore-times . .
Xfer PIDs are now 5220
[ skipped 971 lines ]
Read EOF: Connection reset by peer
Tried again: got 0 bytes
finish: removing in-process file Data/Apps/GoldMine/GMBase/ScriptsW.MDX
Child is aborting
Parent read EOF from child: fatal error!
Done: 923 files, 3041217353 bytes
Got fatal error during xfer (Child exited prematurely)
Backup aborted (Child exited prematurely)
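
When comparing many failed runs, it can help to pull the same few facts out of each XferLOG. The following is only a minimal Python sketch, assuming the log text has been saved or pasted as plain text; the function name summarise and the dictionary keys are made up for illustration, and the sample lines are copied from the excerpt above.

```python
import re

# Lines copied from the XferLOG excerpt above.
XFER_LOG = """\
Connected to xxx.xxx.xxx.xxx:873, remote version 29
Negotiated protocol version 26
Read EOF: Connection reset by peer
Done: 923 files, 3041217353 bytes
Got fatal error during xfer (Child exited prematurely)
"""

def summarise(log_text):
    """Extract protocol versions and transfer totals from XferLOG text.

    Hypothetical helper: any field that is not found is returned as None.
    """
    patterns = {
        "remote_version": r"remote version (\d+)",
        "protocol": r"Negotiated protocol version (\d+)",
        "files": r"Done: (\d+) files",
        "bytes": r"Done: \d+ files, (\d+) bytes",
    }
    summary = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, log_text)
        summary[key] = int(match.group(1)) if match else None
    return summary

summary = summarise(XFER_LOG)
print(summary)
print(f"{summary['bytes'] / 2**30:.2f} GiB transferred before the reset")
```

Running this over the logs of several failed backups would show whether the failures cluster around a particular amount of data transferred or happen at arbitrary points.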

 

The log on the Windows server is:

 

2008/11/18 17:46:05 [3252] connect from UNKNOWN (xxx.xxx.xxx.xxx)
2008/11/18 17:46:05 [3252] rsync on . from backupuser@UNKNOWN (xxx.xxx.xxx.xxx)
2008/11/18 17:46:05 [3252] building file list
2008/11/18 18:03:14 [3252] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Connection reset by peer (104)
2008/11/18 18:03:14 [3252] rsync error: error in rsync protocol data stream (code 12) at /home/lapo/packaging/tmp/rsync-2.6.9/io.c(1122) [sender=2.6.9]
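
To see how long a session survived before the reset, the timestamps in the rsyncd log can be subtracted. This is a small Python sketch, assuming every line starts with a "YYYY/MM/DD HH:MM:SS" stamp as in the excerpt above; the helper names are hypothetical and the two sample lines are copied from that excerpt.

```python
from datetime import datetime

# First and last lines of the rsyncd session shown above.
LOG_LINES = [
    "2008/11/18 17:46:05 [3252] connect from UNKNOWN (xxx.xxx.xxx.xxx)",
    "2008/11/18 18:03:14 [3252] rsync error: error in rsync protocol data stream (code 12) at /home/lapo/packaging/tmp/rsync-2.6.9/io.c(1122) [sender=2.6.9]",
]

def parse_timestamp(line):
    """Return the leading 'YYYY/MM/DD HH:MM:SS' stamp of an rsyncd log line."""
    return datetime.strptime(line[:19], "%Y/%m/%d %H:%M:%S")

def session_length(lines):
    """Seconds between the first and last line of one rsyncd session."""
    return (parse_timestamp(lines[-1]) - parse_timestamp(lines[0])).total_seconds()

seconds = session_length(LOG_LINES)
print(f"session lasted {int(seconds // 60)} min {int(seconds % 60)} s")
```

For this session the connection was reset 17 minutes 9 seconds after the connect. Collecting the same figure for other failed sessions would show whether they die after a consistent interval or at random.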

 

 

I have been trying to work this out for a month or two now.

The problems seem to be random, but more common on specific servers.

There is nothing special about these specific servers; the failures seem random but persistent.

 

Originally, we were running the recommended rSync package for BackupPC.

 

After looking into the problem over the past month, I have seen a lot of posts suggesting that this was a common problem with a particular build of rsync.

 

I have updated rsync on the BackupPC box; "rpm -q rsync" currently replies...

 

rsync-2.6.9-5.fc8     (yes, the only updated RPM I could find was an FC8 one)

 

On a few select servers (including the one that generated the above logs) I set up Cygwin directly and added rsync to it with the installer wizard.

I selected rsync 2.6.9 rather than 3.x.x as I assumed this would be required for compatibility.

 

These seem to be the only recommendations I can find for fixing this problem. (updating rSync)

Sadly, it has not helped me so far.

 

The connection between the BackupPC server and the example server used for the above logs is a VPN, like the rest of the servers, but this server is local and the VPN operates over local Ethernet links (i.e. stable links).

 

I have tried and tried to verify as much as I can that there are no network/VPN dropouts at the times this is failing, and I’m pretty sure there are none.  It sometimes fails within 3 minutes of the job starting, other times after hours.  I have had a remote desktop session open to the server and been actively using it at the time it failed, and I noticed absolutely no disturbance in my RD session.  (You would expect at least a short pause if there was a brief disruption to the connection.)

 

I am at a loss and I am really hoping that someone will be able to show me a way to further my research into what is causing this problem so that I can hopefully isolate the issue and resolve it.

 

I have said above that this seems to be most prominent on specific servers (about 7 of them) but it IS happening occasionally on all of our servers.

We do get drops on our VPNs from time to time, but we have a monitoring system in place that alerts us immediately when this happens.

These drops probably average about one every 3 months per VPN.  The rsync failures are daily, usually several per day.

 

Any help would be very much appreciated.

 

Kind Regards,


James Sefton

Phase 5 Communications Ltd. (UK)
