BackupPC-users

Re: [BackupPC-users] Child Exited Prematurely

2008-11-18 18:39:40
From: Chris Robertson <crobertson AT gci DOT net>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Tue, 18 Nov 2008 14:37:55 -0900
James Sefton wrote:
>
> Hi,
>
> Please excuse me if I am using this wrong; in all my years in IT, it 
> seems this is the first time I have used a mailing list for support. 
> (I’m usually pretty good at the whole RTFM thing.)
>
> We have a backup box (FC6) that is running backups from a lot of 
> windows servers using rSync.
>
> We are not running the latest version of BackupPC. I am reluctant to 
> update this unless I have a good idea that it’s going to help since we 
> have a lot of automated scripts that manage the BackupPC config files 
> and we will need to review them all. If this is the route we need to 
> go then no problem but as I understand it, my problem is specifically 
> related to rSync. (please correct me if I am wrong.)
>
> We have been running this for well over a year now (maybe a few years, 
> my memory fails me) and iirc this problem has only started showing up 
> over the past 6-12 months. We do not have any automatic updates on the 
> BackupPC box so nothing there should have really changed.
>
> Our backup box resides on what we call our CORE network. The servers 
> it backs up are all on remote networks, which are connected to the CORE 
> network with VPNs. (The VPNs run over DSL.) Backups do take a 
> long time to run (~10 hours or so) due to the amount of data.
>
> The problem we are seeing is that Backups are randomly failing.
>
> The log file on BackupPC shows something like this:
>
> Connected to xxx.xxx.xxx.xxx:873, remote version 29
>
> Negotiated protocol version 26
>
> Connected to module kale-susl
>
> Sending args: --server --sender --numeric-ids --perms --owner --group 
> -D --links --times --block-size=2048 --recursive --ignore-times . .
>
> Xfer PIDs are now 5220
>
> [ skipped 971 lines ]
>
> Read EOF: Connection reset by peer
>

I saw a similar problem from time to time due to firewalls that close 
"inactive" connections. rsync can sit for a while without passing any 
data while file lists are built and files are compared. See 
https://bugzilla.samba.org/show_bug.cgi?id=5695 for a related issue. 
Check your VPN setup for an "inactive session timeout" or the like. If 
you run rsync over SSH rather than against rsyncd (port 873, as in your 
log), SSH has the ServerAliveInterval option that can mitigate this.
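
One way to test the idle-timeout theory is to measure how long each 
failed session lasted on the rsyncd side. If the failures cluster around 
a fixed duration, a timeout somewhere in the path becomes a likely 
suspect; if they are scattered, it is less likely. Below is a rough 
Python sketch of that check. It assumes the daemon log uses the 
"date time [pid] message" layout shown in your excerpt and that wrapped 
lines have been rejoined. The function name and the sample list are made 
up for illustration.

```python
import re
from datetime import datetime

# Matches the rsyncd log prefix from the excerpt: "2008/11/18 17:46:05 [3252] message"
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(\d+)\] (.*)$")


def failed_session_seconds(lines):
    """Return {pid: seconds between first and last log line} for every
    PID that logged an "rsync error" line.

    Caveat: PIDs can be reused over time, so feed this one day (or one
    backup run) of log lines at a time.
    """
    first, last, failed = {}, {}, set()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a log line in the expected layout
        stamp = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
        pid = int(m.group(2))
        first.setdefault(pid, stamp)  # remember the first time we saw this PID
        last[pid] = stamp             # keep updating the most recent time
        if m.group(3).startswith("rsync error"):
            failed.add(pid)
    return {pid: int((last[pid] - first[pid]).total_seconds()) for pid in failed}


# Lines taken from the Windows-side log quoted in this thread (rejoined).
SAMPLE = [
    "2008/11/18 17:46:05 [3252] connect from UNKNOWN (xxx.xxx.xxx.xxx)",
    "2008/11/18 17:46:05 [3252] building file list",
    "2008/11/18 18:03:14 [3252] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Connection reset by peer (104)",
    "2008/11/18 18:03:14 [3252] rsync error: error in rsync protocol data stream (code 12)",
]

if __name__ == "__main__":
    for pid, seconds in sorted(failed_session_seconds(SAMPLE).items()):
        print(f"pid {pid}: failed after {seconds // 60} min {seconds % 60} s")
```

For the excerpt in your mail, the session died roughly 17 minutes after 
"building file list". Running the same check over a few weeks of logs 
would show whether that figure repeats.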

> Tried again: got 0 bytes
>
> finish: removing in-process file Data/Apps/GoldMine/GMBase/ScriptsW.MDX
>
> Child is aborting
>
> Parent read EOF from child: fatal error!
>
> Done: 923 files, 3041217353 bytes
>
> Got fatal error during xfer (Child exited prematurely)
>
> Backup aborted (Child exited prematurely)
>
> The log on the windows server is:
>
> 2008/11/18 17:46:05 [3252] connect from UNKNOWN (xxx.xxx.xxx.xxx)
> 2008/11/18 17:46:05 [3252] rsync on . from backupuser@UNKNOWN 
> (xxx.xxx.xxx.xxx)
> 2008/11/18 17:46:05 [3252] building file list
> 2008/11/18 18:03:14 [3252] rsync: writefd_unbuffered failed to write 4092 
> bytes [sender]: Connection reset by peer (104)
> 2008/11/18 18:03:14 [3252] rsync error: error in rsync protocol data stream 
> (code 12) at /home/lapo/packaging/tmp/rsync-2.6.9/io.c(1122) [sender=2.6.9]
>
> I have been trying to work this out for a month or two now.
>
> The problems seem to be random, but more common on specific servers.
>
> There is nothing special about these specific servers; they just seem 
> random but persistent.
>
> Originally, we were running the recommended rSync package for BackupPC.
>
> After looking into the problem over the past month, I have seen a lot 
> of posts suggesting that this was a common problem with a particular 
> build of rSync.
>
> I have updated rSync on the BackupPC box; "rpm -q rsync" currently 
> replies...
>
> rsync-2.6.9-5.fc8 (yes, the only updated rpm I could find was an fc8 one)
>

That's the danger of using Fedora Core for a server: each release is 
only supported for about a year. SUSE, Ubuntu and CentOS are much better 
server OS choices.

You might try grabbing the SRPM from Fedora 9 
(http://download.fedora.redhat.com/pub/fedora/linux/updates/9/SRPMS.newkey/rsync-3.0.4-0.fc9.src.rpm),
installing rpm-build, and seeing if you can roll your own.

> On a few select servers (including the one that generated the above 
> logs) I set up Cygwin directly and added rSync to it with the installer 
> wizard.
>
> I selected rSync 2.6.9 rather than 3.x.x as I assumed this would be 
> required for compatibility.
>

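Nope. rsync is backwards compatible: the two ends settle on the older 
protocol version, as your own log shows (remote version 29, negotiated 
protocol version 26).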
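
As a toy illustration of that compatibility rule, here is a minimal 
Python sketch. It assumes the usual rsync behaviour, where each end 
announces its protocol version and the session uses the lower of the 
two. The function name is made up, and the local value of 26 is inferred 
from the "Negotiated protocol version 26" line in your log rather than 
shown there directly.

```python
def negotiated_protocol(local_version: int, remote_version: int) -> int:
    """Each end announces its protocol version; the session uses the lower one."""
    return min(local_version, remote_version)


if __name__ == "__main__":
    # rsync 2.6.9 on the Windows side announces protocol 29 (per the log).
    print(negotiated_protocol(26, 29))
    # An rsync 3.0.x daemon announces protocol 30; the older side still wins.
    print(negotiated_protocol(26, 30))
```

Under that assumption, moving the Windows side to 3.x would not change 
the protocol the BackupPC side ends up speaking.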

> These seem to be the only recommendations I can find for fixing this 
> problem. (updating rSync)
>
> Sadly, it has not helped me so far.
>
> The connection between the BackupPC server and the example server used 
> for the above logs is a VPN like the rest of the servers, but this server 
> is local and the VPN operates over local Ethernet links (i.e. stable 
> links).
>
> I have tried and tried to verify as much as I can that there are no 
> network/VPN dropouts at the times that this is failing and I’m pretty 
> sure there are not. It sometimes fails within 3 minutes of the job 
> starting, other times after hours. I know I have had a remote desktop 
> session open to the server and been actively using it at the time it 
> failed, and I noticed absolutely no disturbance in my RD session 
> (where you would expect at least a short pause if there was a brief 
> disruption to the connection).
>
> I am at a loss and I am really hoping that someone will be able to 
> show me a way to further my research into what is causing this problem 
> so that I can hopefully isolate the issue and resolve it.
>
> I have said above that this seems to be most prominent on specific 
> servers (about 7 of them) but it IS happening occasionally on all of 
> our servers.
>
> We do get drops on our VPNs from time to time but we have a 
> monitoring system in place that alerts us immediately about this.
>
> These drops probably average about one every 3 months, per VPN. 
> The rSync failures are daily, and usually several per day.
>
> Any help would be very much appreciated.
>
> Kind Regards,
>
>
> James Sefton
>
> Phase 5 Communications Ltd. (UK)
>

Chris

