BackupPC-users

Re: [BackupPC-users] An idea to fix both SIGPIPE and memory issues with rsync

From: Holger Parplies <wbppc AT parplies DOT de>
To: Robin Lee Powell <rlpowell AT digitalkingdom DOT org>
Date: Tue, 15 Dec 2009 14:33:06 +0100
Hi,

Robin Lee Powell wrote on 2009-12-15 00:22:41 -0800 [Re: [BackupPC-users] An
idea to fix both SIGPIPE and memory issues with rsync]:
> On Mon, Dec 14, 2009 at 08:20:07PM -0600, Les Mikesell wrote:
> > Robin Lee Powell wrote:
> > > 
> > > Asking rsync, and ssh, and a pair of firewalls and load
> > > balancers (it's complicated) to stay perfectly fine for almost a
> > > full day is really asking a whole hell of a lot.
> > 
> > I don't think that should be true.  There's no reason for a
> > program to quit just because it has been running for a day and no
> > particular limit to what ssh can transfer.  And TCP can deal with
> > quite a lot of lossage and problems - unless your load balancers
> > are NATing to different sources or tossing RST's when they fail
> > over.
> 
> Oh, I agree; in an ideal world, it wouldn't be an issue.  I'm afraid
> I don't live there.  :)

none of us do, but you're having problems. We aren't. The suggestion that your
*software* is probably misconfigured in addition to the *hardware* being
flaky makes a lot of sense to me. You can always delegate fixing problems to
the next higher logical layer. Sometimes that makes it easier, sometimes it
doesn't. I believe that, in this case, it doesn't.

> > > For large data sets like this, rsync simply isn't robust enough
> > > by itself.  Losing 15 hours worth of (BackupPC's) work because
> > > the ssh connection goes down is *really* frustrating.
> > 
> > I don't think it is rsync or ssh's problem, although you are
> > correct that rsync could be better about handling huge sets of
> > files.  Both should be as reliable as the underlying hardware.

Actually, they should be more reliable, since TCP will fix quite a lot of
things. Sure, if you unplug the cable, TCP can't fix that on its own. I use
ssh over ISDN links that (intentionally) go up and down all the time
(actually, I'm writing this message over such a connection - open since Oct
10th). Longer periods of failure to reestablish the link are a problem, but
what counts as "longer" is largely configurable. Aside from that,
everything works much as expected.
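(For reference, OpenSSH lets you tune exactly that cut-off. The values below
are illustrative, not recommendations, and "backuphost" is a made-up alias:)

```
# ~/.ssh/config (client side) -- OpenSSH keepalive knobs
Host backuphost                 # hypothetical host alias
    ServerAliveInterval 30      # probe the peer every 30s inside the encrypted channel
    ServerAliveCountMax 10      # declare the link dead after 10 unanswered probes
    TCPKeepAlive yes            # additionally enable kernel-level TCP keepalives
```

With those two ServerAlive settings, ssh gives up roughly five minutes after
the link actually dies, instead of hanging indefinitely.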

> I don't want to talk about our underlying hardware.  ;'(

You don't have to talk about anything, but you have to *think* about your
potential problem sources if you want anyone to be able (much less willing) to
help you. There's probably no need to touch your hardware (I believe Les was
talking about *software* misconfiguration), but you should realize that working
around broken hardware might cost you a huge sum in maintenance in the long
run. My impression is that your system *design* is overly complicated, but
since you aren't talking about it (much), I'm just guessing.

If you want to complicate it further, you could think about adding a VPN layer
between BackupPC server and client machine. I would imagine that might handle
some types of problems better than ssh. Then again, it's not really clear to
me what kind of problems exactly you're facing ("it's complicated", right?).


Concerning your original idea, I believe the rsync --sender generates the file
list, so I don't see how you would want BackupPC (File::RsyncP - the receiver,
as far as backups are concerned) to provide it during the transfer, apart from
starting a distinct rsync process for each (set of) remaining element(s) from
the file list. Obviously, that would be slower and, in many cases, consume
orders of magnitude more bandwidth.

As far as the point of changed content in already visited directories is
concerned, your backups have a startDate and an endDate. Without FS snapshots,
you want to keep these two as close together as possible for your backups to
be meaningful (with snapshots, you need to do the same to prevent your
snapshot from running out of space). With your suggestion of resuming backups,
startDate would be that of the first attempt. Depending on your situation
(WakeupSchedule, availability of backup target machine and network connection),
startDate and endDate might drift apart considerably. You would need to define
up to which point this makes sense - at some point you would want to time out
the running backup and start a new one. Does that, in any way, remind you of
TCP?
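The analogy could be sketched as a toy retry loop. Everything here is
hypothetical -- do_backup_attempt is a stand-in (here it fails twice, then
succeeds) for a real resumable transfer:

```shell
# Toy model of "resume vs. restart": retry a flaky transfer, but reset the
# backup's startDate once it is older than max_age (the TCP-style give-up).
attempts=0
do_backup_attempt() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]          # stand-in: succeed on the third try
}

start=$(date +%s)
max_age=60                          # seconds; real values would be hours
until do_backup_attempt; do
    now=$(date +%s)
    if [ $((now - start)) -ge "$max_age" ]; then
        echo "startDate too old -- abandoning resume, starting fresh" >&2
        start=$now                  # new startDate: the backup's "connection reset"
    fi
done
echo "backup finished after $attempts attempts"
```

The interesting policy question is all in max_age: how far apart may startDate
and endDate drift before the resumed backup stops being meaningful?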


I can understand the wish to save as much duplicated effort as possible - both
concerning transmission bandwidth and server and client load. But this approach
seems more meaningful as a kludge around a specific network problem than as a
general-purpose solution for people seeking reliable backups of their data.

Regards,
Holger

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
