Subject: Re: [BackupPC-users] An idea to fix both SIGPIPE and memory issues with rsync
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Sun, 13 Dec 2009 23:56:59 -0500
Robin Lee Powell wrote at about 20:18:55 -0800 on Sunday, December 13, 2009:
 > 
 > I've only looked at the code briefly, but I believe this *should* be
 > possible.  I don't know if I'll be implementing it, at least not
 > right away, but it shouldn't actually be that hard, so I wanted to
 > throw it out so someone else could run with it if ey wants.
 > 
 > It's an idea I had about rsync resumption:
 > 
 > Keep an array of all the things you haven't backed up yet, starting
 > with the initial arguments; let's say we're transferring "/a"
 > "/b" from the remote machine.
 > 
 > Start by putting "a/" and "b/" in the array.  Then get the directory
 > listing for a/, and replace "a/" in the array with "a/d", "a/e", ...
 > for all files and directories in there.  When each file is
 > transferred, it gets removed.  Directories are replaced with their
 > contents.
 > 
 > If the transfer breaks, you can resume with that list of
 > things-what-still-need-transferring/recursing-through without having
 > to walk the parts of the tree you've already walked.
 > 
 > This should solve the SIGPIPE problem.  In fact, it could even deal
 > with incrementals from things like laptops: if you have settings for
 > NumRetries and RetryDelay, you could, say, retry every 60 seconds
 > for a week if you wanted.
 > 
 > On top of that, you could use the same retry system to
 > *significantly* limit the memory usage: stop rsyncing every N files
 > (where N is a config value).  If you only do, say, 1000 files at a
 > time, the memory usage will be very low indeed.
 > 

Unfortunately, I don't think it is that simple. If it were, rsync
would have been written that way back in version .001. There is a
reason that rsync memory usage increases as the number of files
increases (even in 3.0), and it is not due to memory leaks or
ignorant programmers. After all, your proposed fix is not exactly
obscure.

At least one reason is the need to keep track of inodes so that hard
links can be copied properly. In fact, I believe that without the -H
flag, rsync memory usage scales much better. Obviously, if you break
up backups into smaller chunks, or allow resumes without keeping
track of past inodes, then you have no way of tracking hard links
across the filesystem. Maybe you don't care about hard links; but in
that case you could probably do just about as well by simply dropping
the --hard-links argument from RsyncArgs.
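To see why that state can't be chunked away, here is a rough Python sketch (not rsync's actual implementation) of the bookkeeping that -H implies. Every inode with a link count above one must stay in memory for the whole walk, because its other names may appear anywhere later in the file list:

```python
import os

def find_hardlink_groups(paths):
    """Map each (device, inode) pair to the paths that share it.

    To replicate hard links, a tool must remember every inode with
    link count > 1 that it has seen so far -- state that grows with
    the file list and would be lost by restarting the walk from a
    partial worklist.
    """
    by_inode = {}
    for path in paths:
        st = os.lstat(path)
        if st.st_nlink > 1:                  # candidate hard link
            key = (st.st_dev, st.st_ino)
            by_inode.setdefault(key, []).append(path)
    # Only groups of two or more paths are actual hard-link sets.
    return {k: v for k, v in by_inode.items() if len(v) > 1}
```

If the two names of a linked file fall into different 1000-file chunks, neither chunk alone can discover the link; that's the trade-off behind dropping --hard-links.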

I don't believe there is any easy way to get something for free here...

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/