On Mon, Dec 14, 2009 at 02:08:31PM -0500, Jeffrey J. Kosowsky wrote:
> Robin Lee Powell wrote at about 10:10:17 -0800 on Monday, December 14, 2009:
> > Do you actually see a *problem* with it, or are you just
> > assuming it won't work because it seems too easy?
>
> The problem I see is that backuppc won't be able to backup hard
> links on any interrupted or sub-divided backup unless you are
> careful to make sure that no hard links span multiple restarts.
> And once you mess up hard links for a file, all subsequent
> > incrementals will be unlinked too.
>
>
> If you are just using BackupPC to back up data then that might not
> be important. On the other hand, if you are using backuppc to
> backup entire systems with the goal of having (close to a) bare
> metal restore, then this method won't work.
Agreed on both counts; I'm only interested in backing up data.
Obviously such a system would have to be optional.
> Personally, I haven't seen a major memory sink using rsync 3.0+.
> Perhaps you could provide some real world data of the potential
> savings so that people can understand the tradeoffs.
>
> That being said, memory is pretty cheap, while reliable backups
> are hard.
I'm *far* more worried about the reliability than the RAM usage;
that was just a side effect. I'm losing 10+ hour backups routinely
to SIGPIPE, rsync dying on the remote end, and so on; *that* is
what the idea was designed to fix. The whole point is to,
optionally, make rsync more reliable at the expense of losing
hardlink support and, tangentially, save some RAM.
> As an aside, if anything, myself and others have been pushing to
> get more reliable backup of filesystem details such as extended
> attributes, ACLs, ntfs stuff etc. and removing the ability to
> backup hard links would be a step backwards from that perspective.
Understood.
> Finally, the problem with interrupted backups that I see mentioned
> most on this group is the interruption of large transfers that
> have to be restarted and then retransferred over a slow link.
> Rsync itself is pretty fast when it just has to check file
> attributes to determine what needs to be backed up.
Not with large trees it isn't. I have 3.5 million files, and more
than 300GiB of data, in one file system. The last incremental took
*twenty one hours*. I have another backup that's 4.5 million files,
also more than 300 GiB of data, also in one file system. The full
took 20 hours; it hasn't succeeded at an incremental yet. That's
over full 100BaseT, if not better (I'm not the networking person).
Asking rsync, and ssh, and a pair of firewalls and load balancers
(it's complicated) to stay perfectly fine for almost a full day is
really asking a whole hell of a lot. For large data sets like this,
rsync simply isn't robust enough by itself. Losing 15 hours' worth
of (BackupPC's) work because the ssh connection goes down is
*really* frustrating.
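As a stopgap sketch (outside BackupPC entirely; the function name and
retry limits here are made up for illustration), even just wrapping the
rsync invocation in a retry loop with --partial would mean a dropped
connection costs a reconnect instead of the whole run:

```shell
# Sketch: retry rsync until it succeeds, so a dropped connection
# costs a retry instead of the whole run.  --partial keeps
# partially-transferred files so large files resume rather than
# restart.  (Function name and limits are invented for illustration.)
rsync_retry() {
    src=$1 dest=$2
    tries=0 max_tries=5
    until rsync -a --partial --timeout=300 "$src" "$dest"; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            echo "giving up after $max_tries attempts" >&2
            return 1
        fi
        echo "attempt $tries failed; retrying in 30s" >&2
        sleep 30
    done
}
```

Of course this only saves the transfer, not the tree walk, which is the
real cost here.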
In both cases, the client-side rsync uses more than 300MiB of RAM,
with --hard-links *removed* from the rsync option list. Not
devastating, but not trivial either.
> So, I think the best way for improvement that would be consistent
> with BackupPC design would be to store partial file transfers so
> that they could be resumed on interruption. Also, people have
> suggested tweaks to the algorithm for storing partial backups.
Partial transfers won't help in the slightest: the cost is the time
it takes to walk the file tree, and avoiding a re-walk of the tree
on resumption is exactly what my idea was designed to do.
Having said that, if incrementals could be resumed instead of just
thrown away, that would at least be marginally less frustrating when
a minor network glitch loses a 15+ hour transfer.
In the incremental I mentioned above, rsync's MB/sec listing is
0.08. Over 100BaseT. Seriously: the problem is that walking file
trees of that size, when they are active serving production traffic,
takes a *really* long time. I don't see any way to avoid that
besides keeping track of where you've been.
-Robin
--
They say: "The first AIs will be built by the military as weapons."
And I'm thinking: "Does it even occur to you to try for something
other than the default outcome?" See http://shrunklink.com/cdiz
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/