Subject: Re: [BackupPC-users] Bug#497888: backuppc: please make use of the rsync algorithm, particularly in resuming interrupted backups
From: Holger Parplies <wbppc@parplies.de>
To: Tim Connors <reportbug@rather.puzzling.org>
Date: Sun, 14 Sep 2008 00:57:50 +0200
Hi,

[the message I am responding to has yet to hit <backuppc-users> - I'd guess
 it's awaiting moderation]

Tim Connors wrote on 2008-09-14 02:12:16 +1000 [Re: [BackupPC-users] 
Bug#497888: backuppc: please make use of the rsync algorithm, particularly in 
resuming interrupted backups]:
> [...]
> By the way, for the backuppc-users people, the rest of this bug report was
> filed as a debian bug and is viewable here in complete form:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=497888.html

yes, that was evident from the e-mail header. I've removed the bug tracker
from the "cc", because this is a long discussion about something I can't
see as a bug. I'll submit a pointer to this thread instead.

> > [...] I believe BackupPC *does* remove the in-progress file at the time
> > the backup failed. This is logical in that you don't have a [partial] backup
> > that reflects an incorrect state of a file (i.e. if it's there it's 
> > correct) -
> > a random file from the user's point of view, as I think should be pointed 
> > out.
> 
> However, previous backups are in the pool as
>  /var/lib/backuppc/pc/<hostname>/<n>
> (and hardlinked into the common pool /var/lib/backuppc/cpool/)
> whereas the incompletely transferred filesystem is in
>  /var/lib/backuppc/pc/<hostname>/new
> 
> Of course you wouldn't serve up the incompletely transferred filesystem as
> a valid backup.  Of course you would complete that transfer before
> renaming it and linking it to the preexisting pool.  Subsequent resumes of
> such a backup would naturally be done with the --inplace --delete flag, so
> that the remote rsync could tell the backuppc process which files had been
> deleted in the meantime, and --inplace would take care of these large but
> incompletely transferred files.  Or instead of --inplace, since backuppc
> is acting as the rsync client, at least record what temporary file you
> used so that you can just get the delta.

And so on. Your patch being?

Please stick to feature suggestions. Unless the idea itself turns out to be a
good one, it's quite pointless to argue about why it can't be done (which
would involve clearing up misconceptions about how BackupPC works).

I would summarize your suggestion as:

        BackupPC should keep the in-progress file when creating a partial
        backup, in order to speed up the retry.

I would like to point out, though, that this does not seem to be your actual
problem: partial backups do not seem to work for you the way they do for
others.


So, first of all: What happens with partial backups?

I admit I don't fully understand partial backups; I'm a user of BackupPC, not
a developer.
I don't have a test setup where I could easily try it out. But I can read Perl
code (and make mistakes reading it, yes). I interpret the code in BackupPC_dump
to, in fact, save a partial as a numbered backup (hostname/<n>). If this does
not happen for you, something is going wrong, because hostname/new is deleted
*before* the next backup, *not* used as a reference, meaning you don't, in
fact, *have* a partial backup.

This would also mean that partial backups are and should be browsable.
If I am wrong, I hope someone will correct me.
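
To make that concrete, here is roughly the on-disk layout in question (Debian
paths, with a hypothetical host "mum"; the comments reflect my reading of the
code, not tested behaviour):

        /var/lib/backuppc/pc/mum/
            0/      # completed backup, files hardlinked into cpool/
            1/      # failed run saved as a partial - numbered like any backup
            new/    # in-progress transfer; deleted before the next attempt,
                    # not used as a reference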

And I maintain that you don't understand "--inplace" correctly. Look at
rsync(1) under "--partial". "--inplace" might imply "--partial", but only
because there is no way to "undo the damage". If I were rsync, I'd delete an
incomplete *new* file with "--inplace", unless "--partial" was also specified.
Just my opinion - your rsync may disagree.
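
For illustration, a minimal command-line sketch of the difference (host and
paths are invented):

        # --partial: an interrupted transfer leaves the incomplete file at
        # the destination, and a re-run resumes from it via the delta
        # algorithm.
        rsync --partial mum:/home/mum/mail/inbox.mbox /tmp/copy/

        # --inplace: rsync updates the destination file directly, so an
        # interruption leaves it in an inconsistent state - "damage" that
        # cannot be undone by simply removing a temporary file.
        rsync --inplace mum:/home/mum/mail/inbox.mbox /tmp/copy/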

> So if I do a full backup of all of the filesystem on mum's computer, it
> will get to the 4th filesystem with this problematic mbox file, fall over
> at that point when she turns the computer off in 12 hours, and then the
> next full backup would start from this file, and not have to do the rest
> of the backup of the other 3 filesystems first (only to delete them all
> later when it realises it can't complete this backup)?  Most likely, it
> would still have to start from the start of this file though, because the
> first time around, it's still got to get all of those 650MB across the net
> as it doesn't yet have an earlier version to work from.  Which means it's
> still going to fail some 400MB or so into the transfer, and I'll incur
> another 400MB cost to mum's quota.

Use conventional "rsync --partial" to iteratively get a full local copy of the
complete backup set. Use $Conf{ClientNameAlias} and some trick to get the paths
consistent (maybe rsyncd modules with appropriate names, or set up a virtual
host ...) to point your mother's backup configuration at your local copy. Do
an initial backup from your local copy. Point BackupPC back at your mother's
computer. Now an incremental (or full) should work as expected.
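
A rough sketch of that procedure (hostname, paths and the alias value are all
made up; adapt them to your setup):

        # Step 1: pull a complete local mirror, resuming after every
        # interruption - --partial keeps incomplete files, so no transferred
        # byte is wasted.
        while ! rsync -a --partial mum:/home/ /srv/mum-mirror/home/; do
            sleep 600   # client offline or link dropped; retry later
        done

        # Step 2: in mum.pl, point BackupPC at the mirror, e.g.
        #   $Conf{ClientNameAlias} = 'localhost';
        # run one full backup, then remove the alias again.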

Alternatively, use $Conf{BackupFilesExclude} to get a full backup completed by
excluding enough files for it to complete fast enough. Gradually remove things
from BackupFilesExclude (to *in*clude them in the backup). Do full backups
until BackupFilesExclude is empty (or only contains what you really want to
exclude) and you have a complete full backup.
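
A hypothetical starting point for such an exclude list (share name and paths
are invented):

        # In mum.pl: exclude the problem file and other large trees first,
        # then delete entries after each successful full backup.
        $Conf{BackupFilesExclude} = {
            '/home' => [ '/mum/mail/inbox.mbox', '/mum/videos' ],
        };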

Third possibility: find out why it is not working as it should (i.e. why you
don't get a valid partial) and fix that.

Only the first suggestion will avoid needing to transfer the whole 650MB file
in one run.

> [...]
> For tar etc, you're still not transferring the whole filesystem in an
> incremental run - you're just asking it to transfer new files, aren't you?
> So you can still compare them against the merged view when working out
> what needs to be transferred (not that you'd use tar across the internet).

No, "I" can only tell tar from which timestamp on to transfer files. Partial
tar/smb backups are not useful for saving any bandwidth.
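
For reference, the stock tar incremental arguments illustrate this - the only
file-selection information the client-side tar gets is a timestamp cutoff
(this should be the 3.x default, but check your config.pl):

        $Conf{TarIncrArgs} = '--newer=$incrDate+ $fileList+';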

> I'd be wondering what the point of an incremental is over a full backup
> in the rsync case, if bandwidth is maximised and you still get the benefit
> of pooling files with full backups.

Ask your HDDs and read

http://sourceforge.net/mailarchive/message.php?msg_name=48BD90E0.401@knebb.de
http://www.mail-archive.com/backuppc-users@lists.sourceforge.net/msg06101.html

(thanks to Rob Owens for providing the second link).

> But until I can get the first transfer done after a major filesystem
> reorganisation on my mum's computer, I won't know!

True, major filesystem reorganisations are a pain, but there is no simple way
to solve that problem without BackupPC-specific client software (which would
be BackupPCd, which is, reportedly, dead).

Regards,
Holger
