BackupPC-users

Re: [BackupPC-users] How to reuse already saved data until an interrupted incr backup

From: Matthias Meyer <matthias.meyer AT gmx DOT li>
To: backuppc-users AT lists.sourceforge DOT net
Date: Sun, 19 Jul 2009 18:50:20 +0200
Holger Parplies wrote:

> Hi,
> 
> Matthias Meyer wrote on 2009-07-06 18:35:09 +0200 [Re: [BackupPC-users]
> How to reuse already saved data until an interrupted incr backup]:
>> Holger Parplies wrote:
>> > Matthias Meyer wrote on 2009-07-05 21:14:57 +0200 [Re: [BackupPC-users]
>> > How to reuse already saved data until an interrupted incr backup]:
>> > > [...]
>> > > In the next step I will try to move the /new directory to a /<number>
>> > > directory, update the /backups file and run BackupPC_link during the
>> > > DumpPostUserCmd.
>> > > 
>> > > Is there any reason to not do that?
>> > 
>> > [...]
>> > 1. You will have something that looks like a successful backup but
>> > isn't.
>> >    [...] there are basically two possibilities:
>> >    a) The backup view contains all files processed (i.e. found to be
>> >       "same" or transferred). All files beyond the point where it
>> >       failed are missing.
>> >    b) All files beyond the point where it failed are identical to their
>> >       state in the reference backup.
>> >    I presume it's case a [...]
>> 
>> I presume it is case b) because BackupPC will fill an incremental backup
>> with the files from previous backups.
> 
> did you look at the code? Did you test it? Do you know how "filling"
> works? Your remark sounds as if "filling" is magic and it is somehow
> obvious what it does in the case of a corrupt backup (i.e. one not
> resembling what BackupPC itself would generate). It is not. Each backup is
> supposed to contain a full directory tree. If it doesn't, who says
> BackupPC should go looking for missing directories in previous backups?
> It's not as if BackupPC was *expecting* any directories to be missing.

I didn't look at the code. I assumed that from the documentation and from how
my backups are displayed.
The directory of an incremental backup contains only the new files. Therefore
I assume that BackupPC fills this in at runtime with the other files from
previous backups. Maybe that is wrong.
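To illustrate what I mean by "filling", a rough sketch (this is only my
mental model, not BackupPC's actual view code; it ignores file name
mangling, compression and the attrib files, and the helper names are
made up):

#!/usr/bin/perl
# Conceptual sketch only -- not BackupPC's real view code.  Idea: to show
# incremental #N, start from the reference backup's file list and overlay
# the files that #N itself contains.
use strict;
use warnings;
use File::Find;

# Return { relative_path => mtime } for everything below $dir.
sub list_files {
    my ($dir) = @_;
    my %files;
    find(sub {
        return unless -f;
        my $rel = $File::Find::name;
        $rel =~ s{^\Q$dir\E/?}{};
        $files{$rel} = (stat(_))[9];
    }, $dir);
    return \%files;
}

# Start from the reference backup, then let the incremental's entries win.
sub merged_view {
    my ($reference, $incremental) = @_;
    return { %{ list_files($reference) }, %{ list_files($incremental) } };
}

# Example: perl merge_view.pl <reference dir> <incremental dir>
my ($ref_dir, $incr_dir) = @ARGV;
die "usage: $0 <reference dir> <incremental dir>\n" unless defined $incr_dir;
print "$_\n" for sort keys %{ merged_view($ref_dir, $incr_dir) };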
> 
> The code for generating a backup view is rather complex, but a quick look
> at it suggests that missing directories will not be filled in from earlier
> backups while missing files will.

I believe that too.

> But I could be wrong. 
> 
>> > 2. What is run next? That depends on your setup. If it's an incremental of
>> >    the same (or lower) level, all changes are re-transferred, so you
>> >    gain nothing. You simply end up with a bogus backup in your history.
>> >    On the other hand, if it's a full or an incremental of *higher*
>> >    level (you have a level N backup in your 'backups' file, so BackupPC
>> >    would do a level N+1 if that is what you configured), the backup
>> >    will need to re-transfer *all files you missed on the partial* (at
>> >    least in case a above), meaning you probably lose (much) more than
>> >    you gain, aside from also having a bogus backup in your history.
>> > 
>> I have a level N backup configured.
> 
> I don't understand what you mean by that, but you've said elsewhere that
> you use IncrLevels for doing incrementals of increasing level. That would
> mean that your next backup would be of level N+1 (for a "partial" level N
> backup).

Yes
> 
>> So the next incremental backup will
>> back up all files changed since the last (the incomplete incremental)
>> backup. I understand. With my strategy I will not back up those files
>> which were changed before but are not included in the last, incomplete
>> incremental backup.
> 
> That's not what I said, and it's not what will happen. We're talking about
> rsync(d) here, else all of what you are trying to do doesn't make sense
> (you can't reuse previously transferred files with tar or smb, because you
> can't tell tar/smb to backup relative to a timestamp, "except for the
> files I've already got, unless they've changed again"). rsync(d) doesn't
> use a timestamp, only a file list. That file list either contains a file,
> in which case it will be transferred if it has changed, or it doesn't, in
> which case it will be transferred. The point really is that your next
> transfer may be *huge*, because it includes all *unchanged* files that the
> partial missed looking at.

I believe the file list is built by comparing timestamps and attributes
(during an incremental backup) and by comparing content (during a full
backup).
But you are right. If I save the file list, all already transferred files
would be transferred again. So that cannot be part of the solution.
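Just to make sure I understand your point, a small illustration (heavily
simplified, only comparing mtimes, not rsync's real algorithm; the example
hash contents are invented):

#!/usr/bin/perl
# Simplified illustration of the point above, not rsync's real algorithm:
# every file the client offers is compared against the reference backup on
# the server; anything that is missing from the reference (for example
# because the reference is a truncated partial backup) has to be sent,
# even if it never changed on the client.
use strict;
use warnings;

# Both hashes map path => mtime; $reference is what BackupPC already has.
sub files_to_transfer {
    my ($client, $reference) = @_;
    return grep {
        !exists $reference->{$_}                  # never stored => send it
        || $reference->{$_} != $client->{$_}      # changed      => send it
    } sort keys %$client;
}

my %client    = ( 'a.txt' => 100, 'b.txt' => 200, 'big.iso' => 300 );
my %reference = ( 'a.txt' => 100 );   # partial backup stopped after a.txt
print "would transfer: $_\n" for files_to_transfer( \%client, \%reference );
# => b.txt and big.iso are sent again although they did not change.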
> 
>> [...]
>> My basic problem is notebooks which are often online for only a short time.
>> If such a notebook has one new big file (around 500MB or above), it often
>> uses the short online time to transfer this big file.
> 
> Note 1: if you have a large amount of changes (compared to your share size),
>         you should really be doing a full backup.
> Note 2: if your big file is where the backup is interrupted (eg. after 499MB
>         have been sent), the file will be deleted in any case. If your
>         incremental is like "one big file and a few bytes here and there",
>         it's rather likely to either complete or be interrupted within the
>         big file.
> 
>> [...]
>> My "new" strategy:
>> If an incremental aborts, I will copy the last backup (called #X) into a
>> new backup (called #Y) and merge the partial incremental into it.
>> In the __TOPDIR__/pc/<host>/backups file I duplicate the last line and
>> increment the backup number (to #Y). I will also try to change the
>> text "full" or "incr" to "partialIncr", but I suspect that BackupPC
>> will be surprised by this string.
>> The advantage of the above: the next backup looks for new files
>> against the timestamp from backup #X and against the files from backup
>> #Y.
> 
> Wrong, see above.
> 
>> The disadvantage: it is possible that backup #Y contains files which were
>> deleted after backup #X on the client side.
> 
> That's not a disadvantage, because backup #Y is an inconsistent backup by
> definition. Its only purpose would be speeding up the next backup, not
> being looked at or being used for restores. Bogus files really don't make
> any difference.
> 
> But this is a disadvantage: you are doing a lot of complicated work which
> is bound to mess up things if you don't get it right. Your time would be
> better spent implementing partial incrementals within BackupPC. Again, I'm
> not positive about why BackupPC doesn't do them yet. There are probably
> good reasons. If you feel you need them, there's no reason not to
> implement them (maybe you'll find out that way why BackupPC doesn't do
> them). I doubt that more can go wrong than with what you're suggesting.
> 
> Or, of course, change your backup strategy. It might be as simple as
> manually requesting a full backup whenever you see an incremental not
> completing, though I'm not sure the next scheduled incremental wouldn't
> simply delete the partial full and run as an incremental anyway. Maybe it
> would make sense to have a $Conf{RunFullIfLastIncrementalFailed} which
> would make BackupPC run a full backup next, if the previous incremental
> failed (due to a reason where this makes sense; obviously "no ping"
> doesn't qualify, but that's not a failure anyway, is it?). That way, cases
> like you are describing would re-transfer the big file once, but after
> that they would re-use the partial backup. That doesn't even seem hard to
> implement.

That seems to be a good solution. At least better than my idea.
But it is not possible for me to implement something like this myself,
because I am not a Perl programmer :-(
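If I understand the idea correctly, though, the decision itself would boil
down to something like the sketch below (only a sketch of the idea, not
real BackupPC code; the subroutine, its arguments and the status values
are all made up, and only the config name comes from your suggestion).
Wiring it into BackupPC's scheduling is the part I cannot do.

#!/usr/bin/perl
# Sketch of the proposed $Conf{RunFullIfLastIncrementalFailed} -- NOT actual
# BackupPC code.  It only shows the scheduling decision itself.
use strict;
use warnings;

my %Conf = (
    RunFullIfLastIncrementalFailed => 1,
);

# Decide what type the next backup should be.
#   $scheduled  - what the schedule would normally run ("full" or "incr")
#   $lastType   - type of the previous backup attempt ("full" or "incr")
#   $lastStatus - how it ended ("ok" or "failed"); "no ping" would not count
sub next_backup_type {
    my ($scheduled, $lastType, $lastStatus) = @_;
    if (   $Conf{RunFullIfLastIncrementalFailed}
        && $lastType   eq 'incr'
        && $lastStatus eq 'failed' )
    {
        return 'full';   # force a full so the partial-backup code can resume it
    }
    return $scheduled;
}

# Example: an interrupted incremental turns the next run into a full.
print next_backup_type('incr', 'incr', 'failed'), "\n";   # prints "full"
print next_backup_type('incr', 'incr', 'ok'),     "\n";   # prints "incr"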
> 
>> If the next backup (called #Z):
>> - is an incremental backup and aborts too, I will only merge the
>> transferred files into backup #Y. If it ends successfully, I have to merge
>> backup #Y into backup #Z and delete backup #Y.
>> - is a full backup, I would end up with an inconsistent incremental #Y.
>> Therefore I will delete backup #Y after the successful full backup #Z.
>> 
>> My goal, to save as many transferred files as possible, should be reached.
>> The disadvantage of having "deleted" files within a backup would be
>> bearable for me.
>> 
>> What do you think about that?
> 
> You are making a lot of assumptions and planning a lot of complicated
> things ("merging backups"), for the purpose of working around local
> problems. The goal of *generally* making BackupPC save as many redundant
> network transfers as possible would be a good one (supposing the cost is
> not too high), but it is not achieved through hacks in post user commands
> involving inconsistent backups in your backup history.
> 
Yes, I agree with you.
A better way would be to find a Perl programmer who can implement
a "partial incremental backup" as it is already implemented for full backups.

br
Matthias
-- 
Don't Panic

