Subject: Re: [BackupPC-users] How to reuse already saved data until an interrupted incr backup
From: Holger Parplies <wbppc AT parplies DOT de>
To: Matthias Meyer <matthias.meyer AT gmx DOT li>
Date: Mon, 6 Jul 2009 21:30:08 +0200
Hi,

Matthias Meyer wrote on 2009-07-06 18:35:09 +0200 [Re: [BackupPC-users] How to 
reuse already saved data until an interrupted incr backup]:
> Holger Parplies wrote:
> > Matthias Meyer wrote on 2009-07-05 21:14:57 +0200 [Re: [BackupPC-users]
> > How to reuse already saved data until an interrupted incr backup]:
> > > [...]
> > > In the next step I will try to move the /new directory to a /<number>
> > > directory, update the /backups file and run BackupPC_link during the
> > > DumpPostUserCmd.
> > > 
> > > Is there any reason to not do that?
> > 
> > [...]
> > 1. You will have something that looks like a successful backup but isn't.
> >    [...] there are basically two possibilities:
> >    a) The backup view contains all files processed (i.e. found to be
> >       "same" or transferred). All files beyond the point where it failed
> >       are missing.
> >    b) All files beyond the point where it failed are identical to their
> >       state in the reference backup.
> >    I presume it's case a [...]
> 
> I presume it is case b) because BackupPC will fill an incremental backup
> with the files from previous backups.

Did you look at the code? Did you test it? Do you know how "filling" works?
Your remark sounds as if "filling" were magic and it were somehow obvious what
it does in the case of a corrupt backup (i.e. one not resembling what BackupPC
itself would generate). It is not. Each backup is supposed to contain a full
directory tree. If it doesn't, who says BackupPC should go looking for missing
directories in previous backups? It's not as if BackupPC were *expecting* any
directories to be missing.

The code for generating a backup view is rather complex, but a quick look
at it suggests that missing directories will not be filled in from earlier
backups while missing files will. But I could be wrong.
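
To make that concrete, the distinction is roughly the following. This is a
from-memory sketch, *not* the actual BackupPC::View code; read_attrib() and
merge_with_previous() are invented names:

    # Rough per-directory sketch; not actual BackupPC code.
    sub dir_view {
        my ($backup, $dir) = @_;
        # This backup's on-disk listing of $dir (its attrib file).
        my $entries = read_attrib($backup, $dir);
        if (!defined $entries) {
            # The whole directory is absent from this backup. The view
            # may well report it as nonexistent instead of falling
            # back to an earlier backup in the chain.
            return undef;
        }
        # Individual files missing from $entries, on the other hand,
        # are looked up in the reference backup, so they get filled in.
        return merge_with_previous($backup, $dir, $entries);
    }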

> > 2. What is run next? That depends on your setup. If it's an incremental of
> >    the same (or lower) level, all changes are re-transferred, so you gain
> >    nothing. You simply end up with a bogus backup in your history.
> >    On the other hand, if it's a full or an incremental of *higher* level
> >    (you have a level N backup in your 'backups' file, so BackupPC would do
> >    a level N+1 if that is what you configured), the backup will need to
> >    re-transfer *all files you missed on the partial* (at least in case a
> >    above), meaning you probably lose (much) more than you gain, aside from
> >    also having a bogus backup in your history.
> > 
> I have a level N backup configured.

I don't understand what you mean by that, but you've said elsewhere that you
use IncrLevels for doing incrementals of increasing level. That would mean
that your next backup would be of level N+1 (for a "partial" level N backup).
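
For reference, such a setup looks something like this in config.pl (the exact
list of levels is an example, not necessarily your configuration):

    # Example config.pl entry; the levels shown are illustrative.
    $Conf{IncrLevels} = [1, 2, 3, 4, 5, 6];
    # Each incremental is taken at the next level in the list, relative
    # to the most recent backup of a lower level. A level N backup in
    # 'backups' - even a bogus one - therefore makes the next
    # incremental a level N+1 backup.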

> So the next incremental backup will
> back up all changed files since the last (the incomplete incremental) backup.
> I understand. With my strategy I will not back up those files which were
> changed before but not included in the last, incomplete incremental backup.

That's not what I said, and it's not what will happen. We're talking about
rsync(d) here, otherwise none of what you are trying to do makes sense (you
can't reuse previously transferred files with tar or smb, because you can't
tell tar/smb to back up relative to a timestamp, "except for the files I've
already got, unless they've changed again"). rsync(d) doesn't use a timestamp,
only a file list. That file list either contains a file, in which case it will
be transferred only if it has changed, or it doesn't, in which case it will be
transferred in full. The point really is that your next transfer may be *huge*,
because it includes all *unchanged* files that the partial missed looking at.
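
Spelled out, the per-file decision is roughly this (a conceptual sketch, not
actual rsync or BackupPC source; all names are invented):

    # Conceptual sketch of the per-file decision against the reference
    # backup's file list; all names are invented.
    foreach my $file (@files_on_client) {
        if (exists $reference_list{$file}) {
            # In the reference list: transferred only if it changed.
            send_delta_if_changed($file);
        } else {
            # Not in the reference list: transferred in full, even if
            # unchanged - this is why the next transfer may be huge.
            send_whole_file($file);
        }
    }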

> [...]
> My basic problem is notebooks which are often online only for a short time.
> If such a notebook has one new big file (around 500MB or above), it often
> uses the short online time to transfer this big file.

Note 1: if you have a large amount of changes (compared to your share size),
        you should really be doing a full backup.
Note 2: if your big file is where the backup is interrupted (e.g. after 499MB
        have been sent), the file will be deleted in any case. If your
        incremental is like "one big file and a few bytes here and there",
        it's rather likely to either complete or be interrupted within the
        big file.

> [...]
> My "new" strategy:
> If an incremental aborts, I will copy the last backup (called #X) into a new
> backup (called #Y) and merge the partial incremental into it.
> In the __TOPDIR__/pc/<host>/backups file I duplicate the last line and
> increment the backup number (to #Y). I will also try to change the
> text "full" or "incr" to "partialIncr", but I would believe that BackupPC
> will be surprised by this string.
> The advantage of the above: the next backup looks for new files against
> the timestamp from backup #X and against the files from backup #Y.

Wrong, see above.

> The disadvantage: It is possible that backup #Y contains files which are
> deleted after backup #X on client side.

That's not a disadvantage, because backup #Y is an inconsistent backup by
definition. Its only purpose would be speeding up the next backup, not being
looked at or being used for restores. Bogus files really don't make any
difference.

But this is a disadvantage: you are doing a lot of complicated work which is
bound to mess things up if you don't get it right. Your time would be better
spent implementing partial incrementals within BackupPC. Again, I'm not
positive about why BackupPC doesn't do them yet. There are probably good
reasons. If you feel you need them, there's no reason not to implement them
(maybe you'll find out that way why BackupPC doesn't do them). I doubt that
more can go wrong than with what you're suggesting.

Or, of course, change your backup strategy. It might be as simple as manually
requesting a full backup whenever you see an incremental not completing,
though I'm not sure the next scheduled incremental wouldn't simply delete the
partial full and run as an incremental anyway. Maybe it would make sense to
have a $Conf{RunFullIfLastIncrementalFailed} which would make BackupPC run a
full backup next, if the previous incremental failed (due to a reason where
this makes sense; obviously "no ping" doesn't qualify, but that's not a
failure anyway, is it?). That way, a case like the one you describe would
re-transfer the big file once, but after that it would re-use the partial
backup. That doesn't even seem hard to implement.
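
In pseudo-Perl, that would amount to something like this (hypothetical, of
course - neither the setting nor the surrounding code exists in this form):

    # Hypothetical config.pl entry - does not exist in BackupPC today:
    $Conf{RunFullIfLastIncrementalFailed} = 1;

    # Roughly, in the scheduling code (pseudo-Perl, invented names):
    if ($Conf{RunFullIfLastIncrementalFailed}
            && $lastBackup->{type} eq 'incr'
            && $lastBackupFailed)    # a real failure, not "no ping"
    {
        $nextBackupType = 'full';    # re-transfer the big file once,
                                     # re-use the partial afterwards
    }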

> If the next backup (called #Z):
> - is an incremental backup and aborts too, I will only merge the transferred
> files into backup #Y. If it ends successfully, I have to merge backup #Y
> into backup #Z and delete backup #Y.
> - is a full backup, I would end up with an inconsistent incremental #Y.
> Therefore I will delete backup #Y after the successful full backup #Z.
> 
> My goal, to save as many transferred files as possible, should be reached.
> The disadvantage of having "deleted" files within a backup would be bearable
> for me.
> 
> What do you think about that?

You are making a lot of assumptions and planning a lot of complicated things
("merging backups"), for the purpose of working around local problems. The
goal of *generally* making BackupPC avoid as many redundant network transfers
as possible would be a good one (supposing the cost is not too high), but it
is not achieved through hacks in post user commands involving inconsistent
backups in your backup history.

Regards,
Holger

------------------------------------------------------------------------------
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/