Subject: Re: [BackupPC-users] How to delete backups? + Benefit of incremental backups?
From: Holger Parplies <wbppc AT parplies DOT de>
To: Thomas Birnthaler <tb AT ostc DOT de>, Les Mikesell <les AT futuresource DOT com>
Date: Fri, 9 Jan 2009 03:28:33 +0100
Hi,

Thomas Birnthaler wrote on 2009-01-09 00:00:53 +0100 [[BackupPC-users] How to 
delete backups? + Benefit of incremental backups?]:
> [...]
> 1. We wonder why neither the GUI nor the commands offer a possibility to 
> delete backups (full and incremental). If e.g. the backup partition is filled 
> up, this would be very helpful.

it's usually not necessary (because BackupPC deletes backups when they expire)
or desired (until they expire, they should not be deleted). Furthermore, it's
a "dangerous" operation. I would argue against making that available with one
or two clicks on a web page.

> [...]
> 2. What is real benefit of incremental backups compared to full backups?

They are, in general (and if used correctly) significantly faster. They put
less strain on server and client disks. This has been answered many times.

Tino Schwarze wrote on 2009-01-09 01:17:21 +0100 [Re: [BackupPC-users] How to 
delete backups? + Benefit of incremental backups?]:
> > I argue they are just missing in the respective backup hardlink tree.
> > So if you delete the backup hardlink tree, the "holes" disappear?
> 
> There are a lot more cases to consider. I wouldn't risk getting an
> inconsistent view of the backups... see below.

The most important one, as a simple example:

First full backup contains only file 'a'.
Second full backup contains files 'a' and 'b'.
Incremental after second full contains only file 'c' (meaning the target
directory actually contained files 'a', 'b' and 'c').

Now remove the second full backup. Your incremental will now be interpreted as
containing files 'a' and 'c'. File 'b' has vanished. Note that this is a state
in which your original file system has been *at no point in time*. You not
only lose data, you get an inconsistent state.
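
The filled-in view can be sketched as a merge over the dependency chain,
oldest backup first (this is an illustrative model, not BackupPC's actual
code; the file names match the example above):

```python
# Hypothetical sketch: an incremental's browsable view is reconstructed
# by layering it over the backups it depends on. Deleting the reference
# full silently changes that view.

def merged_view(chain):
    """Merge a dependency chain of backups, oldest first.
    Each backup is a dict mapping filename -> content."""
    view = {}
    for backup in chain:
        view.update(backup)
    return view

full_1 = {"a": "a-v1"}                  # first full: only 'a'
full_2 = {"a": "a-v1", "b": "b-v1"}     # second full: 'a' and 'b'
incr = {"c": "c-v1"}                    # only 'c' changed since full_2

# Correct view: the incremental merged over its reference full.
assert merged_view([full_2, incr]) == {"a": "a-v1", "b": "b-v1", "c": "c-v1"}

# After deleting full_2, the incremental falls back to full_1:
# 'b' vanishes -- a state the filesystem was never actually in.
assert merged_view([full_1, incr]) == {"a": "a-v1", "c": "c-v1"}
```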

Future rsync incrementals *should* fix that (or fail - not sure), because they
would be run against the contents of the reference backup. If BackupPC finds
the first full backup as a reference (again: not sure if that happens), then
all changes would be transferred. My guess is that if that does not happen,
the incremental would be done against an empty tree, thus transferring
*everything* (which is more than a full would do).
tar/smb incrementals are only based on a timestamp. Again, I'm not sure what
would be considered as the reference backup (since the real reference has
disappeared). If, somehow, the *time stamp* of the deleted backup were used,
the problem would *not* be fixed. File 'b' wasn't modified since the time
stamp - no reason to back it up.
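
The timestamp problem can be demonstrated in a few lines (a minimal
sketch of timestamp-based file selection, not of the actual tar/smb
transfer methods; dates and file names follow the example above):

```python
# Sketch: a timestamp-based incremental picks only files modified after
# a cutoff. A file that existed before the cutoff ('b') is never
# re-selected, so this style of incremental cannot restore it.
import os, tempfile, time

d = tempfile.mkdtemp()
for name, when in [("b", "2009-01-01"), ("c", "2009-01-07")]:
    path = os.path.join(d, name)
    with open(path, "w") as f:
        f.write(name)
    t = time.mktime(time.strptime(when, "%Y-%m-%d"))
    os.utime(path, (t, t))          # backdate the file's mtime

# Cutoff: the timestamp of the (deleted) reference full backup.
cutoff = time.mktime(time.strptime("2009-01-05", "%Y-%m-%d"))

# Select candidates the way a timestamp incremental would:
picked = [n for n in sorted(os.listdir(d))
          if os.stat(os.path.join(d, n)).st_mtime > cutoff]
# Only 'c' qualifies; 'b' is skipped because it is older than the cutoff.
```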

In short: don't do it. What is the point of removing comparatively "young"
full backups (that have incrementals depending upon them) anyway? Wouldn't you
remove the *oldest* backups? Or *correctly* use the exponential strategy
BackupPC provides?

Someone in this thread wrote that BackupPC takes care of deleting dependent
incrementals with the expiring full. That is *not* correct. BackupPC actually
delays deleting the full backup until it is no longer needed, because there
are no dependent backups left. This is the sensible thing to do. Remove the
dependent incrementals rather than the full backup. Following your own
argumentation, removing an incremental will free just as much space as
removing a full would [well, that's not true, but you get the idea].

> > But this is just management space (directories and their dentries) ---
> > which uses of course some inode and data space. No additional data space
> > and inode space is needed for the real files in full backups compared to
> > a incremental backup. Is that correct?
> 
> Yes.

Almost. All *directories* are created for an incremental anyway (not only
directories containing changes). Without the entries for unchanged files, the
directories will be shorter, but the number of inodes is the same.
If the maximum hard link count is exceeded for a file, you will get a new
copy of it in the pool. While 32000 possible links is *a lot*, you still may
get overflows, and you'll get them sooner with only full backups. Just a
detail.

> What are you trying to achieve, after all? Please tell us about the
> problem you're trying to solve - there might be easier approaches.

This question is *important*. All of the points made here have been made *a
lot* of times. You are trying to do something that doesn't seem to make sense,
but, presumably, the problem you are trying to solve *does* exist. I would
rather help you solve a problem than criticize your questions.

> > > > We have also detected that in some cases incremental backups need much
> > > > more time than full backups (factor 3-5). This sounds odd to us.
> > >
> > > What transfer method are you using?
> >
> > rsync over GBit networks between Linux machines and also between MacOS 
> > machines. In both cases that effect happens.
> 
> That's strange.

No, it's not. Not necessarily, anyway.

Les Mikesell wrote on 2009-01-08 17:52:10 -0600 [Re: [BackupPC-users] How to 
delete backups? + Benefit of incremental backups?]:
> [...]
> This sounds like you are using rsync and doing infrequent fulls. 
> Normally rsync incrementals transfer everything that has changed since 
> the last full which is the comparison base.  Files added after the full 
> are copied over again in every subsequent incremental.

If you have a lot of changing data and little static data, incrementals will
re-transfer the data, only sparing what is unchanged since the last full.

> If you have 3.x you can change this with $Conf{IncrLevels}.

... by creating intermediate reference points. Doing full backups more often
is much simpler, though, if you can take the performance impact.
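
In config.pl that looks like the following (the setting is documented for
BackupPC 3.x; the particular level values here are just an example):

```perl
# Give incrementals their own reference levels. A level-2 incremental is
# done against the most recent level-1 backup instead of the last full,
# so each incremental spans a shorter interval of changes.
$Conf{IncrLevels} = [1, 2, 3];
```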

> Hmmm, it still doesn't make sense that a full would be faster than the 
> prior incremental, though.

Yes, it does. The rsync(d) full is done relative to the preceding backup,
the *incremental* in this case, so you may be transferring significantly less
data.


So, yes, it may be odd, but it may also be expected, depending on your amount
of data and changes to that data, server and client disk and CPU speed. There
are several potential bottlenecks. Even network speed may be one, if you've
got ssh on a slow processor in there somewhere ...

Regards,
Holger

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/