Re: [BackupPC-users] Concrete proposal for feature extension for pooling

2010-03-04 02:52:23
From: Craig Barratt <cbarratt AT users.sourceforge DOT net>
To: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
Date: Wed, 3 Mar 2010 23:38:46 -0800

Jeffrey writes:

> Sounds like a lot of work though and kudos to you for committing to
> such an extensive upgrade -- are you going to have the time and
> resources to get this done in 2010 (or 2011 or 2012 ;)?

Not sure - hopefully this year.

> One more question if you don't mind...
> Can you give an outline of how expiry works in the absence of
> hard-links? Specifically, without walking through the whole pc tree
> each night and tabulating all the md5sum references, how do you know
> when a pool file no longer has any remaining pc-tree references and
> therefore can be deleted from the pool? And if you do walk through the
> pool, can you do anything better than an O(n log n) type sort of the
> resulting md5sum lists that you would presumably construct?

This isn't fully thought out, but here goes:

 - any changes to a pool file reference (eg: (digest, +1) or
   (digest, -1)) will be sent to a new daemon that writes them
   in batches to a journal file.

 - the BackupPC_nightly process will read the journal files
   and update the reference counts on a daily basis.  This
   should be relatively fast since it involves processing
   a small number of large files.

 - To be safe, any abnormal/unclean shutdown requires a full
   regeneration of the reference counts by reading every attrib
   file in the backup tree.  The effort required here is reading
   one file per directory in every backup.  It could be accelerated
   by caching the reference counts for each backup.
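
The journal idea above can be sketched as follows.  This is only an
illustration under assumptions, not BackupPC code: the one-entry-per-line
"digest delta" format, the function names append_journal and
apply_journal, and the short fake digests are all hypothetical.

```python
from collections import Counter
import io

def append_journal(journal, changes):
    """Write a batch of (digest, delta) reference changes, one per line.

    The "digest +1" / "digest -1" text format is an assumption made
    for this sketch.
    """
    for digest, delta in changes:
        journal.write(f"{digest} {delta:+d}\n")

def apply_journal(journal, ref_counts):
    """Fold every journal entry into the running reference counts."""
    for line in journal:
        digest, delta = line.split()
        ref_counts[digest] += int(delta)
    return ref_counts

# Two batches written during backups, then one nightly pass.
journal = io.StringIO()  # stands in for a journal file on disk
append_journal(journal, [("d41d8c", +1), ("9e107d", +1), ("d41d8c", +1)])
append_journal(journal, [("9e107d", -1)])
journal.seek(0)
counts = apply_journal(journal, Counter())
# "d41d8c" ends with 2 references, "9e107d" with 0.
```

The nightly pass is a single sequential read of the journal, so its
cost grows with the number of reference changes since the last run
rather than with the size of the pc tree.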

Avoiding race conditions between removing unused pool files and adding
new references to pool files can be accomplished in several ways:

 - don't run BackupPC_nightly until there are no other backups
   running (like the old 2.x days).  Then files can be purged by
   BackupPC_nightly when their final reference count goes to 0.

 - however, I would prefer BackupPC_nightly be run concurrently with
   backups.  I haven't fully thought this through, and I probably
   can't explain it clearly, but I'm thinking of a two-phase approach.
   BackupPC_nightly updates the reference counts, and flags files with
   reference count 0 (eg: with a permissions bit) but doesn't delete
   them.  Think of it as a "pending delete" flag.  A pool file with
   the flag set is not matched so no new references will be added, but
   there might have been some added before BackupPC_nightly set the
   flag.  The next night, BackupPC_nightly updates the reference
   counts and removes files whose flag is set and whose reference
   count is still 0.  If the reference count is > 0 the flag is
   cleared.  Files that are newly at 0 have the flag set and the
   process repeats.
   Basically this means it takes another day to delete unused pool
   files.  This works if all backups complete within 24 hours, so
   some extra housekeeping is needed if backups exceed 24 hours.
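
A minimal sketch of the two-phase "pending delete" idea, under stated
assumptions: the flag is modelled as a set of digests rather than a
permissions bit, the reference counts are assumed to be already updated
from the journal, and the names nightly_sweep and can_match are
hypothetical.

```python
def can_match(digest, pending_delete):
    """A flagged pool file is not offered as a match for new data."""
    return digest not in pending_delete

def nightly_sweep(ref_counts, pending_delete):
    """Run one nightly pass over already-updated reference counts.

    Returns (deleted, new_pending).  A digest is deleted only if it was
    flagged by the previous pass and its count is still 0.
    """
    deleted = set()
    new_pending = set()
    for digest, count in ref_counts.items():
        if count > 0:
            continue  # still referenced: any old flag is simply dropped
        if digest in pending_delete:
            deleted.add(digest)      # second night at 0: safe to remove
        else:
            new_pending.add(digest)  # first night at 0: flag only
    for digest in deleted:
        del ref_counts[digest]
    return deleted, new_pending

counts = {"a": 0, "b": 2, "c": 0}

# Night 1: "a" and "c" are flagged, nothing is deleted yet.
deleted1, pending1 = nightly_sweep(counts, set())

# A backup matched "c" just before the flag was set; its +1 only
# reaches the counts on the following night.
counts["c"] += 1

# Night 2: "a" is deleted, "c" has its flag cleared.
deleted2, pending2 = nightly_sweep(counts, pending1)
```

The one-day delay is what protects "c" here: its late reference is
counted before the second pass decides whether to delete it.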

Note that the pool never has to be traversed, even in the fsck case.
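
For the fsck case, a rough sketch of rebuilding the counts from the
backup side only.  It assumes each backup can supply a digest-to-count
table (cached, or built by reading its attrib files); that table and
the name rebuild_ref_counts are hypothetical and not an existing
BackupPC interface.

```python
from collections import Counter

def rebuild_ref_counts(per_backup_counts):
    """Sum per-backup digest counts into one table of reference counts."""
    total = Counter()
    for backup_counts in per_backup_counts:
        total.update(backup_counts)  # adds counts, does not replace them
    return total

# Hypothetical cached tables for two backups.
backups = [
    {"d41d8c": 2, "9e107d": 1},
    {"d41d8c": 1},
]
rebuilt = rebuild_ref_counts(backups)
```

Only the per-backup tables are read; the pool directory itself is
never listed.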

Craig
