Re: [BackupPC-users] Keeping servers in sync

Holger Parplies wrote at about 19:00:40 +0200 on Tuesday, September 1, 2009:
 > Hi,
 > 
 > Jim Wilcoxson wrote on 2009-08-31 08:08:48 -0400 [Re: [BackupPC-users] 
 > Keeping servers in sync]:
 > > [...]
 > > I did some reading today about BackupPC's storage layout and design.
 > > I haven't finished yet, but one thing stuck out:
 > > 
 > > "BackupPC_link reads the NewFileList written by BackupPC_dump and
 > > inspects each new file in the backup."
 > > 
 > > To speed up incrementals, HashBackup could make use of the NewFileList.

 > 1.) Robustness - do you want to trust the contents of files on a file system
 >     and risk missing pool files in your copy, because they are for whatever
 >     reason not listed?
 > 
 > 2.) Completeness - you need to account for pool chain renumbering (and
 >     deletion of pool files). Unless BackupPC_nightly also provides information
 >     on what it changed, you need to traverse the pool anyway.
 > 
 > 3.) Which NewFileList.* files would you want to look at? Presumably those
 >     for all backups, for which you need to copy the pc/host/num/ tree.
 > 
 > 4.) How do you handle trees of backups that are in progress?
 > 

I too would be skeptical of any incremental backup that depended on
tracking and maintaining a list of changes to the pool/pc trees -
since without a full consistency check, errors would continue to
propagate.

It seems like a lot of issues with file-level BackupPC backups (both
full and incremental) could be solved if we had the following:

1. No chain renumbering - either by using the full-file md5sum (or
   another, better hash) as the name of the pool file to
   (statistically) eliminate collisions, or by keeping the existing
   scheme with collisions but not renumbering, allowing holes to
   occur in the chain. Note that there are pros/cons to each
   approach.

2. Adding the name of the pool file to the header of the pool
   file. (note that if you ended up using a full file hash in #1, then
   this would have the added benefit of adding a checksum to each pool
   file)

#1 would simplify incrementals, since we would no longer need to worry
about old links having to be changed (i.e. renumbered).

#2 would simplify creation of pc tree links since the target of the
link would be recoverable from the file header.
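
To make the two proposals concrete, here is a minimal Python sketch.
Everything in it is an assumption for illustration, not existing
BackupPC behavior: the "POOLNAME " header tag, the function names, and
the choice of fanning pool files out into directories named after the
first three hex digits of the hash. It ignores compression entirely.

```python
import hashlib
import os

MAGIC = b"POOLNAME "  # hypothetical header tag, not a real BackupPC format


def pool_name(data: bytes) -> str:
    """Proposal #1: name the pool file after the full-file MD5."""
    return hashlib.md5(data).hexdigest()


def store(pool_root: str, data: bytes) -> str:
    """Write data into the pool, prefixed by a one-line header (proposal #2)."""
    name = pool_name(data)
    directory = os.path.join(pool_root, name[0], name[1], name[2])
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, name)
    if not os.path.exists(path):  # identical content is stored only once
        with open(path, "wb") as f:
            f.write(MAGIC + name.encode("ascii") + b"\n")
            f.write(data)
    return path


def read_pool_name(path: str) -> str:
    """Recover the pool file name from the header of any hard link to it."""
    with open(path, "rb") as f:
        line = f.readline()
    if not line.startswith(MAGIC):
        raise ValueError(f"no pool-name header in {path}")
    return line[len(MAGIC):].strip().decode("ascii")


def verify(path: str) -> bool:
    """With a full-file hash as the name, the header doubles as a checksum."""
    with open(path, "rb") as f:
        line = f.readline()
        body = f.read()
    if not line.startswith(MAGIC):
        return False
    return line[len(MAGIC):].strip().decode("ascii") == pool_name(body)
```

Since the name travels inside the file, read_pool_name() gives the same
answer whether it is pointed at the pool file itself or at a link to it
in the pc tree.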

Full backups would then be done as follows:
1. rsync the pool (without the -H)
2. recurse through the pc tree doing the following:
   - copy over files with nlinks = 1 (mostly zero-length files and
     top-level log/info files)
   - for files with nlinks > 1, add an entry to a link table, reading
     the file header to determine the pool target
3. On the target machine run through the link table and create the
   links

This should allow the operation to complete in O(n) time, where n is
the number of files in the pc tree, without any need for storing link
tables in memory or sorting large tables.
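
Steps 2 and 3 could be sketched roughly as follows in Python. This is
only an illustration under stated assumptions: the pool has already
been copied to the destination (step 1), each pool file starts with a
hypothetical "POOLNAME <name>" header line, pool files live under
directories named after the first three characters of the name, and
both trees are visible to one process (in practice the link table
would be shipped to the target machine). None of these names or
formats come from BackupPC itself.

```python
import os
import shutil

MAGIC = b"POOLNAME "  # hypothetical header tag (assumption, see above)


def header_pool_name(path):
    """Read the pool file name stored in the (hypothetical) file header."""
    with open(path, "rb") as f:
        line = f.readline()
    if not line.startswith(MAGIC):
        raise ValueError(f"no pool-name header in {path}")
    return line[len(MAGIC):].strip().decode("ascii")


def scan_pc_tree(pc_root):
    """Step 2: yield (relative path, pool name or None) one file at a time.

    Files with a single link are marked None (to be copied); files with
    more than one link carry the pool name read from their header.
    """
    for dirpath, _dirs, files in os.walk(pc_root):
        for fname in files:
            full = os.path.join(dirpath, fname)
            rel = os.path.relpath(full, pc_root)
            if os.lstat(full).st_nlink > 1:
                yield rel, header_pool_name(full)
            else:
                yield rel, None


def replicate(pc_root, dest_pc_root, dest_pool_root):
    """Steps 2 and 3 combined: copy unlinked files, recreate the links."""
    for rel, name in scan_pc_tree(pc_root):
        dest = os.path.join(dest_pc_root, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        if os.path.lexists(dest):
            continue  # already present from an earlier run
        if name is None:
            shutil.copy2(os.path.join(pc_root, rel), dest)
        else:
            target = os.path.join(dest_pool_root,
                                  name[0], name[1], name[2], name)
            os.link(target, dest)
```

scan_pc_tree() is a generator, so each entry is handled as it is
found; nothing is accumulated in memory or sorted, which is the point
of the scheme.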

Incremental backups would be done similarly with the following
changes:
- In step 2 you only need to run through the parts of the pc tree
  corresponding to backups created since the last backup
- Additionally, you would delete any older pc backup trees that no
  longer occur in the source

(Of course, this assumes that you haven't mucked with earlier backups
by manually adding or deleting files.)
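
A rough Python sketch of the incremental bookkeeping, under the
assumptions that each host directory in the pc tree holds one
numerically named subdirectory per backup and that "created since the
last backup" can be approximated as "present in the source but not
yet in the copy". The function names are made up for illustration,
and the sketch does nothing about backups that are still in progress.

```python
import os
import shutil


def backup_numbers(host_dir):
    """Return the set of backup numbers (numeric subdirectories) for a host."""
    if not os.path.isdir(host_dir):
        return set()
    return {
        int(entry)
        for entry in os.listdir(host_dir)
        if entry.isdigit() and os.path.isdir(os.path.join(host_dir, entry))
    }


def plan_incremental(src_host_dir, dst_host_dir):
    """Return (backups to copy, backups to delete) for one host."""
    src = backup_numbers(src_host_dir)
    dst = backup_numbers(dst_host_dir)
    return sorted(src - dst), sorted(dst - src)


def prune(dst_host_dir, stale):
    """Delete backup trees that no longer occur in the source."""
    for num in stale:
        shutil.rmtree(os.path.join(dst_host_dir, str(num)))
```

Only the trees in the first list would need the step 2 walk; the
second list is what gets removed from the copy.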

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/