[BackupPC-users] Idea for adding md5sums and reverse-pool lookup to BackupPC

2010-12-19 13:26:48
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: General list for user discussion <backuppc-users AT lists.sourceforge DOT net>
Date: Sun, 19 Dec 2010 13:24:23 -0500
Here is an idea for adding md5sum consistency checks along with a
rough ability to reverse-lookup the pc tree entries that are hard
linked to any given pool entry. (It is probably not worth coding if
version 4.0 is near, since all of this functionality will reportedly be
built into the new version.)

1. Create new trees called, say, 'md5sum' and 'cmd5sum' parallel to the
   pool and cpool directory trees.
2. Whenever a new file is *added* to the pool/cpool, calculate the full
   file md5sum and create a new file with the same partial md5sum name
   (and chain numbering) in the corresponding md5sum/cmd5sum tree, with
   the first line containing the full file md5sum.
3. Whenever a (new) pc file is linked/copied to the pool, append the pc
   file path (starting from TopDir) to the corresponding file in the
   md5sum/cmd5sum tree.
4. Whenever a pool file is deleted or a chain is renumbered (which I
   believe only happens during BackupPC_nightly, and of course also in my
   BackupPC_fixLinks script), do the corresponding deletion or
   renumbering on the parallel md5sum/cmd5sum entry.
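
As a rough illustration of steps 1-4, here is a minimal Python sketch
(BackupPC itself is Perl, so this is not a patch). All function names
are hypothetical, and the sketch assumes the parallel entry simply
mirrors the pool file's relative path and that the "full file md5sum"
is taken over the bytes as stored on disk:

```python
import hashlib
import os
from pathlib import Path

# Assumed names for the parallel trees (step 1).
PARALLEL = {"pool": "md5sum", "cpool": "cmd5sum"}


def parallel_path(top_dir: Path, pool_file: Path) -> Path:
    """Map TopDir/cpool/<subdirs>/<name> to TopDir/cmd5sum/<subdirs>/<name>."""
    rel = pool_file.relative_to(top_dir)
    return top_dir / PARALLEL[rel.parts[0]] / Path(*rel.parts[1:])


def full_md5(path: Path) -> str:
    """MD5 of the file's bytes as stored on disk, read in 1 MiB blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def on_pool_add(top_dir: Path, pool_file: Path) -> Path:
    """Step 2: create the parallel entry; line 1 is the full md5sum."""
    entry = parallel_path(top_dir, pool_file)
    entry.parent.mkdir(parents=True, exist_ok=True)
    entry.write_text(full_md5(pool_file) + "\n")
    return entry


def on_pc_link(top_dir: Path, pool_file: Path, pc_file: Path) -> None:
    """Step 3: append the pc path, relative to TopDir."""
    with open(parallel_path(top_dir, pool_file), "a") as fh:
        fh.write(pc_file.relative_to(top_dir).as_posix() + "\n")


def on_pool_rename(top_dir: Path, old_pool: Path, new_pool: Path) -> None:
    """Step 4: mirror a chain renumbering on the parallel entry."""
    os.replace(parallel_path(top_dir, old_pool),
               parallel_path(top_dir, new_pool))


def on_pool_delete(top_dir: Path, pool_file: Path) -> None:
    """Step 4: drop the parallel entry when the pool file is deleted."""
    parallel_path(top_dir, pool_file).unlink(missing_ok=True)
```

The hooks would be called from wherever the pool is modified; where
exactly that is in the BackupPC code is not something this sketch
tries to answer.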

Now you can go:
A. pc tree entry -> pool entry (this is in general a many to one mapping)
   Calculate partial file md5sum and find the entry in the
   corresponding pool/cpool (if there is a chain then choose the chain
   element with the same inode).

   This is relatively fast since you only need to read approximately
   the first MB (and match the inode number if there is a chain)
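
A sketch of lookup A in Python, for illustration only. It assumes the
partial md5sum has already been computed by BackupPC's own routine and
turned into the pool path `base`, and that chain members are named
`base`, `base_0`, `base_1`, and so on; both function names are
hypothetical:

```python
import os
from pathlib import Path


def chain_candidates(base: Path):
    """Yield base, base_0, base_1, ... stopping at the first missing name."""
    if not base.exists():
        return
    yield base
    n = 0
    while True:
        candidate = base.with_name(f"{base.name}_{n}")
        if not candidate.exists():
            return
        yield candidate
        n += 1


def find_pool_entry(pc_file: Path, base: Path):
    """Return the chain member hard linked to pc_file, or None."""
    pc_stat = os.stat(pc_file)
    target = (pc_stat.st_dev, pc_stat.st_ino)
    for candidate in chain_candidates(base):
        st = os.stat(candidate)
        # Same device and inode means the two names are hard links
        # to the same file.
        if (st.st_dev, st.st_ino) == target:
            return candidate
    return None
```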

B. pool entry -> pc entries (this is in general a one to many mapping)
   Look up the corresponding entry in the md5sum/cmd5sum tree and read
   the lines that follow the first (md5sum) line.
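
For illustration, reading such an entry could look like this in
Python. The function name is hypothetical, and the sketch assumes one
pc path per line, so file names containing newlines would need some
escaping that is not shown here:

```python
from pathlib import Path


def pc_paths_for(entry: Path) -> list[str]:
    """Return the pc paths recorded after the leading md5sum line."""
    with open(entry) as fh:
        fh.readline()  # line 1 is the full-file md5sum, not a path
        return [line.rstrip("\n") for line in fh if line.strip()]
```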

C. Check pool validity (and thus indirectly pc chain validity).
   Compare md5sums of pool/cpool entries with the first line of the
   corresponding entry in the md5sum/cmd5sum tree.

   This is fast since the number of lines in each entry is O(#backups).
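
A sketch of check C in Python, again only as an illustration. The
names are hypothetical, and it assumes the stored md5sum was taken
over the pool file's bytes as they sit on disk (so for the cpool, the
compressed bytes):

```python
import hashlib
from pathlib import Path


def stored_md5(entry: Path) -> str:
    """First line of the parallel entry: the md5sum recorded at add time."""
    with open(entry) as fh:
        return fh.readline().strip()


def current_md5(pool_file: Path) -> str:
    """Recompute the md5sum of the pool file as it is on disk now."""
    digest = hashlib.md5()
    with open(pool_file, "rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def pool_file_is_intact(pool_file: Path, entry: Path) -> bool:
    """True if the pool file still matches the recorded md5sum."""
    return current_md5(pool_file) == stored_md5(entry)
```

Only the first line of the entry is read, so the cost of the check is
dominated by rereading the pool file itself.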

Note that when backups are deleted, one could theoretically go through
the md5sum/cmd5sum trees and delete the corresponding entries for each
deleted pc tree file. However, this is quite expensive since not only
would you need to traverse the entire deleted backup tree, but you
would also have to calculate the partial file md5sums to figure out
where each file lies in the md5sum/cmd5sum tree. But there is really
very little downside to not deleting the entries from the
md5sum/cmd5sum tree since, at worst, we have some entries that no
longer have corresponding pc tree entries. And even if you delete the
last backup and then a new backup with the same number gets created,
you know that the last matching entry is the valid one.
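
One possible way to tolerate such stale entries at read time, sketched
in Python. This inode check is my own assumption for illustration, not
part of the scheme above, and the function name is hypothetical. A
listed path is kept only if it still exists and is still hard linked
to the pool file; repeated paths collapse to one:

```python
import os
from pathlib import Path


def live_pc_paths(top_dir: Path, pool_file: Path, entry: Path) -> list[str]:
    """Return listed pc paths that are still hard linked to pool_file."""
    pool_stat = os.stat(pool_file)
    target = (pool_stat.st_dev, pool_stat.st_ino)
    live = {}  # dict used as an ordered set, so duplicates collapse
    with open(entry) as fh:
        fh.readline()  # skip the md5sum line
        for line in fh:
            rel = line.rstrip("\n")
            if not rel:
                continue
            try:
                st = os.stat(top_dir / rel)
            except OSError:
                continue  # backup was deleted; the entry is stale
            if (st.st_dev, st.st_ino) == target:
                live[rel] = None
    return list(live)
```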
