BackupPC-users

Re: [BackupPC-users] Copying in a file instead of backing up?

2009-01-14 09:08:50
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Wed, 14 Jan 2009 09:06:06 -0500
Les Mikesell wrote at about 01:11:06 -0600 on Wednesday, January 14, 2009:
 > Johan Ehnberg wrote:
 > >> OK. I can see now why this is true. But it seems like one could
 > >> rewrite the backuppc rsync protocol to check the pool for a file with
 > >> same checksum  before syncing. This could give some real speedup on
 > >> long files. This would be possible at least for the cpool where the
 > >> rsync checksums (and full file checksums) are stored at the end of
 > >> each file.
 > > 
 > > Now this would be quite the feature - and it fits perfectly with the idea 
 > > of smart pooling that BackupPC has. The effects are rather interesting:
 > > 
 > > - Different incremental levels won't be needed to preserve bandwidth
 > > - Full backups will indirectly use earlier incrementals as reference
 > > 
 > > Definite wishlist item.
 > 
 > But you'll have to read through millions of files and the common case of 
 > a growing logfile isn't going to find a match anyway.  The only way this 
 > could work is if the remote rsync could send a starting hash matching 
 > the one used to construct the pool filenames - and then you still have 
 > to deal with the odds of collisions.
 > 

First, I agree that this is not necessarily easy and would probably
require some significant changes to the design of how the pool files
are named and structured.

However, collisions are pretty easy to deal with.
Also, my suggestion was to do this on a selective basis -- say, for
"large" files when backing up over a slow link -- so it would not
necessarily involve "millions" of files.
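To make the collision point concrete, here is a minimal sketch of how a
pool lookup could resolve same-digest collisions by comparing bytes. The
flat directory layout and the `_0`, `_1` suffix chain only loosely mirror
BackupPC's scheme, and `find_pool_match` is a hypothetical helper, not
real BackupPC code:

```python
import os

def find_pool_match(pool_dir, digest, candidate_path):
    """Return the name of a pool file whose contents equal the candidate,
    or None if every same-digest pool file differs (a true collision).

    Pool files sharing a digest are assumed to be named <digest>,
    <digest>_0, <digest>_1, ... -- an illustrative layout only.
    """
    chain = [name for name in sorted(os.listdir(pool_dir))
             if name == digest or name.startswith(digest + "_")]
    with open(candidate_path, "rb") as f:
        new_data = f.read()
    for name in chain:
        with open(os.path.join(pool_dir, name), "rb") as f:
            if f.read() == new_data:
                return name  # identical bytes: safe to hard-link
    return None  # same digest, different contents: genuine collision
```

So a digest match is only a hint; the byte comparison is what makes
collisions harmless, at the cost of one extra read of the pool file.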

Finally, I don't know much at all about the inner workings of rsync,
but the above might be possible if rsync allowed you to calculate the
checksums before initiating the transfer. If so, it would not be hard
to have a corresponding process on the BackupPC server check that
checksum against the existing pool before deciding to proceed with the
data transfer. The big problem is that the partial-file md5sum used to
construct pool file names is not consistent with the rsync checksum
calculations -- which all goes back to my thinking that, long-term, a
relational database is a better structure for storing backup
information than the "kludge" of hard links plus attrib files. A
relational database would allow pool lookups keyed on rsync md5sums as
well as on the existing partial-file md5sums.
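To illustrate the relational-database idea, here is a toy index that maps
a full-file digest to a pool path, so the server could answer "is this
file already pooled?" before any data moves. The table and column names
are invented for this sketch -- BackupPC has no such table today:

```python
import sqlite3

def make_pool_index():
    """Create an in-memory index of digest -> pool path.
    Schema is illustrative only."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pool (digest TEXT PRIMARY KEY, path TEXT)")
    return db

def pooled_path(db, digest):
    """Return the pool path for a digest, or None if the client
    actually needs to send the file."""
    row = db.execute(
        "SELECT path FROM pool WHERE digest = ?", (digest,)).fetchone()
    return row[0] if row else None
```

With something like this, one index could carry several digest kinds
(rsync full-file checksums alongside the partial-file md5sums), which is
exactly what the hard-link-named pool cannot do.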
