BackupPC-users

Re: [BackupPC-users] Copying in a file instead of backing up?

2009-01-14 13:15:46
From: Rich Rauenzahn <rich AT shroop DOT net>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Wed, 14 Jan 2009 10:13:27 -0800


Les Mikesell wrote:
Johan Ehnberg wrote:
  
OK. I can see now why this is true. But it seems like one could
rewrite the BackupPC rsync protocol to check the pool for a file with
the same checksum before syncing. This could give some real speedup on
large files. This would be possible at least for the cpool, where the
rsync checksums (and full-file checksums) are stored at the end of
each file.
      
Now this would be quite the feature - and it fits perfectly with the idea 
of smart pooling that BackupPC has. The effects are rather interesting:

- Different incremental levels won't be needed to preserve bandwidth
- Full backups will indirectly use earlier incrementals as reference

Definite wishlist item.
    
But you'll have to read through millions of files, and the common case of 
a growing logfile isn't going to find a match anyway. The only way this 
could work is if the remote rsync could send a starting hash matching 
the one used to construct the pool filenames - and then you still have 
to deal with the odds of collisions.

I thought about this a little a year or so ago -- enough to attempt to understand the rsync Perl modules (failed!).

I thought perhaps what would work best is a Berkeley DB/tied-hash lookup table/cache that maps rsync checksum + file size to a pool item. The local rsync client would request the checksum of each remote file before transfer, and if it was found in the cache and in the pool, the pool file could be used as the local version; the rsync protocol would then take over to verify all of the blocks.
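A minimal sketch of that lookup, in Python rather than Perl, using the standard dbm module as a stand-in for a Berkeley DB tied hash (all names here are hypothetical, not BackupPC's actual API):

```python
import dbm
import os

def remember(cache_path, checksum, size, pool_path):
    """Record that a pool file holds content with this checksum and size."""
    with dbm.open(cache_path, "c") as cache:
        cache[f"{checksum}:{size}".encode()] = pool_path.encode()

def pool_candidate(cache_path, checksum, size):
    """Return a cached pool-file path for (checksum, size), or None.

    The cache is only a hint: a hit is returned only if the pool
    file it points to still exists on disk.
    """
    key = f"{checksum}:{size}".encode()
    with dbm.open(cache_path, "c") as cache:
        path = cache.get(key)
    if path is not None and os.path.exists(path):
        return path.decode()
    return None
```

A cache miss (or a stale entry whose pool file is gone) simply means falling back to a normal transfer, so nothing about the backup depends on the cache being complete or correct.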

I really like that BackupPC doesn't store its data in a database that could get corrupted, and the Berkeley DB would just be a cache whose integrity isn't critical to the integrity of the backups. The cache isn't relied on 100%; the actual pool file the cache points to is used as the ultimate authority.
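That authority check might look something like this, with MD5 used purely as an illustrative digest (BackupPC's real pool hashing scheme differs):

```python
import hashlib
import os

def verified_match(pool_path, expected_digest, expected_size):
    """Confirm a cache hit by re-checking the pool file itself."""
    # Cheap check first: a size mismatch rules the file out immediately.
    if not os.path.isfile(pool_path) or os.path.getsize(pool_path) != expected_size:
        return False
    # Re-hash the actual pool file, so a stale or corrupt cache entry
    # can never cause the wrong content to be used as the local version.
    h = hashlib.md5()
    with open(pool_path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest() == expected_digest
```

If this returns False, the cache entry is simply discarded and the transfer proceeds as it would have without the cache.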

Rich


_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/