BackupPC-users

Re: [BackupPC-users] improving the deduplication ratio

2008-04-16 08:20:56
From: Ludovic Drolez <ldrolez AT debian DOT org>
To: Michael Barrow <michael AT michaelbarrow DOT name>
Date: Wed, 16 Apr 2008 14:20:36 +0200
On Mon, Apr 14, 2008 at 02:31:30PM -0700, Michael Barrow wrote:
> > Introducing file chunking would introduce a new abstraction layer - a
> > file would need to be split into chunks and recreated for restore. You
> 
> 
> Tino -- thanks for posting this. These issues are exactly what I had  
> in mind when I posted about adding sub-file deduplication. There's a  
> lot more work to do and definitely a bunch more housekeeping. Right  
> now, BackupPC gets off "easy" by utilizing hardlinks to do the  
> dedupe. Once we delve below the file, a brand new data structure/ 
> mechanism needs to be designed and built to efficiently link all of  
> these blocks together.

And what about a mix of the two?
- keep hard links for files smaller than the chunk size (filenames begin
with an 'f' as before)
- for files larger than the chunk size, create a regular file which
contains references to the chunks in the cpool (these filenames could
begin with an 'r', for example).

It would be backward compatible, it would allow progressive pool
conversion to the new 'chunk based' algorithm, and it could be
enabled for only some hosts.
It would also let us experiment with this new feature. From what I have
seen in commercial products, I believe that sub-file deduplication
would make incrementals 10 or 100 times smaller.
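
To make the mixed layout above a bit more concrete, here is a rough
Python sketch, not BackupPC code. Everything in it is an assumption
made for illustration: the fixed 64 KiB chunk size, the SHA-1 chunk
names, the flat uncompressed pool directory, the one-digest-per-line
format of the 'r' file, and the function names store_chunk,
backup_file and restore_file. The real pool uses its own digest and
compression scheme, and files exactly equal to the chunk size are
treated as "large" here, which the proposal does not specify.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # hypothetical value; the thread does not fix one


def store_chunk(pool_dir, data):
    """Store one chunk in the pool under its digest and return the digest."""
    digest = hashlib.sha1(data).hexdigest()
    path = os.path.join(pool_dir, digest)
    if not os.path.exists(path):  # identical chunks are stored only once
        with open(path, "wb") as f:
            f.write(data)
    return digest


def backup_file(src, pool_dir, dest_dir, name):
    """Back up src as 'f<name>' (hard link) or 'r<name>' (chunk references)."""
    if os.path.getsize(src) < CHUNK_SIZE:
        # Small file: pool the whole file and hard-link to it, as before.
        with open(src, "rb") as f:
            digest = store_chunk(pool_dir, f.read())
        dest = os.path.join(dest_dir, "f" + name)
        os.link(os.path.join(pool_dir, digest), dest)
        return dest
    # Large file: write a regular file listing one chunk digest per line.
    dest = os.path.join(dest_dir, "r" + name)
    with open(src, "rb") as f, open(dest, "w") as ref:
        while True:
            data = f.read(CHUNK_SIZE)
            if not data:
                break
            ref.write(store_chunk(pool_dir, data) + "\n")
    return dest


def restore_file(backup_path, pool_dir, out_path):
    """Recreate the original file from an 'f' or 'r' backup entry."""
    with open(out_path, "wb") as out:
        if os.path.basename(backup_path).startswith("f"):
            # Hard-linked entry: the backup file is the content itself.
            with open(backup_path, "rb") as f:
                out.write(f.read())
            return
        # Reference entry: concatenate the listed chunks in order.
        with open(backup_path) as ref:
            for line in ref:
                with open(os.path.join(pool_dir, line.strip()), "rb") as c:
                    out.write(c.read())


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    pool = os.path.join(root, "cpool")
    host = os.path.join(root, "pc")
    os.mkdir(pool)
    os.mkdir(host)
    src = os.path.join(root, "big.bin")
    with open(src, "wb") as f:
        f.write(b"x" * (3 * CHUNK_SIZE))  # three identical chunks
    entry = backup_file(src, pool, host, "big.bin")
    print(os.path.basename(entry), "uses", len(os.listdir(pool)), "pool chunk(s)")
```

Because old 'f' entries are restored exactly as before, a pool could in
principle hold both kinds of entries while hosts are switched over one
at a time. Fixed-size chunks are the simplest choice for a sketch; they
do not help when data is inserted in the middle of a file and shifts
every later chunk boundary.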

Cheers,

-- 
Ludovic Drolez.

http://www.palmopensource.com               - The PalmOS Open Source Portal
http://www.drolez.com      - Personal site - Linux, Zaurus and PalmOS stuff
