BackupPC-users

Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 19:03:48
From: Holger Parplies <wbppc AT parplies DOT de>
To: Les Mikesell <les AT futuresource DOT com>, "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
Date: Wed, 3 Jun 2009 00:57:28 +0200
Hi,

Les Mikesell wrote on 2009-06-02 17:32:24 -0500 [Re: [BackupPC-users] Backing 
up a BackupPC server]:
> Jeffrey J. Kosowsky wrote:
> > [...]
> > Once we are talking about redoing things, I would prefer to use a
> > full md5sum hash for the name of the pool file. [...]
> > With this approach then you would automatically have "a common hashed
> > filename that is ['statistically'] unique across all instances for
> > every piece of content."
> 
> Somehow the number of possible different file contents and the number of
> possible md5sums don't seem quite statistically equivalent to me.  And 
> then there's:
> 
> http://www.mscs.dal.ca/~selinger/md5collision/

First of all, if you are *not* using rsync, you *don't* get a *full* md5sum
hash for free or even cheap. You (Jeffrey) know the code well enough to
realize that BackupPC goes to great pains to avoid writing to the pool disk
unless necessary. If you need to transfer the whole file (of arbitrary size)
before you can look up the pool entry, you *have to* write a temporary copy
(probably compressed, too, giving up the benefits you gain from only
compressing once and decompressing when matching). You have to handle
collisions just the same (meaning re-reading your temporary copy and comparing
to the pool file). Yuck.

Yes, you can special-case small files that fit into memory, but yuck just the
same.
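
To make the cost concrete, here is a minimal sketch in Python. It is not
BackupPC's actual (Perl) code; the function names, the flat pool directory and
the "_N" collision suffix are assumptions made purely for illustration, and
compression is left out. The point is only that the digest is not known until
the last byte has arrived, so the temporary copy is already on disk before the
pool can be consulted, and a name match still needs a byte-for-byte re-read.

```python
import hashlib
import os
import tempfile


def receive_with_full_md5(chunks, tmp_dir):
    """Spool an incoming file to disk while computing its full MD5.

    The digest is only complete after the last chunk, so every byte has
    already been written to a temporary file before any pool lookup.
    """
    md5 = hashlib.md5()
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir)
    with os.fdopen(fd, "wb") as tmp:
        for chunk in chunks:
            md5.update(chunk)
            tmp.write(chunk)
    return md5.hexdigest(), tmp_path


def same_content(path_a, path_b, block=65536):
    """Byte-for-byte comparison; this is the re-read of the temporary copy."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            x, y = a.read(block), b.read(block)
            if x != y:
                return False
            if not x:  # both files ended at the same point
                return True


def add_to_pool(chunks, pool_dir):
    """Return (pool_path, stored_new_file) for the incoming content.

    Hypothetical layout: files are named after the full digest, and a
    digest collision gets a numeric suffix.
    """
    digest, tmp_path = receive_with_full_md5(chunks, pool_dir)
    suffix = 0
    while True:
        name = digest if suffix == 0 else f"{digest}_{suffix}"
        candidate = os.path.join(pool_dir, name)
        if not os.path.exists(candidate):
            os.rename(tmp_path, candidate)  # new content: keep the copy
            return candidate, True
        if same_content(tmp_path, candidate):
            os.remove(tmp_path)  # already pooled: the write was wasted
            return candidate, False
        suffix += 1  # same digest, different content: try the next slot
```

Note that in the "already pooled" case the whole file has been written and
read back once, only to be deleted again.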

If you use a *partial* md5sum, there's no gain from rsync, and you trivially
get collisions just like you do now.
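
A small Python illustration of why a partial digest collides trivially. The
scheme below (file length plus the first 4096 bytes) is a hypothetical
stand-in chosen for brevity, not the exact partial hash BackupPC computes; any
scheme that skips part of the content behaves the same way for files that
differ only in the skipped part.

```python
import hashlib

PREFIX_BYTES = 4096  # hypothetical cut-off, not BackupPC's actual value


def partial_md5(data, prefix_bytes=PREFIX_BYTES):
    """MD5 over the total length plus only the first prefix_bytes bytes."""
    md5 = hashlib.md5()
    md5.update(str(len(data)).encode("ascii"))
    md5.update(data[:prefix_bytes])
    return md5.hexdigest()


head = b"A" * PREFIX_BYTES
file_a = head + b"tail one"
file_b = head + b"tail two"  # same length, differs only after the prefix

# Same length and same prefix, so the partial digests are identical.
print(partial_md5(file_a) == partial_md5(file_b))  # True
# The full digests differ, but computing them requires the whole file.
print(hashlib.md5(file_a).hexdigest() == hashlib.md5(file_b).hexdigest())  # False
```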

That is not to say, if we end up using a database, that it would not be a good
idea to store the full md5sum in the database. In fact, with a database, file
names would be somewhat arbitrary, and I'd propose keeping them *short* for
the sake of rsync et al. and file lists.
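
As a rough sketch of what I mean, using Python's sqlite3 module: the table
layout, column names and the hex row id used as file name are all assumptions
for illustration, not a proposed BackupPC schema. The full md5sum lives in an
indexed column, while the on-disk name stays short and carries no meaning.

```python
import hashlib
import sqlite3


def open_index(path=":memory:"):
    """Create (or open) a hypothetical pool index."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS pool ("
        " id INTEGER PRIMARY KEY,"  # doubles as the short file name
        " md5 TEXT NOT NULL,"       # full digest, deliberately not UNIQUE
        " size INTEGER NOT NULL)"
    )
    db.execute("CREATE INDEX IF NOT EXISTS pool_md5 ON pool (md5)")
    return db


def short_name(row_id):
    """Arbitrary short file name derived from the row id (hex)."""
    return format(row_id, "x")


def register(db, content):
    """Record new pool content and return its short file name."""
    digest = hashlib.md5(content).hexdigest()
    cur = db.execute(
        "INSERT INTO pool (md5, size) VALUES (?, ?)", (digest, len(content))
    )
    db.commit()
    return short_name(cur.lastrowid)


def candidates(db, content):
    """Short names of pool files with the same digest and size.

    These are only candidates: the caller still has to compare bytes.
    """
    digest = hashlib.md5(content).hexdigest()
    rows = db.execute(
        "SELECT id FROM pool WHERE md5 = ? AND size = ? ORDER BY id",
        (digest, len(content)),
    )
    return [short_name(row[0]) for row in rows]
```

The md5 column is not declared unique on purpose, since collisions still have
to be handled by comparing content.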

Regards,
Holger
