BackupPC-users

Re: [BackupPC-users] Backing up a BackupPC server

2009-06-03 00:38:38
Subject: Re: [BackupPC-users] Backing up a BackupPC server
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Wed, 03 Jun 2009 00:33:48 -0400
Les Mikesell wrote at about 17:32:24 -0500 on Tuesday, June 2, 2009:
 > Jeffrey J. Kosowsky wrote:
 > >  > 
 > >  > Backing up other backuppc servers is really a special case that might 
 > >  > deserve a special optimization.   But, I'm not sure that adding a 
 > >  > database automatically makes it any easier - unless you are thinking of 
 > >  > a common database that could arbitrate a common hashed filename that is 
 > >  > unique across all instances for every piece of content.  That's an 
 > >  > interesting idea but seems kind of fragile.
 > >  > 
 > > 
 > > Once we are talking about redoing things, I would prefer to use a
 > > full md5sum hash for the name of the pool file. You end up
 > > calculating this anyway for free when you use the rsync method
 > > (although with protocol <=28, you get a full file md4sum but with
 > > protocol >=30, I believe you have the true md5sum). This would
 > > simplify the ambiguity of having multiple indexed chain entries with
 > > the same partial md5sum.
 > > 
 > > With this approach then you would automatically have "a common hashed
 > > filename that is ['statistically'] unique across all instances for
 > > every piece of content."
 > 
 > Somehow the number of possible different file contents and the number 
 > of possible md5sums don't seem quite statistically equivalent to me.  And 
 > then there's:
 > 
 > http://www.mscs.dal.ca/~selinger/md5collision/
 > 

That's the whole point. md5sum collisions are exceedingly rare with
any imaginable number of files, since there are 2^128 different
md5sums. Even if you have billions of files, the chance of a collision
is infinitesimal. Suppose you have 1 trillion (unique) files, which is
just less than 2^40. Then the chance of at least one collision is
approximately 1 - e^(-2^40 * (2^40-1)/2^129) ~ 2^(-49), which is less
than 1 in 500 trillion [this is just a generalization of the birthday
problem]. If you have "only" 1 billion *unique* files (about 2^30),
the same formula gives a chance of about 2^(-69), which is far less
than 1 in 36 quadrillion.
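
The birthday-problem estimate can be checked numerically. Here is a
minimal sketch in Python, assuming an idealized 128-bit hash whose
outputs are uniformly distributed; the function name is illustrative
and not part of BackupPC:

```python
import math

def collision_probability(n_files: int, hash_bits: int = 128) -> float:
    """Approximate P(at least one collision) among n_files random hashes.

    Uses the birthday-problem approximation
        1 - exp(-n * (n - 1) / (2 * 2**hash_bits)).
    """
    exponent = n_files * (n_files - 1) / 2 / 2**hash_bits
    # -expm1(-x) evaluates 1 - e^(-x) without losing precision for tiny x.
    return -math.expm1(-exponent)

for n in (2**40, 10**12, 10**9):
    p = collision_probability(n)
    print(f"{n:>20,d} files: p ~ 2^({math.log2(p):.1f}), about 1 in {1 / p:.2e}")
```

Using expm1 matters here: computing 1 - exp(-x) directly would round
to zero in double precision for probabilities this small.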

Yes, there are some known examples of md5sum collisions, but they are
all artificial. I don't believe anyone has ever "accidentally" come
across one in a real-world situation. In fact, since digital signatures
rely on statistics like this, if md5sum collisions were even remotely
possible in real life, the whole electronic financial system would be
unreliable.

Hence, I stand by my statement that in any currently conceivable
BackupPC situation, the md5sums are "statistically" unique.
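
To make the proposal concrete, here is a rough sketch in Python of
deriving a pool file name from the full md5sum of a file's contents.
This is only an illustration of the idea: it is not how BackupPC
currently names pool files, and the function name and chunk size are
assumptions of mine.

```python
import hashlib
import io

def full_md5_name(stream, chunk_size=1 << 20):
    """Return the hex md5sum of everything readable from a binary stream.

    Hypothetical helper: the digest would serve as the pool file name,
    so identical content maps to the same name on every server.
    """
    digest = hashlib.md5()
    # Read in chunks so large files never have to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

# Identical content yields the same name, wherever it is computed.
a = full_md5_name(io.BytesIO(b"same content"))
b = full_md5_name(io.BytesIO(b"same content"))
print(a == b)
```

With names derived this way, two servers could compare pools by name
alone, at the (statistically negligible) risk discussed above.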
