Subject: Re: [BackupPC-users] Backing up a BackupPC server
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Thu, 04 Jun 2009 09:04:39 -0400
Tino Schwarze wrote at about 11:35:46 +0200 on Thursday, June 4, 2009:
 > Hi there,
 > 
 > (I already felt like I was going to look dumb or anxious by writing what
 > I wrote...)
 > 
 > On Wed, Jun 03, 2009 at 01:09:38PM -0400, Jeffrey J. Kosowsky wrote:
 > > Tino Schwarze wrote at about 18:39:26 +0200 on Wednesday, June 3, 2009:
 > >  > > > I recently heard about lessfs, which runs on top of FUSE to provide
 > >  > > > a file system that does block-level de-duplication.  See:
 > >  > > > 
 > >  > > >     http://www.lessfs.com
 > >  > > >     https://sourceforge.net/project/showfiles.php?group_id=257120
 > >  > > >     http://tokyocabinet.sourceforge.net/index.html
 > >  > > > 
 > >  > > > The actual storage is several very large (sparse?) files on any
 > >  > > > file system(s) of your choice.  It should provide all the benefits
 > >  > > > you expect: no issues of local limitations on hardlink counts,
 > >  > > > meta-data etc, and the database files can be copied or rsynced.
 > >  > > > I'm corresponding with the author to see if some additional useful
 > >  > > > features could be added.
 > >  > 
 > >  > Well, we've already got MD4 checksums of file blocks. And if I
 > >  > understand everything correctly, we DO GET collisions, therefore the
 > >  > hash chains.
 > > 
 > > First, the hash chains are based on *partial* file *md5* (not md4)
 > > sums.
 > > 
 > > Second, the collisions only occur because the hash is computed only
 > > over the first and eighth (or, for small files, the last) 128K blocks.
 > > So, obviously, you will have collisions for large files that have the
 > > same first and eighth blocks.
 > 
 > That was the first flaw in my thinking... So I would have to scan my
 > pool and compare the first and eighth 128k blocks (e.g. 0-128k and
 > 1M-1M128k, or is it 896k-1M?) for matches? Maybe I'll try that, out of
 > sheer curiosity (if I find the time to script it).
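
Counting 128K blocks from offset 0, the eighth block covers bytes 896K-1M; 1M-1M128K would be the ninth. A scan along those lines could look like the rough Python sketch that follows. It groups files by an MD5 of the first 128K block plus the eighth 128K block (or the last 128K block when the file ends before 1M), then compares whole-file SHA-256 digests to separate genuine collisions from identical copies. It is a sketch under assumptions: it reads files as stored, so a compressed pool would have to be decompressed first; BackupPC's real pool digest may mix in other inputs (for example the file length); and partial_digest, full_digest and find_partial_collisions are hypothetical helpers, not BackupPC functions.

```python
import hashlib
import os
import sys
from collections import defaultdict

BLOCK = 128 * 1024        # 128K block size
EIGHTH_END = 8 * BLOCK    # the eighth 128K block spans bytes 896K..1M


def partial_digest(path):
    """Hypothetical helper: MD5 of the first 128K block plus the eighth
    128K block (or the last 128K block if the file ends before 1M).
    Files of 128K or less are hashed whole."""
    size = os.path.getsize(path)
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        md5.update(f.read(BLOCK))                  # first block
        if size > BLOCK:
            f.seek(min(size, EIGHTH_END) - BLOCK)  # 896K, or size - 128K
            md5.update(f.read(BLOCK))              # eighth (or last) block
    return md5.hexdigest()


def full_digest(path):
    """SHA-256 of the whole file, read in 1M chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def find_partial_collisions(root):
    """Return sorted path groups that share a partial digest but
    contain more than one distinct full-file content."""
    by_partial = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_partial[partial_digest(path)].append(path)
    groups = []
    for paths in by_partial.values():
        if len(paths) > 1 and len({full_digest(p) for p in paths}) > 1:
            groups.append(sorted(paths))
    return sorted(groups)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        for group in find_partial_collisions(sys.argv[1]):
            print("partial-digest collision:", *group, sep="\n  ")
    else:
        print(f"usage: {sys.argv[0]} POOL_DIR")
```

Usage, with a hypothetical script name: python find_collisions.py /path/to/pool. Hashing only the partial digest first keeps the expensive full-file reads limited to files that already match on the two sampled blocks.
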
 > 
 > >  > Of course, this is for 256k blocks, IIRC. And "only" 128-bit hashes.
 > >  > But I don't like the idea of relying on probabilities. I've got enough
 > >  > uncertainties by flaky hardware, bugs etc.
 > > 
 > > We rely on probabilities in all aspects of life. Nothing is certain.
 > 
 > I know that. Sometimes I'm paranoid - I just like to get rid of
 > probabilities (=uncertainties) where possible. 

But that is what I mean when I say you're not understanding
probability. If you believe in math and physics, then EVERYTHING in
life is a probability. There is a real probability that anything will
break (including yourself) at any moment. For electronic devices such
as hard drives, this probability is well modeled for the most common
failures. According to quantum mechanics, there is a (truly
infinitesimal) probability that you will spontaneously transform into
a monkey.

The point is that your statement "I just like to get rid of
probabilities (=uncertainties) where possible" is impossible; at best
you can reduce the probability of an adverse event. My point is that
if a collision is trillions and trillions of times less likely than
more mundane things such as hardware failure, then you are much better
off reducing that risk than worrying about collisions. Worrying about
collisions is analogous to worrying about the quantum-mechanical
uncertainty principle at the macro scale. Spitting in the ocean is
more effective than worrying about the incremental adverse risk of
collisions with a 192-bit hash on 256K blocks.

 > > It all depends on the probability... I would much prefer to take the
 > > risk of a mathematically known infinitesimal probability (of the order
 > > of md5 hash collisions) than what most people in life take for granted
 > > as "absolute" fact. At least with a mathematically modeled system you
 > > know the risk, which is more than most of us know about most other
 > > elements of our systems.
 > > 
 > >  > I won't trust such a file system for backup data.
 > > 
 > > Making blanket statements like that shows a lack of understanding of
 > > probability vs. certainty in the world. 
 > 
 > Well, I just said, *I* won't trust such a file system. It's just a
 > gut feeling. Something which isn't logical or anything.

OK - if you don't believe in logic, then I can't argue with you. You
might as well use Feng Shui to improve your data reliability.

 > 
 > > If, for example, the probability of a collision is many orders of
 > > magnitude less than the probability of you losing all your backups
 > > then I wouldn't worry about it. It all depends on the probability...
 > 
 > The bad thing about probabilities is that they don't tell you anything
 > about what will happen, just about what might happen. Even if the
 > probability is very, very, very, very small, it doesn't mean it will
 > not instantly happen the next second. It's just very unlikely.
 > 

True - but that is fundamental to all life. The protons in your body
could all simultaneously decay the next second -- possible though a
bit unlikely ;)

As I proved in my earlier post, the chance of a collision on even a
petabyte-sized pool is about 1 in 10^38.
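
The earlier derivation is not part of this message, so what follows is only a reconstruction under stated assumptions: the standard birthday (union) bound p <= n(n-1)/2 / 2^b for n distinct blocks and a b-bit hash treated as uniformly random, a petabyte taken as 2^50 bytes, and the 256K blocks and 192-bit hash mentioned earlier in this message. Under those assumptions it gives roughly 1.5 x 10^-39 for a 192-bit hash, in the same range as the 1 in 10^38 figure, versus roughly 2.7 x 10^-20 for a 128-bit hash such as MD5. The function name is illustrative.

```python
def collision_probability(pool_bytes: int, block_bytes: int, hash_bits: int) -> float:
    """Birthday (union) bound on P(any two distinct blocks share a hash).

    Assumes every block is distinct and the hash behaves like a
    uniformly random function of the block contents.
    """
    n = pool_bytes // block_bytes      # number of blocks in the pool
    pairs = n * (n - 1) // 2           # number of block pairs that could collide
    return pairs / 2 ** hash_bits      # exact big-int division, rounded to float


PETABYTE = 2 ** 50                     # assumption: 1 PB taken as 2^50 bytes
BLOCK = 256 * 1024                     # 256K blocks, as discussed in the thread

p192 = collision_probability(PETABYTE, BLOCK, 192)
p128 = collision_probability(PETABYTE, BLOCK, 128)
print(f"192-bit hash: {p192:.1e}")
print(f"128-bit hash: {p128:.1e}")
```

Because n is 2^32 here, the pair count is about 2^63, so each extra hash bit halves the bound; the 64 bits separating a 128-bit and a 192-bit hash account for the gap of roughly 10^19 between the two results.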

Considering that the ocean has a volume of about 1.3 x 10^24 ml,
spitting in the ocean (assuming 10 ml of spit) adds about 1 part in
1.3 x 10^23. So, in a loose sense, the chance of a collision is several
hundred trillion times smaller than the effect of spitting in the ocean.
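
A quick unit check of the ocean comparison, as a sketch. It assumes a world-ocean volume of about 1.335 x 10^9 km^3 (a commonly cited estimate) and keeps the post's 10 ml of spit and 1 in 10^38 collision chance. Since 1 km^3 is 10^15 ml, the ocean comes to roughly 1.3 x 10^24 ml, one spit is about 1 part in 1.3 x 10^23, and the collision chance is smaller still by a factor of about 7 x 10^14.

```python
OCEAN_KM3 = 1.335e9   # assumption: commonly cited world-ocean volume, in km^3
ML_PER_KM3 = 10**15   # 1 km^3 = 1e9 m^3 and 1 m^3 = 1e6 ml
SPIT_ML = 10          # volume of spit used in the post
COLLISION_P = 1e-38   # collision chance quoted in the post

ocean_ml = OCEAN_KM3 * ML_PER_KM3
spit_fraction = SPIT_ML / ocean_ml       # share of the ocean one spit adds
ratio = spit_fraction / COLLISION_P      # how many times smaller the collision chance is

print(f"ocean volume:     {ocean_ml:.2e} ml")
print(f"one spit:         1 part in {1 / spit_fraction:.2e}")
print(f"collision chance: {ratio:.1e} times smaller")
```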

For your own sake and certainly for the sake of those you work for,
please take a course in probability and absorb its meaning. You simply
cannot make good decisions in life in general or in protecting the
resources of your company if you cannot distinguish between 1 in 10^38
risks and 1 in 100 risks.

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
