Re: [Veritas-bu] Tapeless backup environments?

2007-10-19 03:54:18
Subject: Re: [Veritas-bu] Tapeless backup environments?
From: "Curtis Preston" <cpreston AT glasshouse DOT com>
To: <bob944 AT attglobal DOT net>, <veritas-bu AT mailman.eng.auburn DOT edu>
Date: Fri, 19 Oct 2007 03:37:46 -0400
I wish we had a white board and could sit in front of each other to
finish the discussion, but it's obvious that it's not going to be
resolved here.  

You believe I'm missing your point, and I believe you're missing my
point.

>what matters is if you use a shorthand to track the
>values which can't tell that Feb 7 and Dec 28 are different values
>because you put them in the same hash bucket and therefore think that
>everything that bucket is Feb 7, you retrieve the wrong data.

I'm not sure how many times I (or others) have to say it: the dates are
not the data being deduped.  In the analogy, the dates are the hashes
and the data is the person.

>An 8KB chunk of data can have 2^65536 possible values.  Representing
>that 8KB of data in 160 bits means that each of the 2^160 possible
>checksum/hash/fingerprint values MUST represent, on average, 2^65376
>*different* 8KB chunks of data.  

This, again, only makes sense if you are using the hash to
store/reconstruct the data, not to ID the data.  The fingerprint (like a
real fingerprint) is not used to reconstruct a block; it's only used to
give it a unique ID that distinguishes it from other blocks.  You still
have to store the block with the key.  And with 2^160 different
fingerprints, that means we can calculate unique fingerprints for up to
2^160 blocks.  That is roughly
1,461,501,637,330,900,000,000,000,000,000,000,000,000,000,000,000
blocks, which at 8KB (8,192 bytes) per block is roughly
11,972,621,413,015,000,000,000,000,000,000,000,000,000,000,000,000,000
bytes of data.  That's a lot of stinking data.
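
To make the ID-versus-reconstruct distinction concrete, here is a
minimal sketch, not any vendor's implementation.  It assumes SHA-1 as
the 160-bit fingerprint and 8KB blocks; the names DedupeStore, put and
get are hypothetical and exist only for this illustration.

```python
import hashlib

BLOCK_SIZE = 8192  # 8KB blocks, as in the example being discussed


class DedupeStore:
    """Hypothetical hash-only dedupe store: fingerprint -> stored block."""

    def __init__(self):
        # The block itself is always stored alongside its key.
        self.blocks = {}

    def put(self, block: bytes) -> bytes:
        fingerprint = hashlib.sha1(block).digest()  # 160-bit ID
        # Store the block only the first time this fingerprint is seen.
        self.blocks.setdefault(fingerprint, block)
        return fingerprint

    def get(self, fingerprint: bytes) -> bytes:
        # Retrieval is a lookup of stored data, not a reconstruction
        # of the block from its hash.
        return self.blocks[fingerprint]


store = DedupeStore()
a = store.put(b"A" * BLOCK_SIZE)
b = store.put(b"A" * BLOCK_SIZE)  # duplicate block, same fingerprint
c = store.put(b"B" * BLOCK_SIZE)  # different block, different fingerprint

unique_blocks = len(store.blocks)

# Size of the fingerprint space, and the bytes it could label at 8KB each.
fingerprint_space = 2 ** 160
addressable_bytes = fingerprint_space * BLOCK_SIZE  # equals 2**173
```

Three blocks go in, two are stored, and the duplicate comes back from a
plain lookup.  The hash never has to "contain" the 8KB of data.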

>If that doesn't concern you, well, it's safe to say I won't be hiring
>you as my backup admin.  Or as my technology consultant, since you

I really don't think you need to make it personal, or to suggest that I
don't know what I'm doing simply because we have been unable to
communicate successfully with each other in this medium.  It is a hard
medium in which to discuss such a difficult subject.  I think things
would be very different in person with a whiteboard.

>should know from earlier postings that spoofing your favorite 160-bit
>hashing algorithm with reasonable-looking fake data is now old hat.  
>The exploit itself should concern us, not to mention that it also
>illustrates that similar data which yields the same hash is not the
>once-in-the-lifetime-of-the-universe oddity you portray.

They worked really hard to figure out how to take one block that
calculates to a particular hash and create another block that calculates
to the same hash.  It's used to fake a signature.  I get it.  I just
don't see how or why somebody would use this against my backups, or to
what end.  If we were having this discussion over a few drinks we could
try to come up with some ideas.  Right now, I'm as tired as you are of
this discussion.

>Everything mentioned here was covered in the original postings a month
>ago.  Unless there's something new, I'm done with this.

You're right.  IN THIS MEDIUM, you don't understand me, and I don't
understand you.  Let's agree to disagree and move on.

For anyone who's still reading, I just want to say this:

I was only trying to bring some sanity to what I felt was an undue
amount of FUD against the hash-only products. I'm not necessarily trying
to talk anyone into them.  I just want you to understand what I THINK
the real odds are.  If after understanding how it works and what the
odds are, you're still uncomfortable, don't dismiss dedupe.  Just
consider a non-hash-based dedupe product.
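
For anyone who wants to put a number on those odds, one common way is
the birthday bound: with n blocks and a b-bit hash, the chance of any
two blocks sharing a fingerprint is at most n*(n-1)/2 divided by 2^b.
The sketch below assumes an ideal, uniformly distributed 160-bit hash
and no deliberate attack.  The function name and the example data sizes
are made up for illustration.

```python
from fractions import Fraction

HASH_BITS = 160
BLOCK_SIZE = 8192  # 8KB blocks


def collision_probability(num_blocks: int, hash_bits: int = HASH_BITS) -> Fraction:
    """Upper bound on the chance that any two blocks share a fingerprint.

    Uses the birthday (union) bound: number of block pairs / 2**hash_bits.
    """
    pairs = num_blocks * (num_blocks - 1) // 2
    return Fraction(pairs, 2 ** hash_bits)


# Hypothetical amounts of unique data held in one dedupe store.
for label, total_bytes in [("1 PB", 2 ** 50), ("1 EB", 2 ** 60)]:
    n = total_bytes // BLOCK_SIZE
    p = collision_probability(n)
    print(f"{label}: {n} blocks, collision probability at most {float(p):.3e}")
```

The bound grows with the square of the number of blocks, so it is worth
plugging in your own data size rather than taking anyone's word for it.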

Curtis out.

_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu