Veritas-bu

Re: [Veritas-bu] Tapeless backup environments?

2007-10-01 01:48:33
Subject: Re: [Veritas-bu] Tapeless backup environments?
From: "Curtis Preston" <cpreston AT glasshouse DOT com>
To: <bob944 AT attglobal DOT net>, <veritas-bu AT mailman.eng.auburn DOT edu>
Date: Mon, 1 Oct 2007 01:35:01 -0400
Bob,

I'll try to respond as best as I can.

>No importa.  The length of the checksum/hash/fingerprint and the
>sophistication of its algorithm only affect how frequently--not
>whether--the incorrect answer is generated.

You and I don't disagree on this.  The only thing we differ on is the
odds of the event.  I think the odds are small enough not to be
concerned with, and you think they're larger than that.

(I also think it's important to state what I stated in my other reply.
Most de-dupe systems do not rely only on hashes.  So if you can't get
past this whole hashing thing, there's no reason to reject de-dupe
altogether.  Just make sure your vendor uses an alternate method.)

>The notion that the bad guys will never figure out a way to plant a
>silent data-change based on checksum/hash/fingerprint collisions is,
>IMO, naive.

So someone is going to exploit the hash collision possibilities in my
backup system to do what, exactly?  As much as I've spoken and written
about storage security, I can't for the life of me figure out what
someone would hope to gain or how they would gain it this way.

>Those are impressive, and dare I guess, vendor-supplied, numbers.  And
>they're meaningless.  

These are odds based on the size of the key space.  If the hash has
2^160 possible values, any two different blocks have a 1-in-2^160
chance of colliding.

>What _is_ important?  To me, it's important that if I read
>back any of the N terabytes of data I might store this week, I get the
>same data that was written, not a silently changed version because the
>checksum/hash/fingerprint of one block that I wrote collides with
>another checksum/hash/fingerprint.  

This is referring to the birthday paradox.  As I stated in another post,
I haven't thought about this before, and am looking into what the real
odds are.  I'm trying to translate it into actual numbers.
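
As a rough sketch of how that translation might go, the snippet below
uses the standard birthday-bound approximation.  The 160-bit hash
matches the 2^160 figure above; the 8 KB block size and 8 PB of stored
data are assumptions picked for illustration, not numbers from any
vendor, and the function name is hypothetical.

```python
import math


def collision_probability(num_blocks: int, hash_bits: int = 160) -> float:
    """Approximate chance that at least two of num_blocks distinct blocks
    share a hash value, assuming the hash is uniformly distributed.

    Uses the birthday-bound approximation p ~= 1 - exp(-n(n-1) / (2N)),
    where N is the size of the key space.
    """
    key_space = 2 ** hash_bits
    exponent = num_blocks * (num_blocks - 1) / (2 * key_space)
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-exponent)


# Assumed workload: 8 PB of unique data stored in 8 KB blocks.
stored_bytes = 8 * 2**50
block_size = 8 * 2**10
num_blocks = stored_bytes // block_size

print(f"{num_blocks} blocks, collision probability "
      f"{collision_probability(num_blocks):.3e}")
```

The point of the exercise is that the birthday effect makes the odds
grow with the square of the number of blocks, so the answer depends
heavily on how much unique data is stored and on the block size.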

>I can NOT have that happen to any
>block--in a file clerk's .pst, a directory inode or the finance
>database.  "Probably, it won't happen" is not acceptable.

Couldn't agree more.

>> Let's compare those odds with the odds of an unrecoverable 
>> read error on a typical disk--approximately 1 in 100 trillion

>Bogus comparison.  In this straw man, that 1/100,000,000,000,000 read
>error a) probably doesn't affect anything

I thought probably wasn't acceptable?  I'm sorry, that was just too
close to your previous use of "probably" in a very different context.

>probably doesn't affect anything because of the higher-level
>RAID array it's in and b) if it does, there's an error, a
>we-could-not-read-this-data, you-can't-proceed, stop, fail,
>get-it-from-another-source error--NOT a silent changing of the data from
>foo to bar on every read with no indication that it isn't the data that
>was written.

I think Darren's other posts about this point are sufficient.  It
happens.  It happens all the time, and is well documented.  And yet the
industry's ok with this.  On the other hand, the odds of what we're
talking about are significantly smaller and people are freaking out.
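
To put the two figures side by side, here is a minimal sketch.  It
assumes the 1-in-100-trillion figure is per bit read (the way drive
vendors usually quote it), a 160-bit hash, and an illustrative 8 PB of
data read back; the function names are hypothetical.

```python
def expected_read_errors(bytes_read: int, bits_per_error: float = 1e14) -> float:
    """Expected number of unrecoverable read errors when reading
    bytes_read bytes, assuming one error per bits_per_error bits."""
    return bytes_read * 8 / bits_per_error


def pairwise_collision_odds(hash_bits: int = 160) -> float:
    """Chance that two given different blocks hash to the same value,
    assuming a uniformly distributed hash of hash_bits bits."""
    return 2.0 ** -hash_bits


# Assumed workload: reading back 8 PB of data.
data_bytes = 8 * 2**50

print(f"expected unrecoverable read errors: "
      f"{expected_read_errors(data_bytes):.1f}")
print(f"odds that two given blocks collide: "
      f"{pairwise_collision_odds():.3e}")
```

This only compares raw rates.  It says nothing about whether a given
read error is detected or silent, which is the part Bob and I are
arguing about.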

>> If you want to talk about the odds of something bad happening and not
>> knowing it, keep using tape. Everyone who has worked with tape for any
>> length of time has experienced a tape drive writing something that it
>> then couldn't read.
>
>That's not news, and why we've been making copies of data for, oh, 50
>years or so.

I'm just saying that a hash collision, however possible, would basically
translate into a failed backup that looks good.  Do you have any idea
how many failed backups that look good happen every single day with
tape?  And, as long as you bring up making copies, making copies of your
de-duped data removes any concerns, as it verifies the original.

>> Compare that to successful deduplication disk
>> restores. According to Avamar Technologies Inc. (recently acquired by
>> EMC Corp.), none of its customers has ever had a failed restore.
>
>Now _there's_ an unbiased source.

Touché.  Anyone who has actually experienced a hash collision in their
de-duplication backup system, please stand up.  Given the hype that
de-dupe has generated, don't you think that anyone who had experienced
such a thing would have reported it, and that such a report would have
gotten big press?  I sure do.  And yet there has been nothing.


_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu