Veritas-bu

Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 19:05:11
Subject: Re: [Veritas-bu] Tapeless backup environments?
From: "Curtis Preston" <cpreston AT glasshouse DOT com>
To: <veritas-bu AT mailman.eng.auburn DOT edu>
Date: Mon, 24 Sep 2007 18:41:45 -0400
There are no products in the market that rely solely on a checksum to
identify redundant data.  There are a few that rely solely on a 160-bit
hash, which is significantly larger than a checksum (typically 12-16
bits).  There are some who are concerned about hash collisions in this
scenario.  I am not one of those people.  Here is a quote from an
article I wrote.  The entire article is available here:

http://tinyurl.com/2j7r52

<quote>
Hash collisions occur when two different chunks produce the same hash.
It's widely acknowledged in cryptographic circles that a determined
hacker could create two blocks of data that would have the same MD5
hash. If a hacker could do that, they might be able to create a fake
cryptographic signature. That's why many security experts are turning to
SHA-1. Its bigger key space makes it much more difficult for a hacker to
crack. However, at least one group has already been credited with
creating a hash collision with SHA-1.

The ability to forcibly create a hash collision means absolutely nothing
in the context of deduplication. What matters is the chance that two
random chunks would have a hash collision. With a 128-bit and 160-bit
key space, the odds of that happening are 1 in 2^128 with MD5, and 1 in
2^160 with SHA-1. That's 10^38 and 10^48, respectively. If you assume that
there's less than a yottabyte (1 billion petabytes) of data on the
planet Earth, then the odds of a hash collision with two random chunks
are roughly 1,461,501,637,330,900,000,000,000,000 times greater than the
number of bytes in the known computing universe.

Let's compare those odds with the odds of an unrecoverable read error on
a typical disk--approximately 1 in 100 trillion or 10^14. Even worse odds
are data miscorrection, where error-correcting codes step in and believe
they have corrected an error, but miscorrect it instead. Those odds are
approximately 1 in 10^21. So you have a 1 in 10^21 chance of writing data
to disk, having the data written incorrectly and not even knowing it.
Everybody's OK with these numbers, so there's little reason to worry
about the 1 in 10^48 chance of a SHA-1 hash collision.

If you want to talk about the odds of something bad happening and not
knowing it, keep using tape. Everyone who has worked with tape for any
length of time has experienced a tape drive writing something that it
then couldn't read. Compare that to successful deduplication disk
restores. According to Avamar Technologies Inc. (recently acquired by
EMC Corp.), none of its customers has ever had a failed restore. Hash
collisions are a nonissue.
</quote>
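
For anyone who wants to check the arithmetic in the quote, here is a
minimal Python sketch. It only reproduces the "two random chunks"
figures for a 128-bit and a 160-bit hash and compares the SHA-1 figure
with the 1 in 10^21 miscorrection odds cited above. The function names
are made up for illustration and are not from any product.

```python
# Sketch only: reproduces the pairwise-collision figures quoted above.
# Function names are hypothetical, chosen for this example.

MD5_BITS = 128
SHA1_BITS = 160


def pair_collision_odds(bits: int) -> int:
    """Return N such that two random chunks share a hash with odds 1 in N."""
    return 2 ** bits


def order_of_magnitude(n: int) -> int:
    """Return the largest k such that 10**k <= n, for a positive integer n."""
    return len(str(n)) - 1


for name, bits in (("MD5", MD5_BITS), ("SHA-1", SHA1_BITS)):
    n = pair_collision_odds(bits)
    print(f"{name}: 1 in 2^{bits}, about 10^{order_of_magnitude(n)}")

# Compare with the 1 in 10^21 miscorrection odds cited in the quote.
MISCORRECTION_ODDS = 10 ** 21
ratio = pair_collision_odds(SHA1_BITS) // MISCORRECTION_ODDS
print(f"SHA-1 pair collision is about 10^{order_of_magnitude(ratio)} "
      "times less likely than a miscorrection")
```

The first two lines printed should show 10^38 for MD5 and 10^48 for
SHA-1, matching the figures in the article.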

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-----Original Message-----
From: veritas-bu-bounces AT mailman.eng.auburn DOT edu
[mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of A Darren
Dunham
Sent: Monday, September 24, 2007 5:59 PM
To: Veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

On Mon, Sep 24, 2007 at 05:08:31PM -0400, bob944 wrote:
> In the technologies I'm familiar with--one of them is old, another new,
> it's conceptually simple.  "The system," whether that's a standalone
> system or a box of disk with some smarts or an agent on the backup
> client, receives data and examines it in blocks of some size (AFAIK,
> always way larger than a 512-byte disk block).  Simplistically, it
> checksums the "block" and looks in a table of
> checksums-of-"blocks"-that-it-already-stores to see if the identical
> <ahem, anyone see a hole here?> data already lives there.

Yes, there's a hole there if that's all you're relying on.  Not all of
them do that.
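
To make the two approaches concrete, here is a rough Python sketch of
the scheme bob944 describes, with an optional byte-for-byte comparison
on a hash match to close the hole. It is an illustration under my own
assumptions, not how any particular product works: the ChunkStore
class, the backup/restore helpers, the fixed 4096-byte chunk size and
the choice of SHA-1 are all made up for the example.

```python
import hashlib


class ChunkStore:
    """Toy in-memory chunk store keyed by SHA-1 digest (hypothetical)."""

    def __init__(self, verify=True):
        # verify=True compares the bytes when a digest is already known,
        # instead of trusting the hash match alone.
        self.verify = verify
        self.chunks = {}  # digest (bytes) -> chunk data (bytes)

    def put(self, chunk):
        """Store a chunk if it is new and return its digest."""
        key = hashlib.sha1(chunk).digest()
        existing = self.chunks.get(key)
        if existing is None:
            self.chunks[key] = chunk
        elif self.verify and existing != chunk:
            # Same digest, different data: the case the thread worries about.
            raise ValueError("hash collision: same SHA-1, different data")
        return key

    def get(self, key):
        return self.chunks[key]


def backup(data, store, chunk_size=4096):
    """Split data into fixed-size chunks and return the list of digests."""
    return [store.put(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def restore(recipe, store):
    """Rebuild the original data from a list of digests."""
    return b"".join(store.get(key) for key in recipe)


store = ChunkStore(verify=True)
data = b"A" * 8192 + b"B" * 4096   # three chunks, two of them identical
recipe = backup(data, store)
print(len(recipe), "chunks referenced,", len(store.chunks), "chunks stored")
```

With verify=False the store relies on the hash alone, which is the case
the question is about. With verify=True a match costs an extra read and
compare of the stored chunk, which is the trade-off.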

-- 
Darren Dunham                                           ddunham AT taos DOT com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >
_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
