Veritas-bu

Re: [Veritas-bu] Tapeless backup environments?

2007-10-18 14:05:10
Subject: Re: [Veritas-bu] Tapeless backup environments?
From: "Curtis Preston" <cpreston AT glasshouse DOT com>
To: "Iverson, Jerald" <Jerald.Iverson AT aiminvestments DOT com>, <veritas-bu AT mailman.eng.auburn DOT edu>
Date: Thu, 18 Oct 2007 13:44:03 -0400
So you're OK with hash-based de-dupe, which everyone acknowledges has a
chance (although quite small) that you could have a hash-collision and
potentially corrupt a block of data somewhere, sometime, when you least
expect it...

But you're NOT ok with the long-running industry standard of loss-less
compression algorithms?  (All compression algorithms for tape are
loss-less algorithms.)  Lossy algorithms are only used in things like
video compression, where it's ok to lose blocks along the way as long as
the human eye can't detect them, or as long as you can fit it on
youtube.
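
To make the loss-less point concrete, here is a minimal Python sketch of a compression round trip. It uses the standard zlib module (DEFLATE) purely as a stand-in; that is an assumption for illustration, not the algorithm a tape drive implements in hardware, and the sample data is made up.

```python
import zlib

# Hypothetical sample data: highly repetitive, so it compresses well.
original = b"backup block " * 1000

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Loss-less means the round trip is byte-for-byte identical.
assert restored == original

# The compressed form is smaller, yet nothing was thrown away.
print(len(original), len(compressed))
```

How much smaller the compressed form gets depends entirely on the data, which is why a tape's "compressed capacity" is an estimate rather than a guarantee.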

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-----Original Message-----
From: veritas-bu-bounces AT mailman.eng.auburn DOT edu
[mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Iverson,
Jerald
Sent: Thursday, October 18, 2007 8:52 AM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] Tapeless backup environments?


> What you must grasp is that it is *impossible* to
> represent/re-create/look up the values of 2^65536 bits in fewer than
> 2^65536 bits--unless you concede that each checksum/hash/fingerprint
> will represent many different values of the original data--any more than
> you can represent three bits of data with two.
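
The three-bits-into-two claim in the quote can be checked by brute force. This is a small Python sketch; the `fingerprint` function is a made-up stand-in for a hash (it simply drops the last bit), and any other function from 3 bits to 2 bits would collide just as often or more.

```python
from itertools import product


def fingerprint(bits):
    """Hypothetical 2-bit 'hash' of a 3-bit value: keep the first two bits."""
    return bits[:2]


seen = {}        # fingerprint -> first 3-bit value that produced it
collisions = []  # pairs of distinct inputs that share a fingerprint

for bits in product((0, 1), repeat=3):  # all 8 possible 3-bit values
    fp = fingerprint(bits)
    if fp in seen:
        collisions.append((seen[fp], bits))
    else:
        seen[fp] = bits

# 8 inputs but only 4 possible fingerprints, so collisions are unavoidable.
print(len(seen), len(collisions))
```

The collisions only matter if the fingerprint is the sole copy of the data, which is not how a de-dupe store uses it.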

that is why i have turned off all hardware and software compression on
my tape drives.  imagine trying to store more than 400GB of data onto a
single lto3 tape!  they "say" that you can store up to and even more
than 800GB, but i don't believe a word of it.  there is no way 1 nibble
of data can represent 1 byte!  once i have the time to study lzr
compression and understand it, and see whether or not it is
"data-loss-less", then i may turn compression back on.  until then,
tapes are cheap and i'll buy 2.5 times as many as i need.  :-)

thanks,
jerald

p.s.
our de-dupe vtl does the hash and then a bit by bit comparison of the
data block to ensure the data really is the same in order to eliminate
the duplicate block.  i think some of the confusion may be in not
understanding how the de-dupe process works.  once you create a hash for
a block of data, you are storing the hash AND the block of data.  you
are never having to re-create a big block a data from a smaller hash.
the backup stream of data gets re-written from a "string" of 8k blocks,
into a "string" of 160-bit pointers which point to the unique 8k blocks
of data via the hash table.  or something like that...
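
the process described above can be sketched roughly as follows, in Python. this is a toy model under stated assumptions, not any vendor's implementation: SHA-1 is assumed only because it produces a 160-bit digest, and the names `DedupeStore`, `backup` and `restore` are hypothetical. the pointer here is the digest plus a small index, so that two different blocks with the same hash would both be kept rather than one overwriting the other.

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # 8k blocks, as in the description above


class DedupeStore:
    """Toy block store: keeps each unique block once, keyed by its hash."""

    def __init__(self):
        # 160-bit SHA-1 digest -> list of distinct blocks with that digest
        self.blocks = {}

    def put(self, block):
        """Store the block if it is new; return a (digest, index) pointer."""
        digest = hashlib.sha1(block).digest()
        candidates = self.blocks.setdefault(digest, [])
        for i, stored in enumerate(candidates):
            # Hash matched: confirm with a full byte-for-byte comparison
            # before treating the block as a duplicate.
            if stored == block:
                return (digest, i)
        # New data (or a genuine hash collision): keep the block itself.
        candidates.append(block)
        return (digest, len(candidates) - 1)

    def get(self, pointer):
        """Look up the stored block; nothing is rebuilt from the hash."""
        digest, i = pointer
        return self.blocks[digest][i]


def backup(stream, store):
    """Rewrite a byte stream as a list of pointers to unique blocks."""
    return [store.put(stream[off:off + BLOCK_SIZE])
            for off in range(0, len(stream), BLOCK_SIZE)]


def restore(pointers, store):
    """Rebuild the original stream by following each pointer."""
    return b"".join(store.get(p) for p in pointers)


# Usage: four blocks go in, but only two unique blocks are stored.
store = DedupeStore()
data = b"A" * (3 * BLOCK_SIZE) + b"B" * BLOCK_SIZE
pointers = backup(data, store)
assert restore(pointers, store) == data
```

the byte-for-byte check in `put` is what removes the collision risk: a matching hash only nominates a candidate, and the stored data decides.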

_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
