Re: [ADSM-L] Data Deduplication

2007-08-27 15:29:29
From: Paul Zarnowski <psz1 AT CORNELL DOT EDU>
Date: Mon, 27 Aug 2007 15:27:17 -0400
At 12:40 PM 8/27/2007, Curtis Preston wrote:
> Every block coming into the device should be compared to every other
> block ever seen by the device.
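
In practice, "compared to every other block ever seen" is commonly implemented by fingerprinting each block with a cryptographic hash and looking the digest up in an index, rather than comparing bytes directly. Below is a minimal Python sketch of that idea. The 4 KiB fixed block size, the SHA-256 fingerprints and the names `dedup_store` and `restore` are illustrative assumptions, not any vendor's design. The sketch also treats equal digests as equal blocks, so it ignores hash collisions.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real devices vary


def dedup_store(stream: bytes, index: dict) -> list:
    """Split `stream` into fixed-size blocks and store each block only if
    its SHA-256 digest is not already in `index` (digest -> block bytes).
    Returns the recipe: the ordered list of digests needed to rebuild it."""
    recipe = []
    for off in range(0, len(stream), BLOCK_SIZE):
        block = stream[off:off + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in index:  # never seen before: keep one copy
            index[digest] = block
        recipe.append(digest)  # seen or not, the recipe references it
    return recipe


def restore(recipe: list, index: dict) -> bytes:
    """Rebuild the original stream from its recipe."""
    return b"".join(index[d] for d in recipe)


index = {}
data = b"A" * 8192 + b"B" * 4096  # three blocks, only two distinct
first = dedup_store(data, index)
second = dedup_store(data, index)  # a repeat backup adds no new blocks
print(len(index))  # 2
```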

As others have noted, different vendors dedup at different levels of
granularity.  When I spoke to Diligent at the Gartner conference over
a year ago, they were very tight-lipped about their actual
algorithm.  They would, however, state that they were able to dedup
parts of two files that had similar data, but were not
identical.  I.e., if data was inserted at the beginning of the file,
some parts of the end of the file could still be deduped.  Neat trick
if it's true.  Other vendors dedup at the file or block (or chunk) level.
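
Diligent did not disclose how it handles shifted data, so the following is not their method. One well-known technique with the property described above is content-defined chunking. Chunk boundaries are placed wherever a rolling hash of the most recent bytes matches a pattern, so an insertion disturbs only the chunks near it. With fixed-size blocks, by contrast, every later block shifts and stops matching. The sketch below uses a gear-style rolling hash. The seeded table, the 10-bit mask (about 1 KiB average chunks) and all function names are assumptions chosen for illustration.

```python
import random

# Illustrative parameters (assumptions, not any vendor's settings):
# 256 seeded pseudo-random 64-bit "gear" values and a 10-bit mask on the
# high bits, so a cut occurs with probability ~1/1024 per byte.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
U64 = (1 << 64) - 1
MASK = ((1 << 10) - 1) << 54


def cdc_chunks(data: bytes) -> list:
    """Cut after any byte where the hash's top 10 bits are all zero.

    h is shifted left one bit per byte and truncated to 64 bits, so a
    byte stops affecting h after 64 more bytes. Each cut decision
    therefore depends only on the last 64 bytes of content, not on the
    byte's offset in the file."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & U64
        if (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk with no cut point
    return chunks


def fixed_chunks(data: bytes, size: int = 1024) -> list:
    """Fixed-size blocks, for comparison."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_bytes(old: list, new: list) -> int:
    """Bytes of `old` that sit in chunks also present in `new` (would dedup)."""
    seen = set(new)
    return sum(len(c) for c in old if c in seen)


if __name__ == "__main__":
    original = random.Random(7).randbytes(64 * 1024)  # Python 3.9+
    edited = b"new header" * 10 + original  # 100 bytes inserted at the front
    # Fixed blocks: every boundary shifts by 100 bytes, so nothing matches.
    print(shared_bytes(fixed_chunks(original), fixed_chunks(edited)))  # 0
    # CDC: cut decisions agree again from 64 bytes past the insertion, so
    # every chunk after the first common cut point is still shared.
    print(shared_bytes(cdc_chunks(original), cdc_chunks(edited)))
```

Production chunkers typically also enforce minimum and maximum chunk sizes. Those limits are omitted here so that every cut decision stays purely content-dependent.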

I've not been able to gather much more detail about the specific
dedup algorithms, but hope to get some more info this fall, as I take a
closer look at these products.  If anyone has more details, I'd love
to hear them.


Paul Zarnowski
Manager, Storage Services                 Fx: 607-255-8521
719 Rhodes Hall, Ithaca, NY 14853-3801    Em: psz1 AT cornell DOT edu

