ADSM-L

Re: [ADSM-L] Data Deduplication

2007-08-29 12:09:00
From: David Longo <David.Longo AT HEALTH-FIRST DOT ORG>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 29 Aug 2007 11:45:51 -0400
I have been hearing bits and pieces about this de-dup thing.

Several things have me wondering, as others on this list have
also said.

One thing I haven't heard about is performance. Even with TSM
clients, the usual advice is not to do "compression" on the client
because of performance issues.  And that is just for individual
files or data streams.

As de-dup, from what I have read, compares across all files
on a "system" (server, disk storage or whatever), it seems
to me that this will be an enormous resource hog of CPU, memory
and disk I/O.  I am not just talking about using it as some part of
TSM disk, but also, for instance, on a File Server.
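
To put a rough number on the memory side of that worry, here is a
back-of-the-envelope sketch. It assumes a device that keeps one index
entry per stored chunk; the 8 KiB chunk size and 40 bytes per entry
are made-up illustrative values, not figures from any vendor, and
`index_estimate` is a hypothetical helper.

```python
def index_estimate(data_bytes, chunk_bytes=8 * 1024, entry_bytes=40):
    """Return (chunk count, index size in bytes) for a hash-indexed store.

    Worst case: assumes every chunk is unique, so each one needs its own
    index entry. Both defaults are illustrative assumptions only.
    """
    chunks = -(-data_bytes // chunk_bytes)  # ceiling division
    return chunks, chunks * entry_bytes


TIB = 1024 ** 4
GIB = 1024 ** 3

# 10 TiB of unique data, 8 KiB chunks, 40 bytes of index per chunk.
chunks, index_bytes = index_estimate(10 * TIB)
print(f"{chunks:,} chunks, index of {index_bytes / GIB:.0f} GiB")
```

Under those assumptions the index alone is too big to sit comfortably
in RAM, which is why lookups could turn into disk I/O. Real products
may well use larger chunks, smaller entries, or tricks to avoid
touching the full index, so treat this only as a way to frame the
question.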

Any experiences/comments?

David Longo

>>> Paul Zarnowski <psz1 AT CORNELL DOT EDU> 8/27/2007 3:27 PM >>>
At 12:40 PM 8/27/2007, Curtis Preston wrote:
>Every block coming into the device should be compared to every other
>block ever seen by the device.
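
In practice "compared to every other block" is usually done with a
hash lookup rather than a byte-by-byte comparison against everything
stored. A minimal sketch of that idea, assuming fixed-size blocks and
SHA-256 as the fingerprint; the `BlockStore` class and its block size
are hypothetical and do not describe any particular product:

```python
import hashlib


class BlockStore:
    """Toy fixed-size block store that keeps one copy of each unique block."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # SHA-256 digest -> block bytes

    def write(self, data):
        """Store data and return its recipe: the ordered list of block digests."""
        recipe = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).digest()
            # Only blocks never seen before consume new space.
            self.blocks.setdefault(digest, block)
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        """Rebuild the original data from its recipe."""
        return b"".join(self.blocks[d] for d in recipe)


store = BlockStore(block_size=4)
first = store.write(b"AAAABBBBAAAA")   # two unique blocks: AAAA, BBBB
second = store.write(b"BBBBCCCC")      # only CCCC is new
print(len(store.blocks), "unique blocks stored")
```

The cost per incoming block is one hash computation plus one index
lookup, so the work grows with the amount of data written, not with
the amount already stored. The index itself still has to live
somewhere, though.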


As others have noted, different vendors dedup at different levels of
granularity.  When I spoke to Diligent at the Gartner conference over
a year ago, they were very tight-lipped about their actual
algorithm.  They would, however, state that they were able to dedup
parts of two files that had similar data, but were not
identical.  I.e., if data was inserted at the beginning of the file,
some parts of the end of the file could still be deduped.  Neat trick
if it's true.  Other vendors dedup at the file or block (or chunk) level.
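
Diligent did not disclose how they do it, so the following is NOT
their algorithm. It is a sketch of one generic technique, usually
called content-defined chunking, that gives the same effect: chunk
boundaries are chosen from the data itself (here, wherever a checksum
of the last few bytes hits a chosen value), so data inserted at the
front only disturbs the first chunk. The function names, window size,
and divisor are all illustrative assumptions, and a real
implementation would use a rolling hash instead of recomputing a
CRC at every position.

```python
import hashlib
import random
import zlib


def fixed_chunks(data, size=64):
    """Cut data every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data, window=16, divisor=64):
    """Cut wherever the CRC of the previous `window` bytes is 0 mod `divisor`."""
    chunks, start = [], 0
    for i in range(window, len(data) + 1):
        if zlib.crc32(data[i - window:i]) % divisor == 0:
            chunks.append(data[start:i])
            start = i
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last cut
    return chunks


def shared(a, b):
    """Count distinct chunks (by SHA-256) that appear in both chunk lists."""
    def digests(chunks):
        return {hashlib.sha256(c).digest() for c in chunks}
    return len(digests(a) & digests(b))


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(20000))
shifted = b"XYZ" + original  # same file with 3 bytes inserted at the front

fixed_shared = shared(fixed_chunks(original), fixed_chunks(shifted))
cdc_shared = shared(cdc_chunks(original), cdc_chunks(shifted))
print("fixed-size chunks in common:     ", fixed_shared)
print("content-defined chunks in common:", cdc_shared)
```

With fixed-size chunks the 3-byte insert shifts every later boundary,
so essentially nothing lines up. With content-defined boundaries every
chunk after the first one is found again in the shifted file.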

I've not been able to gather much more detail about the specific
dedup algorithms, but hope to get some more info this fall, as I take a
closer look at these products.  If anyone has more details, I'd love
to hear them.

..Paul



--
Paul Zarnowski                            Ph: 607-255-4757
Manager, Storage Services                 Fx: 607-255-8521
719 Rhodes Hall, Ithaca, NY 14853-3801    Em: psz1 AT cornell DOT edu


