Subject: Re: [ADSM-L] Data Deduplication
From: Curtis Preston <cpreston AT GLASSHOUSE DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 29 Aug 2007 14:09:20 -0400
De-dupe comes in two flavors:
1. Target de-dupe
2. Source de-dupe

Target de-dupe is de-dupe inside a VTL/IDT (intelligent disk target).
You send it regular TSM backups and it finds the duplicate data within
it.  A good vendor of this type should give you all the benefits of
de-dupe without any performance issues during backup or restore -- even
in a very large environment.

Source de-dupe is backup software (e.g. EMC Avamar, Symantec PureDisk,
Asigra Televaulting) written to de-dupe data before it ever leaves the
client.  This software definitely requires significant amounts of CPU on
the client, but the amount of bandwidth it saves is worth the trouble.
These products are therefore best for backing up remote office data, not
large datacenters.
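To make the trade-off concrete, here is a minimal sketch of the source de-dupe idea -- not any vendor's actual protocol; the chunk size, SHA-1 hashing, and in-memory "server index" are all illustrative assumptions.  The client hashes each chunk and only "sends" chunks the server has never seen:

```python
import hashlib

# Hypothetical in-memory stand-in for the server's index of stored
# chunk hashes; in a real product this lookup goes over the network.
server_index = set()

def backup(data, chunk_size=4096):
    """Hash each fixed-size chunk; "send" only chunks the server lacks.

    Returns the number of bytes actually transmitted.
    """
    sent = 0
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha1(chunk).hexdigest()
        if digest not in server_index:   # only new chunks cross the wire
            server_index.add(digest)
            sent += len(chunk)
    return sent

# 16 KB of sample data made of 4 distinct chunks.
client_data = b"".join(bytes([i]) * 4096 for i in range(4))
first = backup(client_data)    # full backup: every chunk is new
second = backup(client_data)   # next night: nothing changed, nothing sent
```

The CPU cost shows up plainly: every chunk gets hashed on the client every night, but only new chunks consume bandwidth -- which is why this approach pays off most over a thin remote-office WAN link.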

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
David Longo
Sent: Wednesday, August 29, 2007 8:46 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] Data Deduplication

I have been hearing bits and pieces about this de-dup thing.

Several things have me wondering, as other folks on this list can
also testify.

One thing I haven't heard about is performance.  Even with TSM
clients, the standard advice is not to use "compression" on the client
due to performance issues.  And that is just for individual files
or data streams.

Since de-dup, from what I have read, compares across all files
on a "system" (server, disk storage, or whatever), it seems
to me that it will be an enormous hog of CPU, memory,
and disk I/O.  I am not just talking about using it as some part of
TSM disk, but, say, on a file server.

Any experiences/comments?

David Longo

>>> Paul Zarnowski <psz1 AT CORNELL DOT EDU> 8/27/2007 3:27 PM >>>
At 12:40 PM 8/27/2007, Curtis Preston wrote:
>Every block coming into the device should be compared to every other
>block ever seen by the device.


As others have noted, different vendors dedup at different levels of
granularity.  When I spoke to Diligent at the Gartner conference over
a year ago, they were very tight-lipped about their actual
algorithm.  They would, however, state that they were able to dedup
parts of two files that had similar data, but were not
identical.  I.e., if data was inserted at the beginning of the file,
some parts of the end of the file could still be deduped.  Neat trick
if it's true.  Other vendors dedup at the file or block (or chunk)
level.
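That "neat trick" is at least plausible with content-defined (variable-length) chunking: instead of cutting every N bytes, you cut wherever a fingerprint of a sliding window hits a fixed pattern, so chunk boundaries follow the content and realign after an insert.  A rough sketch of the idea -- the window size, fingerprint, and cut mask are illustrative assumptions, not Diligent's actual algorithm:

```python
import hashlib

def chunks(data, window=16, mask=0x3F):
    """Cut where a fingerprint of the last `window` bytes hits a fixed
    pattern, so boundaries depend on content, not byte offsets.
    (md5 of the window stands in for a real rolling hash like Rabin.)"""
    out, start = [], 0
    for i in range(window, len(data)):
        fp = hashlib.md5(data[i - window:i]).digest()
        if fp[0] & mask == 0:            # ~1 cut per 64 bytes on average
            out.append(data[start:i])
            start = i
    out.append(data[start:])
    return out

# Deterministic sample data with no repeating pattern.
data = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(64))
shifted = b"INSERTED HEADER" + data      # data inserted at the front

a = {hashlib.sha1(c).hexdigest() for c in chunks(data)}
b = {hashlib.sha1(c).hexdigest() for c in chunks(shifted)}
# Once the sliding window moves past the inserted header, the cut
# positions realign, so nearly every chunk of `shifted` still
# de-dupes against the chunks of `data`.
```

With fixed-offset blocks, an insert at the front would shift every block and defeat de-dupe entirely; here only the chunk(s) touching the insert change.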

I've not been able to gather much more detail about the specific
dedup algorithms, but hope to get some more info this fall, as I take
a closer look at these products.  If anyone has more details, I'd love
to hear them.

..Paul



--
Paul Zarnowski                            Ph: 607-255-4757
Manager, Storage Services                 Fx: 607-255-8521
719 Rhodes Hall, Ithaca, NY 14853-3801    Em: psz1 AT cornell DOT edu


