Re: [ADSM-L] Data Deduplication

2007-08-29 12:37:41
Subject: Re: [ADSM-L] Data Deduplication
From: Charles A Hart <charles_hart AT UHC DOT COM>
Date: Wed, 29 Aug 2007 11:35:48 -0500
The compression challenge is more that compression creates unique backup
objects that cannot be de-duped.  Compression also causes a CPU
performance hit on the client.  Performance-related experience: with
Diligent ProtecTIER running on a Sun V40 with 4 x dual-core procs and 32GB
of memory, we see a max of 250MB/s writes per PT head, and up to 800MB/s reads.

>As de-dup, from what I have read, compares across all files
>on a "system" (server, disk storage or whatever), it seems
>to me that this will be an enormous resource hog

The de-dup technology only compares / looks at the files within its
specific repository.  Example: we have 8 ProtecTIER nodes in one data
center, which equates to 8 virtual tape libraries and 8 repositories.  The
data that gets compared is only within 1 of the 8 repositories.  This is
why, to get the best bang for the buck, you want to match up like data.
What we've been trying to do is run two TSM instances on an LPAR, one that
backs up Unix Prod and the other backs up Unix Non-Prod; we register a Prod
DB on the Prod TSM instance and the Dev DB client on the TSM Dev instance,
then the two instances share a ProtecTIER library, so in theory your
Prod and non-prod backups should factor very well.
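The repository-scoped behavior described above can be sketched in a few lines. This is a toy model, not ProtecTIER's actual algorithm: chunks are stored once per repository, keyed by a SHA-256 digest, and one repository never sees another's chunks.

```python
import hashlib

class DedupRepository:
    """Toy dedup repository: each unique chunk is stored once, keyed by
    its SHA-256 digest. Lookups happen only within this repository's own
    index, never across repositories."""

    def __init__(self):
        self.index = {}  # digest -> chunk bytes

    def store(self, data, chunk_size=4096):
        """Split data into fixed-size chunks; return (new, duplicate) counts."""
        new = dup = 0
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest in self.index:
                dup += 1          # already stored: dedup hit
            else:
                self.index[digest] = chunk
                new += 1          # first time seen: store it
        return new, dup

prod = DedupRepository()
nonprod = DedupRepository()
payload = b"A" * 4096 + b"B" * 4096

print(prod.store(payload))     # (2, 0) - first copy: all chunks new
print(prod.store(payload))     # (0, 2) - second copy dedups fully
print(nonprod.store(payload))  # (2, 0) - separate repository starts from scratch
```

This is why sending similar data (e.g., Prod and Non-Prod copies of the same database) to the same repository factors well, while splitting it across repositories does not.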

Hope this helps!


Charles Hart

David Longo <David.Longo AT HEALTH-FIRST DOT ORG>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
08/29/2007 10:45 AM
Re: [ADSM-L] Data Deduplication

I have been hearing bits and pieces about this de-dup thing.

Several things have me wondering, as they do other folks on this list.

One thing I haven't heard about is performance.  Even with TSM
clients, there is the advice not to use "compression" on the client
due to performance issues.  And that is just for individual files
or data streams.

As de-dup, from what I have read, compares across all files
on a "system" (server, disk storage or whatever), it seems
to me that this will be an enormous resource hog of CPU, memory
and disk I/O.  I am not just talking about using it as part of TSM
disk, but say for instance on a file server.
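The resource concern can be rough-sized with back-of-envelope arithmetic. All numbers below (repository size, chunk size, index entry size) are illustrative assumptions, not vendor figures, but they show why the dedup index alone is a serious memory/disk consumer:

```python
# Back-of-envelope sizing of a dedup index. Every number here is an
# illustrative assumption, not a vendor specification.
repository_bytes = 50 * 10**12      # 50 TB of protected data (assumed)
chunk_bytes      = 8 * 1024         # 8 KB average chunk size (assumed)
entry_bytes      = 64               # digest + location per index entry (assumed)

chunks = repository_bytes // chunk_bytes
index_bytes = chunks * entry_bytes
print(f"{chunks:,} chunks -> {index_bytes / 10**9:.0f} GB of index")
```

An index that large cannot sit entirely in RAM, so every incoming block potentially costs disk I/O just to check whether it is a duplicate.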

Any experiences/comments?

David Longo

>>> Paul Zarnowski <psz1 AT CORNELL DOT EDU> 8/27/2007 3:27 PM >>>
At 12:40 PM 8/27/2007, Curtis Preston wrote:
>Every block coming into the device should be compared to every other
>block ever seen by the device.

As others have noted, different vendors dedup at different levels of
granularity.  When I spoke to Diligent at the Gartner conference over
a year ago, they were very tight-lipped about their actual
algorithm.  They would, however, state that they were able to dedup
parts of two files that had similar data, but were not
identical.  I.e., if data was inserted at the beginning of the file,
some parts of the end of the file could still be deduped.  Neat trick
if it's true.  Other vendors dedup at the file or block (or chunk) level.
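The "neat trick" of surviving an insertion is usually done with content-defined chunking. The sketch below is a generic illustration of the technique, not Diligent's actual algorithm: chunk boundaries are cut wherever a rolling sum over the last few bytes hits a bit pattern, so boundaries follow the content rather than byte offsets, and most chunks after an insertion still match.

```python
import hashlib
import random

def cdc_chunks(data, window=16, mask=0x3F, min_size=32):
    """Content-defined chunking sketch: cut a boundary wherever a rolling
    sum over the last `window` bytes matches a bit pattern. Boundaries
    depend on the content itself, not on byte offsets, so an insertion
    at the front only disturbs chunks near the insertion point."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= window:            # keep the sum over a sliding window
            rolling -= data[i - window]
        if i - start + 1 >= min_size and rolling & mask == 0:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])        # trailing partial chunk
    return chunks

def digests(chunks):
    return {hashlib.sha256(c).hexdigest() for c in chunks}

random.seed(0)
base = bytes(random.randrange(256) for _ in range(4096))
shifted = b"INSERTED HEADER" + base        # same data with 15 bytes prepended

shared = digests(cdc_chunks(base)) & digests(cdc_chunks(shifted))
print(f"{len(shared)} of {len(digests(cdc_chunks(base)))} chunks still dedup")
```

With fixed-size blocks, prepending 15 bytes would shift every block and nothing would match; here the boundaries resynchronize shortly after the insertion point.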

I've not been able to gather much more detail about the specific
dedup algorithms, but hope to get some more info this fall, as I take a
closer look at these products.  If anyone has more details, I'd love
to hear them.


Paul Zarnowski                            Ph: 607-255-4757
Manager, Storage Services                 Fx: 607-255-8521
719 Rhodes Hall, Ithaca, NY 14853-3801    Em: psz1 AT cornell DOT edu


