Subject: Re: [ADSM-L] Dedupe
From: "John D. Schneider" <john.schneider AT COMPUTERCOACHINGCOMMUNITY DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 25 Jun 2009 07:25:53 -0700
Greetings,

Hopefully, this will not be too much traffic about the same topic.
There are a zillion people jumping into the dedupe market, because of
the huge opportunity to sell products in this space.  Not all products
are created equal.  Ask questions (or get references from existing
customers) to find out what ongoing support has been like, what
problems or maintenance issues have arisen, and how the vendor handled
them.

In a normal tape environment, or virtual tape library, each backup you
do creates a separate copy of your data, at least the changed parts. 
For data that is changing often, you may have a dozen versions of that
data on different media.  And presumably, you are also creating a daily
offsite copy of that data.  In other words, you have redundant, multiple
copies of the data on separate media.  This is necessary, because no
media is perfect.

In a dedupe appliance, that is exactly what you don't have.  The dedupe
process guarantees that only one copy is kept of each unique block of
data.  If a given block is lost due to corruption or failure of the
media, then potentially every file that contains that block will be
lost.  The people who design these products therefore build them to
mitigate this potential loss (a toy sketch after the list below makes
the failure mode concrete) by:

- Striping data across multiple disks, multiple RAID sets, and sometimes
(as in the case of Avamar) even across multiple nodes in the grid.
- Building integrity checking into various layers of their protocol, so
that incoming data is proven clean as it is received.
- Systematic integrity checking of data as it resides on disk.  The
better designs do a full system scan and check of all data every 24
hours or so.
- Replication software that does integrity checking during the
replication, so any corruption won't get transferred to the remote copy.
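
To make the failure mode concrete, here is a toy content-addressed
block store in Python.  It is purely illustrative (no vendor builds it
this way; the class and block size are made up for the example), but it
shows why one corrupted block can damage every file that references it,
and how a scrub pass of the kind described above catches the corruption:

import hashlib

BLOCK_SIZE = 4096

class DedupeStore:
    def __init__(self):
        self.blocks = {}   # fingerprint -> block bytes (stored once)
        self.files = {}    # filename -> list of fingerprints ("recipe")

    def write(self, name, data):
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fp, block)  # duplicates share one copy
            recipe.append(fp)
        self.files[name] = recipe

    def read(self, name):
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def scrub(self):
        # Re-hash every stored block; report any that no longer match.
        return [fp for fp, block in self.blocks.items()
                if hashlib.sha256(block).hexdigest() != fp]

store = DedupeStore()
store.write("a.doc", b"header" + b"X" * 8192)
store.write("b.doc", b"footer" + b"X" * 8192)  # shares blocks with a.doc

# Corrupt one shared block: every file whose recipe references it is
# now damaged, which is exactly what scrubbing and replication-time
# integrity checks must catch.
shared = set(store.files["a.doc"]) & set(store.files["b.doc"])
store.blocks[next(iter(shared))] = b"garbage"
print(store.scrub())  # the corrupted fingerprint shows up here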

These are the kinds of features that didn't exist in early versions of
dedupe products, where any corruption due to a failure of the disk or
firmware in the array could be catastrophic.  But many dedupe products
today have a healthy paranoia about the reliability of hardware, and
protect themselves accordingly.

So when evaluating dedupe products, be sure to ask questions about these
sorts of features.  Oftentimes, the low-end products don't have them.

Best Regards,

John D. Schneider
The Computer Coaching Community, LLC
Office: (314) 635-5424
Toll Free: (866) 796-9226
Cell: (314) 750-8721


-------- Original Message --------
Subject: Re: [ADSM-L] Dedupe
From: "Strand, Neil B." <NBStrand AT LMUS.LEGGMASON DOT COM>
Date: Thu, June 25, 2009 8:09 am
To: ADSM-L AT VM.MARIST DOT EDU

Ditto on Lindsay's "it depends"

For my NetApp devices, observed NAS filesystem dedupe ranges from 10% to
70%, depending on the data.
VMware NFS shares typically show a good ratio. For our VM environment,
we split the OS apart from data and paging space, as depicted below:
Filesystem                 Used      Saved  %Saved
/vol/PROD_VM_OS/       98314436  227793716     70%
/vol/PROD_VM_PAGING/    3107084    1090756     26%
/vol/PROD_VM_DATA1/    11253900   17343096     61%
/vol/DR_VM_OS1/       105852808  236518940     69%
/vol/DR_VM_DATA1/     431134632  216285060     33%
/vol/DR_VM_PAGING1/       35520       4272     11%
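
Incidentally, the %saved column works out to saved / (used + saved),
i.e. savings relative to the logical (pre-dedupe) size. A quick sanity
check in Python, using the figures above exactly as reported:

volumes = {
    "/vol/PROD_VM_OS/":     (98314436, 227793716),
    "/vol/PROD_VM_PAGING/": (3107084, 1090756),
    "/vol/PROD_VM_DATA1/":  (11253900, 17343096),
    "/vol/DR_VM_OS1/":      (105852808, 236518940),
    "/vol/DR_VM_DATA1/":    (431134632, 216285060),
    "/vol/DR_VM_PAGING1/":  (35520, 4272),
}
for vol, (used, saved) in volumes.items():
    # savings as a fraction of what the volume would hold undeduped
    print(f"{vol:24s} {saved / (used + saved):7.0%}")
# -> 70%, 26%, 61%, 69%, 33%, 11%, matching the reported column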

The paging space is very dynamic and I don't expect much savings.
The OS space (where VM operating systems are installed) is relatively
static and redundant and reflects that with high dedup ratios.
The data space (where applications and everything else is) has a wide
variance - as expected.

But the end result is that I am saving disk space and actually improving
overall performance because redundant data has a higher probability of
residing in cache and the reference to a particular bit of redundant
data has a higher probability of residing in the cached lookup table.

If you are looking for dedupe on tape media, I don't think it is
feasible or desirable. Simple compression now allows me to put nearly
3TB on a single 3592 tape (again, depending on the data). At a nominal
cost of $150/tape, this works out to about 5 cents/GB. Not too shabby.
I make a second offsite copy of the same data, for an overall cost of
about 10 cents/GB, to provide better-than-"five nines" probability that
my company's data is recoverable for the next 6 years. This is less
than the cost of electricity for disk-based storage over the same
period.
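
Spelling that arithmetic out (same figures as above; real capacity
depends on how well your data compresses):

tape_cost = 150.0      # $ per 3592 cartridge
capacity_gb = 3000.0   # ~3 TB per tape with compression
copies = 2             # primary + offsite copy

per_gb = tape_cost / capacity_gb
print(f"single copy:  ${per_gb:.3f}/GB")           # ~$0.05/GB
print(f"with offsite: ${per_gb * copies:.3f}/GB")  # ~$0.10/GB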

Dedupe has its place, as do most technologies. It is not a golden egg
unless you force it to be ... and then, when it hatches, it may be a
fine goose or it may be a platypus. It depends on your environment.


Cheers,
Neil Strand
Storage Engineer - Legg Mason
Baltimore, MD.
(410) 580-7491
Whatever you can do or believe you can, begin it.
Boldness has genius, power and magic.


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Ochs, Duane
Sent: Thursday, June 25, 2009 7:35 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] Dedupe

As common practice, de-dup is not a tape-oriented process; it is
usually used to reduce data on disk.
One concern would be the number of tape mounts required to restore data
in the event of a DR scenario.
As the article stated, there are not many "global" de-dup products yet.
We have been able to implement some dedup for specific applications,
for instance e-mail attachments, and it has worked out fairly well.
However, it was primarily to reduce the size of the Storage Groups of
our Exchange cluster, which is on tier 1 storage, for DR purposes; the
de-dupped attachments are now on tier 2. It reduced our SGs by 1/3.
The Exchange SG backups are retained based on legal requirements and
replicated; the attachments are not.

I also tested Data Domain and was very unimpressed by the numbers I
saw. It had very little impact on our largest bodies of data: imaging,
Exchange, and DB dumps. But that is also the hardest type of data to
de-dup.
My two cents.


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
madunix
Sent: Wednesday, June 24, 2009 11:37 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: Dedupe

My thoughts on dedupe: it could be interesting for those who need to
decrease the number of tape cartridges, but they could suffer
significant CPU and I/O overhead from dedupe processing. One issue I
was thinking about is failure or corruption: if one part is corrupted,
many files could be affected by the loss of a common chunk. And what
about encryption? Is dedupe compatible with it?

Thanks
madunix

>> -----Original Message-----
>> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf

>> Of lindsay morris
>> Sent: Wednesday, June 24, 2009 1:07 PM
>> To: ADSM-L AT VM.MARIST DOT EDU
>> Subject: Re: [ADSM-L] Dedupe
>>
>> Short and clear answer about de-dupe:
>>
>> It depends.
>>
>> Hope this helps.
>>
>> ------
>> Mr. Lindsay Morris
>> Principal
>> www.tsmworks.com
>> 919-403-8260
>> lindsay AT tsmworks DOT com
>>

