ADSM-L

Re: [ADSM-L] DataDomain and dedup per node

2012-04-19 14:46:06
Subject: Re: [ADSM-L] DataDomain and dedup per node
From: "Huebner,Andy,FORT WORTH,IT" <Andy.Huebner AT ALCONLABS DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 19 Apr 2012 13:42:07 -0500
I am suspicious of dedup ratios in general.  What I found is that I can divide 
my data by 4 and be fairly accurate as to how much storage the DD will need.  
This formula has worked for 2 TSM (12-14:1) and 2 BE (20-25:1) sites, so I 
would not call it proven, except in my little world.
BRMS seems to be different.

Andy Huebner

Perhaps this conversation should be at:
The Data Domain Admins List
http://lists.ufl.edu/cgi-bin/wa?A0=DD-ADMINS-L

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of 
Shawn Drew
Sent: Thursday, April 19, 2012 11:12 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] DataDomain and dedup per node

I was told the only reason EMC recommends turning off collocation is that
collocation generally shoots up the individual volume count, and they
also recommend a relatively high reclamation threshold.  I think these 2
factors together might end up in a lot of wasted unreclaimed space.  I
think it would be ok if you were more aggressive with your reclamation.
Something to keep an eye on at the least.

On another note, I've always been suspicious of whether or not granular
analysis like this is accurate.  The deduplication of a single file would
vary depending on the other data that is on the system, which is
constantly changing.  If you delete all the other files that share data
with this one, will the deduplication factor of this file shoot up?
If so, then the deduplication ratio means nothing for a single file the way
a compression ratio would.  I think it really only applies to the storage
pool as a whole.

Using collocation to identify "bad dedupe citizens" sounds reasonable, but
only if the values being returned by the "filesys show compression"
command are accurate.  Is that data dynamically updated?  Are the
individual file deduplication ratios updated automatically as soon as
data is written or cleaned?  I remember Falconstor only recorded the
deduplication ratio of a virtual tape at the time the data was written, and
it was not updated afterward.  I find it hard to believe this is dynamically
maintained by the Data Domain, but I'd definitely want to know before
switching to collocation for this purpose.

Deduplication adds an abstraction layer between the file metadata and the
actual storage.  I don't see how you could really get an accurate picture
of the true storage an individual file is occupying since it is sharing
space.  Say there are 10x 100MB files sharing 50 percent of their data
with each other.  How much space is one of those files occupying?
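To make the question concrete, here is a rough sketch of that arithmetic. It assumes one particular reading of the example that the post does not spell out: each file has a 50MB private half, and the other 50MB half is identical across all ten files. The variable names are made up for illustration.

```python
# Assumed layout (not stated in the post): every file has a private half
# plus one half that is byte-identical across all ten files.
FILE_COUNT = 10
FILE_SIZE_MB = 100
SHARED_FRACTION = 0.5

shared_mb = FILE_SIZE_MB * SHARED_FRACTION   # stored once for the whole set
unique_mb = FILE_SIZE_MB - shared_mb         # stored once per file

logical_mb = FILE_COUNT * FILE_SIZE_MB
physical_mb = FILE_COUNT * unique_mb + shared_mb
pool_ratio = logical_mb / physical_mb

# Three defensible answers to "how much space does one file occupy?"
answers = {
    "unique blocks only": unique_mb,
    "unique plus equal share of common": unique_mb + shared_mb / FILE_COUNT,
    "as if it were the only file": float(FILE_SIZE_MB),
}

print(f"pool: {logical_mb} MB logical, {physical_mb} MB physical, {pool_ratio:.2f}x")
for label, size_mb in answers.items():
    print(f"{label}: {size_mb} MB")
```

Under these assumptions the pool-wide ratio is well defined, but the per-file figure ranges from 50MB to 100MB depending on the accounting rule, which is the ambiguity being described.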



Regards,
Shawn
________________________________________________
Shawn Drew





From: rrhodes AT FIRSTENERGYCORP DOT COM (Internet)
Sent by: ADSM-L AT VM.MARIST DOT EDU
Date: 04/19/2012 09:27 AM
Please respond to: ADSM-L AT VM.MARIST DOT EDU
To: ADSM-L
Subject: [ADSM-L] DataDomain and dedup per node


Hi Everyone,

As we have been implementing our two new DD boxes we have been
setting them up like our existing two DD boxes - file devices
with the pool NOT collocated.  This is what DD recommends and
it seems to work very well this way.

But, I've been thinking about collocating anyway!

I was poking around the DD command line and found that you
can get the dedup/compression information for any individual
directory or file.  For example, below is the dedup/comp
factors for a file volume in a pool with one node I'm testing with:

  rsbkup:/tsmdata/tsm_scripts==>./run_cmd.ksh tsm2 "q nodedata WVLOGS01P" | grep isdd2260
  WVLOGS01p    /isdd2260/tsm2/test/0002267E.BFS        TEST-PRI-ISDD2260    30,551.83
  WVLOGS01P    /isdd2260/tsm2/test/0002267F.BFS        TEST-PRI-ISDD2260    30,621.15
  WVLOGS01P    /isdd2260/tsm2/test/00022680.BFS        TEST-PRI-ISDD2260    30,601.55
  WVLOGS01P    /isdd2260/tsm2/test/00022682.BFS        TEST-PRI-ISDD2260    30,604.08
  WVLOGS01P    /isdd2260/tsm2/test/00022683.BFS        TEST-PRI-ISDD2260    30,620.86
  WVLOGS01P    /isdd2260/tsm2/test/00022684.BFS        TEST-PRI-ISDD2260     4,731.24

  rsbkup:/tsmdata/tsm_scripts==>./run_cmd.ksh tsm2 "q vol /isdd2260/tsm2/test/0002267E.BFS"
  /isdd2260/tsm2/test/0002267E.BFS        TEST-PRI-ISDD2260       TEST    30.6 G  100.0   Full

  sysadmin@isdd2260# filesys show compression /data/col1/tsm2/test/0002267e.bfs
  Total files: 1;  bytes/storage_used: 4.6
         Original Bytes:       32,332,636,620
    Globally Compressed:       30,695,597,675
     Locally Compressed:        6,930,888,022
              Meta-data:           98,615,480

In this case, this vol is getting a 4.6x overall dedup/comp factor.
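For what it's worth, the 4.6 figure can be reproduced from the counters above. The sketch below assumes that "Globally Compressed" is the size after dedup, that "Locally Compressed" is the size after the local compression pass, and that the overall factor is original bytes divided by locally compressed bytes plus meta-data. That reading is an assumption rather than something taken from DD documentation, though it does match the reported number.

```python
# Counters copied from the "filesys show compression" output above.
original = 32_332_636_620
globally_compressed = 30_695_597_675
locally_compressed = 6_930_888_022
metadata = 98_615_480

# Assumed interpretation: dedup first, then local compression,
# with meta-data counted as part of the stored footprint.
dedup_factor = original / globally_compressed
local_factor = globally_compressed / locally_compressed
total_factor = original / (locally_compressed + metadata)

print(f"dedup only:        {dedup_factor:.2f}x")
print(f"local compression: {local_factor:.2f}x")
print(f"overall:           {total_factor:.2f}x")
```

If that interpretation holds, nearly all of the reduction on this volume comes from local compression, with dedup itself contributing only about 1.05x.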

So, if I collocate the pool in TSM I should be able to use "q nodedata
<node>" to get a list of vols used by a node, then I can query the DD to
get the dedup/comp stats for that node.  A little scripting and I can
generate a report of dedup/comp ratios by TSM node.  This would help us
determine which nodes make sense to put/keep on the DD.
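A minimal sketch of what that scripting might look like, in Python. Everything here is hypothetical: the function names are invented, the path mapping (/isdd2260 on the TSM side to a lowercased /data/col1 path on the DD) is inferred only from the single example above, and fetch_compression stands in for however the DD command actually gets run. The per-node ratio reuses the assumed formula of original bytes over locally compressed bytes plus meta-data.

```python
import re
from collections import defaultdict


def parse_nodedata(text):
    """Yield (node, volume_path) from 'q nodedata' lines shaped like the sample."""
    for line in text.splitlines():
        fields = line.split()
        # Data rows have the volume path in the second column.
        if len(fields) >= 2 and fields[1].startswith("/"):
            yield fields[0].upper(), fields[1]  # node names normalized to upper case


def to_dd_path(tsm_path, tsm_prefix="/isdd2260", dd_prefix="/data/col1"):
    """Map a TSM volume path to the DD-side path (site-specific assumption)."""
    return dd_prefix + tsm_path[len(tsm_prefix):].lower()


def parse_compression(text):
    """Extract byte counters from 'filesys show compression' output."""
    stats = {}
    for label in ("Original Bytes", "Locally Compressed", "Meta-data"):
        match = re.search(re.escape(label) + r":\s*([\d,]+)", text)
        stats[label] = int(match.group(1).replace(",", ""))
    return stats


def node_report(nodedata_text, fetch_compression):
    """Return {node: overall ratio}, summing bytes across each node's volumes."""
    totals = defaultdict(lambda: [0, 0])  # node -> [original, stored]
    for node, volume in parse_nodedata(nodedata_text):
        stats = parse_compression(fetch_compression(to_dd_path(volume)))
        totals[node][0] += stats["Original Bytes"]
        totals[node][1] += stats["Locally Compressed"] + stats["Meta-data"]
    return {node: orig / stored for node, (orig, stored) in totals.items()}


# Canned output taken from the example above; a real run would query the DD.
SAMPLE_NODEDATA = """\
  WVLOGS01P    /isdd2260/tsm2/test/0002267E.BFS        TEST-PRI-ISDD2260    30,551.83
"""
SAMPLE_COMPRESSION = """\
  Total files: 1;  bytes/storage_used: 4.6
         Original Bytes:       32,332,636,620
    Globally Compressed:       30,695,597,675
     Locally Compressed:        6,930,888,022
              Meta-data:           98,615,480
"""

report = node_report(SAMPLE_NODEDATA, lambda dd_path: SAMPLE_COMPRESSION)
for node, ratio in report.items():
    print(f"{node}: {ratio:.1f}x")
```

Summing bytes per node before dividing avoids averaging ratios across volumes of very different sizes, such as the partly filled last volume in the list.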

Just curious if anyone is using collocation for a DD file pool?  To do so
would use more volumes and more filling volumes, but I can't think of any
real reason to not collocate.

Rick



