
Re: [ADSM-L] DataDomain and dedup per node

From: Shawn Drew <shawn.drew AT AMERICAS.BNPPARIBAS DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 19 Apr 2012 12:11:47 -0400
I was told the only reason EMC recommends turning off collocation is that
collocation generally drives up the individual volume count, and they also
recommend a relatively high reclamation threshold.  Together, these two
factors can leave a lot of wasted, unreclaimed space.  I think it would be
OK if you were more aggressive with your reclamation.  Something to keep
an eye on, at the least.

On another note, I've always been suspicious of whether granular analysis
like this is accurate.  The deduplication of a single file varies
depending on the other data on the system, which is constantly changing.
If you delete all the other files that share data with this one, the
deduplication factor of this file should shoot up.  If so, then a
deduplication ratio means nothing for a single file, the way a compression
ratio would.  I think it really only applies to the storage pool as a
whole.

Using collocation to identify "bad dedupe citizens" sounds reasonable, but
only if the values returned by the "filesys show compression" command are
accurate.  Is that data dynamically updated?  Are the individual file
deduplication ratios automatically updated as data is written or cleaned?
I remember FalconStor only recorded the deduplication ratio of a virtual
tape at the time the data was written, and it was never updated.  I find
it hard to believe this is dynamically maintained by the Data Domain, but
I'd definitely want to know before switching to collocation for this
purpose.

Deduplication adds an abstraction layer between the file metadata and the
actual storage.  I don't see how you could get an accurate picture of the
true storage an individual file occupies, since it is sharing space.  Say
there are ten 100MB files, each sharing 50 percent of its data with the
others.  How much space does one of those files occupy?
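To make the ambiguity concrete, here is a minimal sketch of that ten-file scenario, assuming (purely for illustration) that the shared 50% is a single common chunk stored once on disk.  The two per-file answers it computes are both defensible, which is the point:

```python
# Ten 100MB files, each sharing half its data via one common chunk.
# All names and the single-shared-chunk assumption are illustrative.
n_files = 10
file_size = 100            # MB, logical size of each file
shared = 50                # MB, the common chunk, stored once
unique_per_file = file_size - shared   # 50 MB unique to each file

logical_total = n_files * file_size                   # what clients see
physical_total = n_files * unique_per_file + shared   # what disk holds

# Two defensible answers for "space one file occupies":
exclusive = unique_per_file                       # freed if only this file is deleted
apportioned = unique_per_file + shared / n_files  # shared chunk's cost split ten ways

print(logical_total, physical_total, exclusive, apportioned)
# -> 1000 550 50 55.0
```

Deleting one file frees only 50MB, yet summing the "apportioned" answer across all ten files is the only way the per-file numbers add up to the 550MB actually stored.  Neither number is wrong; they just answer different questions, which is why a per-file dedup ratio is slippery.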



Regards,
Shawn
________________________________________________
Shawn Drew





Internet
rrhodes AT FIRSTENERGYCORP DOT COM
Sent by: ADSM-L AT VM.MARIST DOT EDU
04/19/2012 09:27 AM
Please respond to: ADSM-L AT VM.MARIST DOT EDU
To: ADSM-L
Subject: [ADSM-L] DataDomain and dedup per node

Hi Everyone,

As we have been implementing our two new DD boxes we have been
setting them up like our existing two DD boxes - file devices
with the pool NOT collocated.  This is what DD recommends and
it seems to work very well this way.

But, I've been thinking about collocating anyway!

I was poking around the DD command line and found that you
can get the dedup/compression information for any individual
directory or file.  For example, below is the dedup/comp
factors for a file volume in a pool with one node I'm testing with:

  rsbkup:/tsmdata/tsm_scripts==>./run_cmd.ksh tsm2 "q nodedata WVLOGS01P" | grep isdd2260
  WVLOGS01P    /isdd2260/tsm2/test/0002267E.BFS        TEST-PRI-ISDD2260       30,551.83
  WVLOGS01P    /isdd2260/tsm2/test/0002267F.BFS        TEST-PRI-ISDD2260       30,621.15
  WVLOGS01P    /isdd2260/tsm2/test/00022680.BFS        TEST-PRI-ISDD2260       30,601.55
  WVLOGS01P    /isdd2260/tsm2/test/00022682.BFS        TEST-PRI-ISDD2260       30,604.08
  WVLOGS01P    /isdd2260/tsm2/test/00022683.BFS        TEST-PRI-ISDD2260       30,620.86
  WVLOGS01P    /isdd2260/tsm2/test/00022684.BFS        TEST-PRI-ISDD2260        4,731.24

  rsbkup:/tsmdata/tsm_scripts==>./run_cmd.ksh tsm2 "q vol /isdd2260/tsm2/test/0002267E.BFS"
  /isdd2260/tsm2/test/0002267E.BFS        TEST-PRI-ISDD2260       TEST    30.6 G  100.0   Full

  sysadmin@isdd2260# filesys show compression /data/col1/tsm2/test/0002267e.bfs
  Total files: 1;  bytes/storage_used: 4.6
         Original Bytes:       32,332,636,620
    Globally Compressed:       30,695,597,675
     Locally Compressed:        6,930,888,022
              Meta-data:           98,615,480

In this case, this vol is getting a 4.6x overall dedup/comp factor.

So, if I collocate the pool in TSM, I should be able to use "q nodedata
<node>" to get the list of volumes used by a node, then query the DD for
the dedup/comp stats on those volumes.  A little scripting and I can
generate a report of dedup/comp ratios by TSM node.  This would help us
decide which nodes make sense to put/keep on the DD.
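The two fiddly pieces of that script would be mapping a TSM file-device volume name to its path on the DD, and pulling a ratio out of the "filesys show compression" output.  A hypothetical sketch of both (the function names and the /isdd2260 -> /data/col1 mapping are assumptions based on the output above, not anything DD documents):

```python
import re

def dd_path(tsm_volume, mtree_prefix="/data/col1"):
    """Map a TSM file-device volume name to its assumed path on the DD.

    Assumes, as in the output above, that the TSM device directory
    /isdd2260/... corresponds to the MTree path /data/col1/... and that
    the DD filesystem shows the name in lower case.
    """
    parts = tsm_volume.split("/")          # ['', 'isdd2260', 'tsm2', ...]
    return mtree_prefix + "/" + "/".join(parts[2:]).lower()

def dedup_ratio(compression_output):
    """Original bytes / physical bytes from 'filesys show compression' text."""
    pairs = re.findall(r"([A-Za-z -]+):\s+([\d,]+)", compression_output)
    fields = {k.strip(): int(v.replace(",", "")) for k, v in pairs}
    # Physical footprint = locally compressed data plus metadata.
    physical = fields["Locally Compressed"] + fields["Meta-data"]
    return fields["Original Bytes"] / physical

print(dd_path("/isdd2260/tsm2/test/0002267E.BFS"))
# -> /data/col1/tsm2/test/0002267e.bfs
```

Feeding the volume list from "q nodedata" through dd_path, running "filesys show compression" on each result, and averaging the ratios per node would give the report.  Of course, this all hinges on the accuracy question raised above.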

Just curious: is anyone using collocation for a DD file pool?  Doing so
would use more volumes, and more filling volumes, but I can't think of
any real reason not to collocate.

Rick






