Re: [ADSM-L] DataDomain and dedup per node

2012-04-19 10:08:14
Subject: Re: [ADSM-L] DataDomain and dedup per node
From: "Schneider, Jim" <jschneider AT USSCO DOT COM>
Date: Thu, 19 Apr 2012 09:04:20 -0500
The most serious problem we have encountered is the effect of
reclamation on backup throughput.  We have a 1 Gb backup network that is
used for servers to write directly to the Data Domain, bypassing TSM.
When only one client is writing directly to the DD we see backup network
utilization around 85%.  When reclamation is running the client gets 5%
backup network utilization.  I've cancelled reclamation and watched the
client throughput increase, then drop again when autoreclamation
restarts.  We now only run reclamation when the client's 1 TB daily
backup is complete (a 4- to 5-hour process).

Collocation will increase the number of files used to store data and to
be reclaimed.  If you can run reclamation when no other processing is
running there should be no impact, but watch your network stats for

Side Comment:  We run NDMP backups across fiber to a VTL on the Data
Domain.  There is no effect on backup network utilization when the NDMP
backups are running.  I'm still puzzled by this.  Reclamation seems to
use enough DD resources to slow backup network data ingestion but NDMP
backups running with a higher throughput don't use enough DD processing
power to slow (or even affect) a direct write by a client over Ethernet.

Jim Schneider

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT vm.marist DOT edu] On Behalf Of
Richard Rhodes
Sent: Thursday, April 19, 2012 8:28 AM
To: ADSM-L AT vm.marist DOT edu
Subject: [ADSM-L] DataDomain and dedup per node

Hi Everyone,

As we have been implementing our two new DD boxes we have been setting
them up like our existing two DD boxes - file devices with the pool NOT
collocated.  This is what DD recommends and it seems to work very well
this way.

But, I've been thinking about collocating anyway!

I was poking around the DD command line and found that you can get the
dedup/compression information for any individual directory or file.  For
example, below are the dedup/comp factors for a file volume in a pool
with one node I'm testing with:

  rsbkup:/tsmdata/tsm_scripts==>./run_cmd.ksh tsm2 "q nodedata | grep isdd2260
  WVLOGS01P    /isdd2260/tsm2/test/0002267E.BFS        TEST-PRI-ISDD2260
  WVLOGS01P    /isdd2260/tsm2/test/0002267F.BFS        TEST-PRI-ISDD2260
  WVLOGS01P    /isdd2260/tsm2/test/00022680.BFS        TEST-PRI-ISDD2260
  WVLOGS01P    /isdd2260/tsm2/test/00022682.BFS        TEST-PRI-ISDD2260
  WVLOGS01P    /isdd2260/tsm2/test/00022683.BFS        TEST-PRI-ISDD2260
  WVLOGS01P    /isdd2260/tsm2/test/00022684.BFS        TEST-PRI-ISDD2260

  rsbkup:/tsmdata/tsm_scripts==>./run_cmd.ksh tsm2 "q vol
  /isdd2260/tsm2/test/0002267E.BFS        TEST-PRI-ISDD2260       TEST    30.6 G  100.0   Full

  sysadmin@isdd2260# filesys show compression
  Total files: 1;  bytes/storage_used: 4.6
         Original Bytes:       32,332,636,620
    Globally Compressed:       30,695,597,675
     Locally Compressed:        6,930,888,022
              Meta-data:           98,615,480

In this case, this vol is getting a 4.6x overall dedup/comp factor.
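
As a rough check of where the 4.6 comes from, the sketch below redoes the
arithmetic using the byte counts printed above.  It assumes that "storage
used" is the locally compressed bytes plus meta-data, and that "Globally
Compressed" is the size after dedup but before local compression; both are
a reading of the output, not something the output itself states.

```python
# Byte counts copied from the `filesys show compression` output above.
original_bytes = 32_332_636_620
globally_compressed = 30_695_597_675
locally_compressed = 6_930_888_022
meta_data = 98_615_480

# Assumption: bytes actually stored = locally compressed data + meta-data.
storage_used = locally_compressed + meta_data

# Assumption: "global" compression is dedup, "local" is ordinary compression.
dedup_factor = original_bytes / globally_compressed
local_factor = globally_compressed / locally_compressed
overall_factor = original_bytes / storage_used

print(f"dedup factor:   {dedup_factor:.2f}x")
print(f"local factor:   {local_factor:.2f}x")
print(f"overall factor: {overall_factor:.1f}x")
```

Under those assumptions the overall figure rounds to the reported 4.6, and
for this one test volume most of the reduction appears to come from local
compression rather than dedup.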

So, if I collocate the pool in TSM I should be able to use "q nodedata
<node>" to get a list of vols used by a node, then I can query the DD to
get the dedup/comp stats for that node.  A little scripting and I can
generate a report of dedup/comp ratios by TSM node.  This would help us
determine which nodes make sense to put/keep on the DD.
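
The scripting idea above might look something like the following minimal
sketch.  It only parses text shaped like the two outputs shown earlier and
sums the byte counts per node.  The function names are hypothetical, and
`get_compression_text` stands in for whatever actually runs the DD command
for one volume and returns its output; here it is stubbed with the sample
output.  The ratio again assumes storage used = Locally Compressed +
Meta-data.

```python
from collections import defaultdict

def parse_nodedata(text):
    """Map node name -> list of volume paths from `q nodedata`-style rows."""
    volumes = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Rows of interest look like: NODE  /path/to/volume.BFS  POOL
        if len(fields) >= 3 and fields[1].startswith("/"):
            # Assumption: node names are case-insensitive, so normalize them.
            volumes[fields[0].upper()].append(fields[1])
    return dict(volumes)

def parse_compression(text):
    """Extract 'Label: 1,234' byte counts from the DD compression output."""
    stats = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        digits = value.strip().replace(",", "")
        if sep and digits.isdigit():
            stats[label.strip()] = int(digits)
    return stats

def node_report(nodedata_text, get_compression_text):
    """Return node -> overall dedup/comp factor summed across its volumes."""
    report = {}
    for node, vols in parse_nodedata(nodedata_text).items():
        original = stored = 0
        for vol in vols:
            stats = parse_compression(get_compression_text(vol))
            original += stats["Original Bytes"]
            # Assumption: stored bytes = locally compressed + meta-data.
            stored += stats["Locally Compressed"] + stats["Meta-data"]
        report[node] = original / stored if stored else float("nan")
    return report

# Sample text copied from the outputs earlier in this message.
SAMPLE_NODEDATA = """\
WVLOGS01P    /isdd2260/tsm2/test/0002267E.BFS        TEST-PRI-ISDD2260
WVLOGS01P    /isdd2260/tsm2/test/0002267F.BFS        TEST-PRI-ISDD2260
"""
SAMPLE_COMPRESSION = """\
Total files: 1;  bytes/storage_used: 4.6
       Original Bytes:       32,332,636,620
  Globally Compressed:       30,695,597,675
   Locally Compressed:        6,930,888,022
            Meta-data:           98,615,480
"""

# Stub: a real script would query the DD for `vol` instead.
report = node_report(SAMPLE_NODEDATA, lambda vol: SAMPLE_COMPRESSION)
for node, factor in sorted(report.items()):
    print(f"{node:12s} {factor:5.1f}x")
```

Summing bytes across volumes before dividing weights each volume by its
size, which seems more useful for a per-node figure than averaging the
per-volume ratios.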

Just curious if anyone is using collocation for a DD file pool?  To do
so would use more volumes and more filling volumes, but I can't think of
any real reason to not collocate.

