ADSM-L

Re: [ADSM-L] Actual TSM client storage utilization using Data Domain

2012-09-22 10:04:34
Subject: Re: [ADSM-L] Actual TSM client storage utilization using Data Domain
From: Richard Rhodes <rrhodes AT FIRSTENERGYCORP DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Sat, 22 Sep 2012 10:00:41 -0400
Here is how I understand the output of "filesys show comp".

Original Bytes:  This is the size of the file as it comes in the front door 
into the DD.  This should match the file size as the OS sees it.

Global Comp Bytes:  As the file comes into the DD, it is deduped.  This is the 
removal of duplicate pieces of info, checked against all the other files the 
DD is holding.  So this number is the size of the unique info the file contains.

Local Comp Bytes:  The unique blocks are then compressed.  This is just like 
zipping a file, but in this case it's the set of unique blocks.  The result of 
this operation is what is then written to disk.

Now, the stats listed are AS THE FILE WAS WRITTEN.  
Let's take an example:  

  I write a 100mb file onto the DD and it gets zero dedup and zero compression.
     OB = 100m
     GCB = 100m
     LCB = 100m (we wrote 100mb to disk)
     overall ratio for this file is 1 (100 OB / 100 LCB = 1), i.e. ZERO reduction.
  These stats are SET for the life of that file - they won't change!
  
  Now let's change some stuff in the file, rename it, and write it onto the DD.
     OB = 100m
     GCB = 10m (let's assume it dedups out 90mb of the 100mb written)
     LCB = 1m  (let's assume it zips the 10mb down to 1mb. We write 1mb to disk)
     overall ratio for this file is 100 (100 OB / 1 LCB = 100), i.e. a 99% reduction.
  These stats are SET for the life of that file - they won't change!

  Now let's delete the first file.
  q) What is the dedup ratio of the 2nd file now?
  a) IT DOESN'T CHANGE.  It still shows the same ratio of 100.

In other words, the dedup ratio is NOT DYNAMIC.
You can see the overall status (that is, the total written size 
and the free space of the DD), but the per-file stats can be misleading.
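
As a rough illustration of the example above, here is a toy model in Python. It is not how the DD works internally: the class, the 10mb chunk size, and the "local_factor" knob are all made up for the sketch. It only shows per-file stats being frozen at write time while the space actually held on disk changes after a delete.

```python
class ToyDedupStore:
    """Toy model only: per-file stats are recorded once, at write time."""

    def __init__(self):
        self.refs = {}       # chunk id -> number of files referencing it
        self.stored_mb = {}  # chunk id -> MB the chunk occupies on disk
        self.files = {}      # file name -> (set of chunk ids, frozen stats)

    def write(self, name, chunks, chunk_mb=10.0, local_factor=1.0):
        unique = set(chunks)
        # "Global" step: only chunks the store has never seen are new.
        new = [c for c in unique if c not in self.refs]
        # "Local" step: new chunks are compressed by an assumed factor.
        for c in new:
            self.stored_mb[c] = chunk_mb / local_factor
        for c in unique:
            self.refs[c] = self.refs.get(c, 0) + 1
        stats = {
            "OB": len(chunks) * chunk_mb,
            "GCB": len(new) * chunk_mb,
            "LCB": len(new) * chunk_mb / local_factor,
        }
        self.files[name] = (unique, stats)
        return stats

    def delete(self, name):
        unique, _ = self.files.pop(name)
        for c in unique:
            self.refs[c] -= 1
            if self.refs[c] == 0:  # no file needs this chunk any more
                del self.refs[c]
                del self.stored_mb[c]

    def disk_mb(self):
        return sum(self.stored_mb.values())


store = ToyDedupStore()
first = [f"a{i}" for i in range(10)]   # 10 chunks x 10mb = 100mb
second = first[:9] + ["b0"]            # 90mb shared with file1, 10mb new

s1 = store.write("file1", first)                      # no dedup, no zip
s2 = store.write("file2", second, local_factor=10.0)  # 10mb new, zipped to 1mb
ratio1 = s1["OB"] / s1["LCB"]   # 100 / 100 = 1.0
ratio2 = s2["OB"] / s2["LCB"]   # 100 / 1   = 100.0

store.delete("file1")
s2_after = store.files["file2"][1]     # frozen stats, unchanged by the delete
print(ratio1, ratio2, s2_after == s2, store.disk_mb())
```

In this toy, file2 still reports 1mb written (ratio 100) after file1 is gone, yet 91mb stays on disk because file2 still references nine of file1's chunks.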

Rick




-----"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote: -----
To: ADSM-L AT VM.MARIST DOT EDU
From: Shawn Drew 
Sent by: "ADSM: Dist Stor Manager" 
Date: 09/20/2012 06:30PM
Subject: Re: Actual TSM client storage utilization using Data Domain

I'm still not 100% on where that deduplicated data is accounted for in the
"global" statistic.  The way it is described is that it is the size of the
file after deduplication.
Does that mean it is the amount of unique data in that file?  If so, does
that mean the data that was not unique is accounted for in another file
that was, presumably, the first to add that data to the repository?



Regards,
Shawn
________________________________________________
Shawn Drew




From: rrhodes AT FIRSTENERGYCORP DOT COM
Sent by: ADSM-L AT VM.MARIST DOT EDU
Date: 09/20/2012 11:03 AM
Please respond to: ADSM-L AT VM.MARIST DOT EDU
To: ADSM-L
Subject: Re: [ADSM-L] Actual TSM client storage utilization using Data Domain






I recently got a first cut at some scripts that give us the DD dedup
stats per TSM node.  It's not pretty, but it does seem to work.  It does
require the file pool to be collocated, so that each node uses
separate file volumes.

The logic goes like this:

- file pool on the DD MUST be collocated - each node has its own vols
- for each node
  - get list of vols via "q nodedata <node>"
  - for each volume
    - on DD run "filesys show compression <vol_file_name>"
    - add the cmd output to running sums of Original Bytes, Global Comp
      Bytes, and Local Comp Bytes
  - after all vols for node have been processed,
      compute overall comp ratio by
      (sum of Original Bytes / sum of Local Comp Bytes)

So basically it just gets a list of vols per node and sums the results of
the "filesys show comp" commands.  The fun is translating the TSM vol name
into the DD internal path for the filesys cmd.
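
The summing step can be sketched in Python roughly as follows. This is only a sketch under assumptions: the function name and the input tuples are hypothetical, the sample numbers are made up, and the parts that talk to TSM and the DD ("q nodedata", "filesys show compression", and the vol-name-to-path translation) are left out because their output formats are not shown here.

```python
from collections import defaultdict


def summarize(per_volume_stats):
    """Sum per-volume stats by node and compute the overall ratio.

    per_volume_stats: iterable of
        (node, original_bytes, global_comp_bytes, local_comp_bytes),
    one tuple per file volume, already parsed from the DD output.
    """
    totals = defaultdict(lambda: {"vols": 0, "ob": 0, "gcb": 0, "lcb": 0})
    for node, ob, gcb, lcb in per_volume_stats:
        t = totals[node]
        t["vols"] += 1
        t["ob"] += ob
        t["gcb"] += gcb
        t["lcb"] += lcb

    report = {}
    for node, t in totals.items():
        # Overall ratio = sum of Original Bytes / sum of Local Comp Bytes.
        ratio = t["ob"] / t["lcb"] if t["lcb"] else None
        report[node] = {**t, "ratio": ratio}
    return report


# Made-up sample input: two volumes for nodeA, one for nodeB.
sample = [
    ("nodeA", 1000, 200, 100),
    ("nodeA", 3000, 400, 100),
    ("nodeB", 500, 500, 250),
]
report = summarize(sample)
for node, r in sorted(report.items()):
    print(node, r["vols"], r["ob"], r["gcb"], r["lcb"], r["ratio"])
```

Note that the ratio is computed from the sums, not by averaging per-volume ratios, so large volumes carry more weight than small ones.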

Here are a few lines from my report (with names changed to protect the
innocent).

  Original Bytes = what comes into the DD
  Global Comp Bytes = size after dedup
  Local Comp Bytes = size after zip - this is what gets written to disk

tsm   node    #vols     SumOrigBytes      SumGlobalCompBytes  SumLocalCompBytes  Ratio
----  -----   -------   ----------------  ------------------  -----------------  ---------
tsm7  node1   Vols= 1   OBmb= 24954.14    GCBmb= 3736.81      LCBmb= 1576.03     CR= 15.83
tsm7  node2   Vols= 1   OBmb= 20747.89    GCBmb= 2632.50      LCBmb= 1116.19     CR= 18.58
tsm7  node3   Vols= 1   OBmb= 28528.93    GCBmb= 5200.65      LCBmb= 2609.92     CR= 10.93
tsm1  node4   Vols= 9   OBmb= 250332.31   GCBmb= 7868.56      LCBmb= 4221.41     CR= 59.30
tsm1  node5   Vols= 17  OBmb= 495973.34   GCBmb= 43150.37     LCBmb= 18792.24    CR= 26.39
tsm1  node6   Vols= 29  OBmb= 853369.75   GCBmb= 126286.69    LCBmb= 36064.45    CR= 23.66
tsm6  node7   Vols= 18  OBmb= 502341.18   GCBmb= 16647.87     LCBmb= 8263.54     CR= 60.79
tsm2  node8   Vols= 2   OBmb= 43620.57    GCBmb= 11829.33     LCBmb= 2366.72     CR= 18.43
tsm1  node9   Vols= 3   OBmb= 65267.99    GCBmb= 11109.95     LCBmb= 3286.02     CR= 19.86
(and on and on)
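
For reading rows like these, it can help to split the overall ratio into a dedup part and a zip part. The short Python sketch below does that for two rows from the report; the function and key names are just illustrative, and only the MB figures come from the report itself.

```python
def breakdown(ob_mb, gcb_mb, lcb_mb):
    """Split the overall ratio into its dedup and local-compression parts."""
    return {
        "dedup": ob_mb / gcb_mb,    # Original Bytes / Global Comp Bytes
        "zip": gcb_mb / lcb_mb,     # Global Comp Bytes / Local Comp Bytes
        "overall": ob_mb / lcb_mb,  # Original Bytes / Local Comp Bytes (the CR column)
    }


# MB figures copied from the node1 and node4 rows of the report.
node1 = breakdown(24954.14, 3736.81, 1576.03)
node4 = breakdown(250332.31, 7868.56, 4221.41)

for name, b in (("node1", node1), ("node4", node4)):
    print(f"{name}: dedup={b['dedup']:.2f} zip={b['zip']:.2f} overall={b['overall']:.2f}")
```

The dedup and zip factors multiply to the overall ratio, so a high CR can come mostly from dedup (as with node4) rather than from compression.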

The "filesys show comp" cmd gives the stats as of when the file was WRITTEN
to the DD.  I suggest reading up on it and the quirks of what it reports
and how it reports the comp info.

Anyway, that's what I did.

Rick






From:   Rick Adamson <RickAdamson AT WINN-DIXIE DOT COM>
To:     ADSM-L AT VM.MARIST DOT EDU
Date:   09/20/2012 10:08 AM
Subject:        Actual TSM client storage utilization using Data Domain
Sent by:        "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>



I have a situation where I suspect one or more of my TSM clients is
rapidly consuming large amounts of storage space, and I am at a loss as to
how to accurately determine the culprit.

The TSM server is configured with a Data Domain DD880 as primary storage
of the device type "file", so obviously when I query the occupancy table in
TSM it provides raw numbers that do not reflect the de-dup and compression
of the DD device. Querying compression on the DD only provides numbers per
storage area, or context.

Has anyone found a way to determine the actual amount of storage that
a particular client is using within Data Domain?

All comments welcome.....

TSM Server 5.5 on Windows and Data Domain ddos 5.1.

~Rick



