Compression with Data Deduplication

lightness

Hi All,

I'm exploring the benefits of compression and deduplication and was wondering if someone could help me get definite numbers. We are running TSM version 7.1.5.0 with deduplication enabled on the server. I backed up a drive of just over 2 GB with client compression enabled. A trace shows how much the data was compressed (around 50% in this case), but how do I find out how much the data was deduplicated? I know that combining client compression with server deduplication is not the best way to do it, and I may see less deduplication with compression enabled, but is there a way to see the numbers on this?
 

RecoveryOne

Are you running legacy pools or directory container pools?
If container pools, I'd recommend updating the server as high as you can go for a multitude of benefits.

If running legacy pools, you'll need to ensure the identify duplicates processes are running for your pools.
If you only backed up one server, one time, to that one storage pool, you won't yet see much in the way of deduplication benefits. Those show up once data starts to change and more blocks line up as duplicates of what is already stored.

You could get a rough idea via something like this (copied from thobiast's GitHub page, https://github.com/thobiast/tsm_sql):
SELECT occ.node_name, node.domain_name, node.platform_name, CAST(FLOAT(SUM(logical_mb)) / 1024 AS DEC(8,2)) as GB -
FROM occupancy occ, nodes node WHERE occ.node_name=node.node_name GROUP BY occ.node_name,node.domain_name,node.platform_name ORDER BY GB DESC

Compare the values reported vs what the client did.

If you are using directory container pools, there's a handy generate dedupstats command: https://www.ibm.com/support/knowled.../srv.reference/r_cmd_dedupstats_generate.html
Note that it can run a really long time if you have a lot of data/clients.

I will say from my own experience that the legacy pools would get me at best a 2.8:1 reduction, while the directory container pools are achieving almost a 5:1 reduction.
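
For comparing ratios like these with percentage figures, an N:1 reduction corresponds to a saving of 1 - 1/N of the original size. A minimal Python sketch of that arithmetic follows; the function name is made up for illustration and is not part of any TSM tooling.

```python
def ratio_to_savings_pct(ratio: float) -> float:
    """Convert an N:1 reduction ratio into the percentage of space saved."""
    if ratio < 1:
        raise ValueError("a reduction ratio is expected to be >= 1")
    return (1 - 1 / ratio) * 100


# Ratios quoted in the post above; the labels are just for display.
for label, ratio in [("legacy pools", 2.8), ("directory container pools", 5.0)]:
    print(f"{label}: {ratio}:1 is about {ratio_to_savings_pct(ratio):.1f}% saved")
```

So a 2.8:1 reduction is roughly 64% saved and 5:1 is 80% saved.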

And you may want to look at updating to the most recent v7 code, or even jumping to v8 (again, go to the latest). There have been a lot of improvements and fixes.
 

lightness

Thank you for that info. We are planning on going to the latest version of v8 in the very near future, and are currently running legacy pools.
 

RecoveryOne

OK. So with legacy pools, the identify duplicates processes need to be running. I still have a mixture of old and new pools, so I find a good spot in my admin tasks to run those processes. A somewhat manual way I've done this in the past is:
Code:
select * from occupancy where node_name='NODE_NAME'
For example, one of my clients reports this:
LOGICAL_MB: 33944632.97
REPORTING_MB: 44502881.14
So you can look at what the client sent to the server, and then, after identify duplicates has run, look at logical_mb to get a rough idea of what else was trimmed off. EDIT: logical_mb also takes into account any compression done on the client, so it won't be 100% accurate. Just saying.
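
The comparison above can be worked through with the two sample values. This is only a sketch in Python: it assumes REPORTING_MB can be treated as the pre-reduction size and LOGICAL_MB as the size after reduction, and, as noted, the result mixes client compression with deduplication. The function name is hypothetical.

```python
def reduction(reporting_mb: float, logical_mb: float):
    """Return (MB saved, percent saved, N:1 ratio) for one occupancy row.

    Assumption: reporting_mb is the size before data reduction and
    logical_mb is the size after it.
    """
    if reporting_mb <= 0 or logical_mb <= 0:
        raise ValueError("both values must be positive")
    saved_mb = reporting_mb - logical_mb
    savings_pct = saved_mb / reporting_mb * 100
    ratio = reporting_mb / logical_mb
    return saved_mb, savings_pct, ratio


# Sample values from the occupancy output shown above.
saved_mb, savings_pct, ratio = reduction(44502881.14, 33944632.97)
print(f"saved {saved_mb:.2f} MB, {savings_pct:.2f}% of reporting_mb, {ratio:.2f}:1")
```

For this client that works out to a little under 24% saved, or about 1.3:1.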

If you want to look at the whole storage pool, q stgpool POOL_NAME f=d will give you some info such as this:
Deduplication Savings: 10,947 G (17.00%)
Others might have a better way to get the information, but it works well enough for my needs.
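
If you collect that "Deduplication Savings" line from several pools, a small script can pull out the numbers and turn the percentage into an N:1 ratio. The sketch below is Python and rests on assumptions: the line looks exactly like the sample above, the unit is a single letter, and the percentage is relative to the pre-deduplication size. The function names are made up for illustration.

```python
import re

# Matches lines shaped like: "Deduplication Savings: 10,947 G (17.00%)"
PATTERN = re.compile(
    r"Deduplication Savings:\s*([\d,.]+)\s*([KMGT])\s*\(([\d.]+)%\)"
)


def parse_dedup_savings(line: str):
    """Return (amount saved, unit letter, percent saved) from one output line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    amount = float(match.group(1).replace(",", ""))
    return amount, match.group(2), float(match.group(3))


def savings_pct_to_ratio(pct: float) -> float:
    """Convert percent saved into an N:1 reduction ratio."""
    if not 0 <= pct < 100:
        raise ValueError("percent saved must be in [0, 100)")
    return 1 / (1 - pct / 100)


amount, unit, pct = parse_dedup_savings("Deduplication Savings: 10,947 G (17.00%)")
print(f"{amount:.0f} {unit} saved, {pct:.2f}%, about {savings_pct_to_ratio(pct):.2f}:1")
```

Under those assumptions, 17% saved corresponds to roughly 1.2:1.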
 
