Data deduplication

finadmin
Hi

I'm planning to implement data deduplication on a TSM 6.2 server that currently has two backup pools: a disk pool and an LTO library. I would prefer client-side dedup, since we are currently facing a client upgrade and must go through the clients anyway (during which dedup can be activated). Client-side dedup also has the advantage of saving a little network bandwidth during backup.

I read the specs and it became apparent that, even with client-side dedup, I would need a storage pool with a FILE device class for the dedup to happen.
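For reference, the server- and client-side setup usually looks something like the sketch below. This is from memory of TSM 6.x syntax; the device class name, pool name, node name, directory, and sizes are placeholders, so verify everything against the Administrator's Reference before running it:

```
/* Define a FILE device class and a dedup-enabled storage pool
   (names, path, and capacities are example values) */
DEFINE DEVCLASS filedev DEVTYPE=FILE DIRECTORY=/tsm/filepool MAXCAPACITY=50G MOUNTLIMIT=20
DEFINE STGPOOL dedup_pool filedev MAXSCRATCH=200 DEDUPLICATE=YES

/* Allow the node to deduplicate on the client side */
UPDATE NODE mynode DEDUPLICATION=CLIENTORSERVER
```

On the client, `DEDUPLICATION YES` would then go into dsm.sys (or dsm.opt on Windows) to actually enable client-side dedup for that node.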

I have a few questions and would much appreciate it if someone knows the answers:

1. How much can I actually expect to save with dedup (i.e. by what percentage, on average, will it reduce the total amount of data stored on the TSM server)?
2. I once ran dedup on a test server and it increased the TSM database size considerably. What should I expect the DB size to be after implementation?

BR
Mike
 
1) We've saved 39% at the moment.

2) We've run deduplication for a couple of months, and the DB is 2.5x larger than before. It's 200 GB now, and was around 80 GB before. Don't know if that is the normal rate.
 
Staham..... did you run client-side compression prior to trying dedup?

I ask because, on average, we are seeing a 30% reduction in storage simply by using client-side compression without dedup. This doesn't add any overhead to the DB, either.

I'm curious whether I would see much value in running dedup. I'm almost thinking I'd rather have a smaller, more manageable database. How long does it take to back up a 200 GB TSM DB?
 
Hi,

I've read in an ATS presentation on dedup that it costs about 500 bytes in the database per data chunk (the average data-chunk size is 256 KB by default).
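Those two figures let you do a back-of-the-envelope estimate of DB growth. A minimal sketch, assuming the 500 bytes/chunk and 256 KB average chunk size quoted above (the 10 TiB data volume is a made-up example, not from this thread):

```python
# Rough estimate of TSM DB growth from deduplication metadata,
# using the figures quoted above: ~500 bytes of DB space per chunk,
# 256 KiB average chunk size. The input volume is a made-up example.

CHUNK_SIZE = 256 * 1024      # average chunk size in bytes (default)
DB_BYTES_PER_CHUNK = 500     # approximate DB cost per chunk

def db_overhead_gib(stored_bytes: int) -> float:
    """Estimated extra DB space (GiB) for deduplicating stored_bytes."""
    chunks = stored_bytes / CHUNK_SIZE
    return chunks * DB_BYTES_PER_CHUNK / 2**30

# Example: a 10 TiB deduplicated pool
print(round(db_overhead_gib(10 * 2**40), 1))  # -> 19.5 (GiB)
```

So roughly 2 GB of DB per TB of deduplicated pool data, which is in the same ballpark as the DB growth reported earlier in this thread.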
 
Staham..... did you run client-side compression prior to trying dedup?

I ask because, on average, we are seeing a 30% reduction in storage simply by using client-side compression without dedup. This doesn't add any overhead to the DB, either.

How long does it take to back up a 200 GB TSM DB?

We've used client-side compression on most (if not all) clients for a few years. I don't know what percentage reduction it gave, but it was also a big reduction.

However, we still use client-side compression as well as deduplication, so the 39% is the reduction of the already-compressed data. In total, we might have reduced the data by 50% or more if I add in the compression rate (which is not known to me). How did you get the 30% number? The only place I've seen compression rates is in the logs after a client finishes a backup; I've seen no totals anywhere.

I have no statistics on how long a DB backup takes, but checking today's backup, a full backup took 1 h 10 min.
 