TSM Data Deduplication

pfsubaru

Hi,

Has anyone heard of this? If so, which version comes with this feature?
 
If you are going to consider this, take a look at some of the webcasts and redbooks. In my opinion, it still leaves a lot to be desired: it is not "inline", or real-time, and it eats up a lot of system resources. First define your requirements, then evaluate it against dedicated appliances such as Data Domain.

Joining a Tivoli User Community (TUC) will give you lots of resources: http://www.tivoli-ug.org
 
I tested it a bit, and I was very disappointed: the dedup ratio was around 2. A Data Domain in similar cases dedups with a ratio of at least 4 or 5.

In addition, Data Domain compresses the deduplicated data with gzip or lz, regularly reaching overall ratios of around 10 for a file server.

And TSM is still unable to compress at the storage-pool level, so you can achieve nearly as good a ratio just by enabling compression on the client.
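For what it's worth, client compression is just a couple of client options; a minimal sketch of a dsm.opt (or dsm.sys stanza on Unix) fragment, using standard TSM client option names:

```
* compress data on the client before it is sent to the server
COMPRESSION YES
* don't resend objects that actually grow when compressed
COMPRESSALWAYS NO
```

That trades client CPU for network bandwidth and server storage, which is exactly the comparison point against server-side dedup.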

I really wonder why IBM bothered to develop a deduplication algorithm only to do no better than good old gzip :down:
 
I read that TSM 6.2 will support client-side deduplication. That may make this more interesting. I am really on the fence with inline vs post-process deduplication. I can't get the sort of performance some of my customers demand with inline deduplication.
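From what I've read, client-side dedup in 6.2 is enabled at three points: the client option file, the node definition, and a dedup-enabled FILE pool. A sketch (node and pool names here are placeholders):

```
* dsm.opt / dsm.sys on a 6.2+ client: deduplicate before sending
DEDUPLICATION YES

/* on the server: let the node deduplicate on the client side */
UPDATE NODE mynode DEDUPLICATION=CLIENTORSERVER
```

The node's backups also have to land in a storage pool defined with DEDUPLICATE=YES for any of this to take effect.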
 
I just read the PDF on dedup. Is this true? You cannot dedup on a DISK stgpool?

Planning for deduplication
Careful planning for deduplication can increase the efficiency of the setup process.
Before setting up storage pools for deduplication:
- Determine which client nodes have data that you want to deduplicate.
- Decide whether you want to define a new storage pool exclusively for deduplication or update an existing storage pool. The storage pool must be a sequential-access disk (FILE) pool. Deduplication occurs at the storage pool level, and all data within a storage pool, except encrypted data, is deduplicated.
 
Not on a DISK device class, but a FILE device class, which is disk.
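In other words, you point dedup at a FILE device class backed by disk directories. A rough sketch of the server setup (device class name, pool name, sizes, and directory path are all placeholders):

```
DEFINE DEVCLASS filedev DEVTYPE=FILE MAXCAPACITY=50G MOUNTLIMIT=32 DIRECTORY=/tsm/filevols
DEFINE STGPOOL dedupfile filedev MAXSCRATCH=200 DEDUPLICATE=YES IDENTIFYPROCESS=2
```

IDENTIFYPROCESS controls how many duplicate-identification processes run against the pool, which is where a lot of the CPU cost shows up.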

I've been working on a couple of projects with TSM deduplication. First off, it's post-process, which I'm not a fan of. Second, it eats up a lot of resources, both disk and CPU. You will have to run your own tests to see whether it's worth it or not.

For instance, on a smaller site, I ended up having to compress the backup data at the client site, reducing the data by over 75%. The added space savings from dedup is not worth the overhead, so I turned dedup off. At another installation I have loads of resources and dedup is running fine.

One caveat: the identify-duplicates process only FLAGS the duplicate chunks; you don't actually get any space back until reclamation runs.
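So the space actually comes back in two steps; a sketch against a hypothetical pool name:

```
/* step 1: mark duplicate chunks (run for up to 4 hours here) */
IDENTIFY DUPLICATES dedupfile DURATION=240

/* step 2: reclamation is what physically frees the space */
RECLAIM STGPOOL dedupfile THRESHOLD=50 DURATION=120

/* check the savings afterwards */
QUERY STGPOOL dedupfile FORMAT=DETAILED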
 
I just read the PDF on dedup. Is this true? You cannot dedup on a DISK stgpool?
Yes, it's true, because deduplication handles random-access (DISK) volumes very badly. But you just have to use a FILE stgpool, which works very well.
 