Planning for TSM deduplication

ToomasAas

I'm running a TSM 6.2 server. Daily backups go to a disk storage pool. In addition, we run an archive at the end of each month, and the archived files are kept for 190 days. Currently the archive goes directly to LTO2 tapes, but I'm thinking of using disk for the archive as well. Much of the archived data is identical from one month to the next, so this seems like a good candidate for deduplication, which I haven't used until now.

While reading the TSM 6.2 documentation about deduplication, I found this:

By default, primary sequential-access storage pools that are set up for data deduplication must be backed up to a copy storage pool before they can be reclaimed and duplicate data can be removed. To minimize the potential of data loss, do not change the default setting.


To protect the data in primary storage pools, issue the BACKUP STGPOOL command to copy the data to copy storage pools. Ensure that the copy storage pools are not configured for data deduplication. During storage pool backup to a non-deduplicated storage pool, server-side and client side extents are reassembled into contiguous files.

Erm... what? In order to have a deduplicated storage pool, I need to have another storage pool with the same data in non-deduplicated form? Surely I must be misunderstanding something, because this seems to defeat any possible space savings from deduplication. Can someone please explain this?
 
By default, TSM will not remove duplicate data from a deduplicated pool until that data has been backed up to a copy pool.
I like this feature because:

1. You can disable it.
2. You might not want to deduplicate the data before sending it to tape, because TSM would have to reassemble the deduplicated data before writing it to tape, and that is slower.

This is the setting you are looking for (the DEDUPREQUIRESBACKUP server option): http://publib.boulder.ibm.com/infoc...ref.doc/r_opt_server_deduprequiresbackup.html

Regards,
Stefan
 
Keep a few things in mind when you are thinking about using dedupe:

1 - Restore performance takes a hit. The drop can be very significant and depends on the performance of your database and the random I/O performance of your filepool storage.
2 - Your TSM DB, active log, archive log and memory requirements increase significantly with dedup, say double what you need without it. I wouldn't consider using it with less than 32GB of memory.
3 - To make restores fast you need a very fast database and fast disks behind your filepool. 7.2k filepool disks will work, but don't expect to get close to non-dedup restore speeds.
4 - You impact the whole server even if you dedup only part of the data. Because the TSM database is larger, a DB backup takes more time, and while IDENTIFY DUPLICATES is running you have constant log and DB I/O that affects other TSM processes such as EXPIRE INVENTORY.
5 - TSM DB reorgs become more important and take more time, and thus more I/O.

Dedup in TSM is really great and it works well, but realise what you are getting into: it does come with a cost.
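
To put rough numbers on the original question, here is a back-of-the-envelope sketch in Python. It is not TSM output and not an IBM sizing formula. The 500 GB archive size, the 90% unchanged fraction and the 30-day interval are made-up assumptions; only the 190-day retention comes from the first post. The model also assumes that unchanged data deduplicates perfectly against what is already in the pool. It only estimates the primary disk pool: per the documentation quoted above, a non-deduplicated copy pool still holds the data in full form.

```python
import math


def retained_copies(retention_days: int, interval_days: float = 30.0) -> int:
    """Upper bound on how many periodic archives are held at the same time."""
    return math.ceil(retention_days / interval_days)


def pool_sizes(archive_gb: float, copies: int, unchanged_fraction: float):
    """Return (non_dedup_gb, dedup_gb) for the primary pool.

    Simplified model (an assumption, not TSM behaviour): the first archive is
    stored in full, and every later archive adds only its changed part.
    """
    if not 0.0 <= unchanged_fraction <= 1.0:
        raise ValueError("unchanged_fraction must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    non_dedup_gb = archive_gb * copies
    dedup_gb = archive_gb * (1 + (copies - 1) * (1 - unchanged_fraction))
    return non_dedup_gb, dedup_gb


# Hypothetical inputs: 500 GB per monthly archive, 90% identical month to month.
copies = retained_copies(190)
plain_gb, dedup_gb = pool_sizes(500.0, copies, 0.9)
print(f"copies held: {copies}")
print(f"without dedup: {plain_gb:.0f} GB, with dedup: {dedup_gb:.0f} GB")
```

Plug in your own archive size and change rate. The closer the monthly archives are to identical, the more the disk pool shrinks, while the DB, log and memory costs listed above stay with you either way.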
 