Planning for TSM deduplication

ToomasAas · Feb 20, 2013

I'm running a TSM 6.2 server. Daily backups go to a disk storage pool. In addition, we run an archive at the end of each month and keep the archived files for 190 days. Currently the archive goes directly to LTO2 tapes, but I'm thinking of using the disk for the archive as well. Much of the archived data is identical from one month to the next, so this seems like a good candidate for deduplication, which I haven't used until now.

While reading the TSM 6.2 documentation about deduplication, I found this:

By default, primary sequential-access storage pools that are set up for data deduplication must be backed up to a copy storage pool before they can be reclaimed and duplicate data can be removed. To minimize the potential of data loss, do not change the default setting.

To protect the data in primary storage pools, issue the BACKUP STGPOOL command to copy the data to copy storage pools. Ensure that the copy storage pools are not configured for data deduplication. During storage pool backup to a non-deduplicated storage pool, server-side and client side extents are reassembled into contiguous files.

Erm... what? In order to have a deduplicated storage pool, I need to have another storage pool with the same data in non-deduplicated form? Surely I must be misunderstanding something, because this seems to defeat any possible space savings from deduplication. Can someone please explain this?

StefanF · Feb 22, 2013

ToomasAas said:
Erm... what? In order to have a deduplicated storage pool, I need to have another storage pool with the same data in non-deduplicated form? Surely I must be misunderstanding something, because this seems to defeat any possible space savings from deduplication. Can someone please explain this?

By default, TSM will not deduplicate your data until it has been backed up to a copy pool.
I like this feature because:

1. You can disable it.
2. You might not want to deduplicate the data before sending it to tape. TSM would have to reduplicate the data before writing it to tape, and that is slower.

This is the setting you are looking for: http://publib.boulder.ibm.com/infoc...ref.doc/r_opt_server_deduprequiresbackup.html
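To put rough numbers on the space question, here is a minimal back-of-envelope sketch in Python. It is not a TSM tool, and everything in it is an assumption made for illustration: the function names, the 30-day spacing between archives, the idea that each later archive adds only its changed fraction to a deduplicated pool, and the example figures. It also ignores duplicates inside a single archive.

```python
import math


def retained_archives(retention_days, interval_days=30):
    """Upper bound on how many monthly archives coexist in the retention window."""
    return math.ceil(retention_days / interval_days)


def pool_sizes(archive_gb, identical_fraction, retention_days):
    """Rough pool sizes in GB for a monthly archive (hypothetical model).

    archive_gb          size of one monthly archive
    identical_fraction  share of each archive identical to the previous one
    retention_days      how long each archive is kept
    """
    n = retained_archives(retention_days)
    # Without deduplication every retained archive is stored in full.
    plain = n * archive_gb
    # With deduplication the first archive is stored in full and each
    # later one contributes only its changed part (model assumption).
    deduped = archive_gb + (n - 1) * archive_gb * (1.0 - identical_fraction)
    return {
        "archives": n,
        "primary_plain_gb": plain,
        "primary_dedup_gb": deduped,
        # The copy pool is not deduplicated, so it holds full copies.
        "copy_pool_gb": plain,
    }


if __name__ == "__main__":
    # Hypothetical figures: 1000 GB per archive, 90% identical, 190 days.
    for name, value in pool_sizes(1000, 0.9, 190).items():
        print(name, round(value, 1))
```

With those made-up figures, the deduplicated primary pool needs roughly 1600 GB of disk instead of 7000 GB. The copy pool still holds the full 7000 GB, so the saving only helps if that copy pool sits on cheaper media such as tape, which is an assumption here and not something the documentation quoted above requires.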

Regards,
Stefan

StefanF · Feb 24, 2013

Keep a few things in mind when you are thinking about using dedupe:

1 - Restore performance takes a hit. The drop can be very significant and depends on the performance of your database and the random I/O performance of your filepool storage.
2 - Your TSM DB, active log, archive log and memory size requirements will increase significantly with dedup, say double what you need without dedup. I wouldn't think about using it with less than 32GB of memory.
3 - To make restores fast you need a very fast database and fast disks behind your filepool. 7.2k filepool disks will work, but don't expect to get close to non-dedup restore speeds.
4 - You impact the whole server even if you dedup only part of the data. Because the TSM database is larger, a DB backup takes more time, and while IDENTIFY DUPLICATES is running you have constant log and DB I/O that affects other TSM processes such as EXPIRE INVENTORY.
5 - TSM DB reorgs become more important and take more time, and thus more I/O.
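The sizing rule of thumb in point 2 can be written down as a small Python sketch. This is only an illustration of the "say double" estimate and the 32GB memory floor mentioned above, not an IBM sizing formula. The function name, parameter names and example figures are hypothetical.

```python
def dedup_sizing(db_gb, active_log_gb, archive_log_gb, memory_gb,
                 factor=2.0, min_memory_gb=32):
    """Scale current server resources by a rule-of-thumb factor.

    factor=2.0 reflects the "say double" estimate from the post;
    min_memory_gb=32 reflects the suggested memory floor. Both are
    rough guidance, not measured values.
    """
    return {
        "db_gb": db_gb * factor,
        "active_log_gb": active_log_gb * factor,
        "archive_log_gb": archive_log_gb * factor,
        # Never plan for less memory than the suggested floor.
        "memory_gb": max(memory_gb * factor, min_memory_gb),
    }


if __name__ == "__main__":
    # Hypothetical current server: 100 GB DB, 16 GB active log,
    # 50 GB archive log, 12 GB of memory.
    for name, value in dedup_sizing(100, 16, 50, 12).items():
        print(name, value)
```

Treat the output as a starting point for planning only. Actual growth depends on how much data is deduplicated and how it changes over time.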

Dedup in TSM is really great and it works well, but realise what you are getting into: it does come with a cost.
