TSM Deduplication Limitations?

rowl

ADSM.ORG Senior Member
I have heard some comments from local TSM folks that TSM 6 dedup is only usable on pools up to 5-6 TB in size. While I didn't get a lot of details, it sounded like they ended up CPU bound.

I am curious if anyone here has had positive (or negative) experiences with TSM deduplication and large storage pools. We are looking at the possibility of replacing deduplicating VTLs with large disk pools. It would be far less expensive and less complicated than the VTL route if TSM deduplication is usable on a large scale.

To give you an idea of how large "large" is, we have some hosts with occupancy numbers in TSM in the 80 - 100 TB range, and nearly 4 PB of total occupancy in our TSM backup environment. On average we move 60-80 TB a day of backups to TSM.

The server platform we are considering is the Sun x4540: 12 cores, 32 GB RAM, and 48 x 1.5 TB drives behind a ZFS file system. With this platform each server adds more CPU, RAM, and capacity to the environment, so the hope is this will help scale up the CPU/memory/storage bandwidth needed for deduplication.
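For a rough idea of how many of these building blocks 4 PB of occupancy translates into, here is a back-of-envelope sketch. The RAID overhead and dedup ratio are pure assumptions for illustration, not vendor or measured numbers:

```python
# Back-of-envelope capacity sizing for an x4540-style building block.
# USABLE_FRACTION and DEDUP_RATIO are assumptions, not measured values.
RAW_TB_PER_NODE = 48 * 1.5      # 48 x 1.5 TB drives = 72 TB raw
USABLE_FRACTION = 0.70          # assumed loss to RAID, spares, and filesystem overhead
DEDUP_RATIO = 3.0               # assumed average dedup ratio

TOTAL_OCCUPANCY_TB = 4000       # roughly 4 PB of TSM occupancy

usable_per_node = RAW_TB_PER_NODE * USABLE_FRACTION
logical_per_node = usable_per_node * DEDUP_RATIO
nodes_needed = TOTAL_OCCUPANCY_TB / logical_per_node

print(f"usable per node:  {usable_per_node:.0f} TB")
print(f"logical per node: {logical_per_node:.0f} TB at {DEDUP_RATIO:.0f}:1 dedup")
print(f"nodes for ~4 PB:  {nodes_needed:.0f}")
```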

Thanks,
-Rowl
 
I can't answer your question. But I'm surprised you would want to put all your eggs in one basket (one big TSM server).

We are not as big as you, and we run Windows. Our approach to moving away from the VTL was to buy more, less expensive servers and spread out the processing load, bandwidth, and disaster exposure.

We have a couple of SATA-only TSM servers. They work great, but compared to the VTL there are additional exposures to consider.

The OS has access to all TSM data. A virus or some form of corruption could whack both storage pools. Adding many LUNs to a server also makes management of those LUNs very sensitive; your server and TSM admins had best pay very close attention to detail. You should also think about the maximum number of LUNs your server will be happy with.

The VTLs were designed for a specific mission and they do it very well. When moving to self-managed disk you'll need to design your own RAID layout, spindle count, LUN size, etc., and you might not see the gains you expect to see.
 
The OS has access to all TSM data. A virus or some form of corruption could whack both storage pools.
That was one of the deciding factors for management to go with AIX in my environment :p



Seriously, is anyone using TSM for deduping in large environments? I'm stuck with the same dilemma.
 
That 6 TB figure is how much backup data a TSM server could effectively dedupe per day. TSM keeps track of what data has already been deduped in the storage pool, so it doesn't have to scan the entire pool every day. You just need to keep in mind that processing 6 TB of backup data could take up to 18 hours (depending on the number of CPUs, disk speed, etc.). So if your system isn't fast enough, you will never dedupe all the data coming in and could fall behind / use up too much storage space.
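As a sanity check, 6 TB in 18 hours implies a fairly modest sustained rate for the dedup processing. A quick back-of-envelope calculation using only the figures quoted above (this is just arithmetic, not a measured TSM number):

```python
# Sustained throughput implied by "6 TB of backup data in up to 18 hours".
# These are the figures quoted in the post above, not measured values.
tb_processed = 6
hours = 18

mb_per_sec = tb_processed * 1024 * 1024 / (hours * 3600)
print(f"{mb_per_sec:.0f} MB/s sustained")  # roughly 97 MB/s of dedup processing
```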

TSM dedupe is probably realistic if your TSM server backs up 2 - 4 TB per day. But then you also don't get dedupe across storage pools / TSM servers, whereas you would with one of the various dedupe appliances out there.
 
Thanks for clearing that up, Canuck. So I guess you could probably postpone the dedup process until the weekend. My servers are usually just running reclamation during that off time.
 
Leaving it to the weekend could cost you quite a lot of storage, though. With TSM dedupe you need that 'extra landing zone' of storage for the nightly backup; then it runs its dedupe process on the volumes (throwing out redundant chunks), and after dedupe and expiration TSM needs to run reclamation to recover the space. You don't actually lower your 'used' storage until it has run completely through the dedupe / reclamation process.
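To put a rough number on how much that could cost, here is a sketch of the un-reclaimed 'landing zone' space you would carry while dedup is deferred. The daily ingest and dedup ratio are assumptions pulled from figures mentioned in this thread, not anything TSM reports:

```python
# Extra storage held as raw, un-deduplicated volumes while dedup/reclaim is deferred.
# Daily ingest and dedup ratio are assumed planning figures, not TSM-reported values.
DAILY_INGEST_TB = 4     # per-server nightly backup volume
DEDUP_RATIO = 3.0       # assumed average dedup ratio

def deferred_backlog_tb(days_deferred: int) -> float:
    """Space that is only freed after dedup identification, expiration, and reclamation run."""
    raw = DAILY_INGEST_TB * days_deferred
    after_dedup = raw / DEDUP_RATIO
    return raw - after_dedup

print(f"{deferred_backlog_tb(1):.1f} TB held when processing nightly")   # ~2.7 TB
print(f"{deferred_backlog_tb(7):.1f} TB held when processing weekly")    # ~18.7 TB
```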
 
I have used dedup in my environment for 4-5 TB of data.
So far I have been fighting to get it to work properly.

Dedup works fine, but it makes expiration hang. When that happens, all other processes and backups hang as well. It appears to be a TSM server resource conflict.

We opened a PMR and they are working on a diag server for us.

So if you plan to use this in production, please test it in a test environment first.
 
Thanks for all the feedback on this topic. To clarify one point: I am not looking to create one big TSM server. This platform is far too small for that, even with deduplication. One of our ongoing struggles has been scaling up CPU, memory, capacity, and I/O. The x4540 would be a building block, and we would roll out as many as needed to meet our capacity and throughput needs.

Using 4 TB/day as a starting point, this sounds promising. On a per-TSM-server basis that is about the average I am measuring. Tape still exists here, so it may well be used as an overflow pool for the data stored on these servers' internal disk.
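Taking 4 TB/day per server as the planning figure against the 60-80 TB/day mentioned in the first post, the building-block count works out roughly as follows (just arithmetic on the numbers already in this thread):

```python
# Rough server count if each TSM instance can dedupe about 4 TB of new data per day.
# The site-wide ingest range and per-server figure come from earlier in this thread.
site_ingest_tb_per_day = (60, 80)
per_server_tb_per_day = 4

for total in site_ingest_tb_per_day:
    servers = total / per_server_tb_per_day
    print(f"{total} TB/day -> about {servers:.0f} servers")
# 15 to 20 building blocks, before any headroom for growth or restore load
```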
 