TSM 7.1.3 new directory-based stgpools, in-line dedup

Mita201 (ADSM.ORG Senior Member)

I have started to play around with it, but I can't find any performance estimation info. I am especially interested in how it copes with large files (e.g. large database backups), since the old post-process dedup choked on them and it was not recommended to back up large files to those pools at all. When defining a storage pool of this kind there is a MAXSIZE parameter that decides whether a file is stored in that pool at all during backup, but there is no recommendation for it, and the default value is no limit.
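
For reference, this is roughly how I set up my test pool and the size cap I am talking about; a sketch based on my reading of the 7.1.3 command reference, with made-up pool and directory names:

    /* directory-container pool with an explicit per-file size cap */
    define stgpool deduppool stgtype=directory maxsize=500G
    /* at least one directory has to be assigned to hold the containers */
    define stgpooldirectory deduppool /tsmpool/dir01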

Also, according to the Knowledge Center, it looks like the hardware requirements have not changed from previous versions, although in the back of my mind in-line dedup requires significant processing power...

Does anyone have more info and is willing to share?
 

Maybe some of what you're looking for is in the new blueprint documents: http://ibm.biz/IBMSpectrumProtectBlueprints

There was a bit of a conversation about this over on LinkedIn in the TSM professionals group. Here's a snippet:

"Daily ingest is actually 100 TB/day for client side dedup based on a large file workload which in our testing was 80% VMware based backup using Spectrum Protect for Virtual Environments, and 20% Database and mixed Files.

Our new server side in-line dedup is hitting upwards of 80 TB/day for this same workload with the added benefit of no longer requiring reclamation processing or expensive deref processing for deleted chunks.

To be clear, we have NOT changed the dedup algorithm itself, so it's compatible with pre-7.1.3 deduplicated data and existing client side dedup.

We have done a lot of work on the Spectrum Protect DB in this release to optimize it for dedup. I see some of the posts on database growth with "legacy" dedup, and that is correct: it can and will grow quickly. We addressed most of that in this release with a new table layout, and our testing has shown that this new table schema is online-reorg friendly and, more importantly, it is self-managing, meaning we are able to better reuse the space and no longer create the growth. This is key to allowing a 7.1.3 server to protect up to 4 PB of managed data before dedup in a 1 PB dedup storage pool while keeping the Spectrum Protect DB at or under 4 TB. Another key part of the dedup changes is the new dedup-optimized container pool. You can think of it as the best of random access pools and file pools with extra benefits.

We will be rolling out an updated version of the Spectrum Protect Blueprints shortly. As I mentioned one of the value adds of the server side inline dedup is no longer requiring reclamation processing. This has allowed us to move off of the 15K RPM drives for the storage pool to 6 TB SATA drives without any impact on performance. You can read all about our 3 reference architectures in the Blueprints when they get published. We build, develop and test against the Blueprints so we are very confident that if folks follow the recipes they will get similar results."
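
Back-of-the-envelope, the sizing claim in that quote works out to roughly this (just arithmetic on the quoted numbers, nothing extra measured):

    4 PB managed data into a 1 PB container pool  ->  about a 4:1 data reduction assumed
    4 TB database for 4 PB managed data           ->  about 1 GB of DB space per TB protected (~0.1%)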
 

rgg,

Thank you very much for pointing me to the new version of the blueprint document. I will read it thoroughly.
At first glance, I noticed that the definitions of "small", "medium" and "large" environments have changed (all of them grew compared to the previous blueprint), which suggests we now need even fewer resources for the same job. That is quite surprising, but certainly good. :)
At the moment I am still uncertain about the impact of storing large files in a directory-container pool (e.g. a 300 GB MS SQL database backup), and VMware backup is not a good example, since VM images are backed up as 128 MB megablocks, so there are no really big files involved.
But I will go through the blueprint a bit and check for answers.
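
One way I plan to check the large-file question myself is to run a test backup of the big SQL dump into the container pool and then look at the per-node dedup statistics. A sketch, assuming the GENERATE DEDUPSTATS / QUERY DEDUPSTATS commands that 7.1.3 introduces for container pools (DEDUPPOOL and SQLSRV1 are made-up names):

    /* collect fresh dedup statistics for this node's data in the container pool */
    generate dedupstats deduppool sqlsrv1
    /* then report the data reduction achieved */
    query dedupstats deduppool sqlsrv1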
Thank you,

Mita
 

So, my understanding is that with legacy dedup these large files initially dedup'd "too well," which took a lot of time and, of course, resources to manage. Eventually a tiering system was introduced that makes the common "chunks" bigger for larger files, to help alleviate this issue. I assume something like this exists for container pools too.
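
For legacy dedup, that tiering is controlled by server options in dsmserv.opt, something like the lines below; the thresholds are in GB and the values shown are illustrative rather than necessarily the shipped defaults:

    * files above these sizes are chunked with progressively larger average chunk sizes
    DEDUPTIER2FILESIZE 100
    DEDUPTIER3FILESIZE 400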

My opinion is that you want to dedup these larger files, and that's likely where you're going to get the most space saving benefit. Hopefully it works well with container pools!
 

I can't download it from the site; what gives? I signed in and everything.

Found the problem: Chrome sucks on Windows 10.
 