Hmm ... Why would the management class have to change?
Because in my case, I don't want it filling up the spinning disk pool that holds the directory container. You can't migrate from directory container pools to other storage pools unless it's tiering to cloud or tiering to tape. You should be able to define a new management class and have that data written to a different storage pool if required.
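For what it's worth, here's a minimal sketch of what that could look like on the server side, assuming a policy domain and policy set both named STANDARD and a hypothetical target pool called OTHERDISKPOOL (adjust names, versions, and retention to your environment):

```
/* Hypothetical names: STANDARD domain/policy set, OTHERDISKPOOL target pool */
define mgmtclass STANDARD STANDARD OTHERDISK_MC description="Bind selected data to OTHERDISKPOOL"
define copygroup STANDARD STANDARD OTHERDISK_MC type=backup destination=OTHERDISKPOOL verexists=2 retextra=30
validate policyset STANDARD STANDARD
activate policyset STANDARD STANDARD
```

Then on the client side you'd bind the data to it with an include statement in the include-exclude list, e.g. `include /some/path/.../* OTHERDISK_MC`.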
**Edit:** But having it broken up just makes my life and my operations team's life easier. Sure, I have a shortcut to call different dsm.opt files on the clients, but overall it works well.
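The shortcut is nothing fancy, roughly this sort of thing (the options file name and path are just examples):

```
# Run an incremental against an alternate client options file
dsmc incremental -optfile=/opt/tivoli/tsm/client/ba/bin/dsm_other.opt
```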
Let me first make sure I'm clear on what you're *not* saying. You're *not* saying that TSM would take data on a standard (traditional, non-dedup) storage pool disk volume that was sent compressed by the client and then uncompress it when migrating to tape (high water mark hit), right? I wouldn't imagine it would do that, in which case the compressed data simply arrives at the tape drive, and the tape drive will try to compress it (it has no way of knowing that it's already compressed) and will either fail to compress it, in which case it's written as is, or it might be able to squeeze a bit more out of it, since drive compression is more robust than the client-side compression (standard LZW).
So are you referring instead to a storage pool where the data is deduped on the server side (or maybe the client, but regardless, it's stored deduped on disk on the server), so that when it's later migrated to tape (high water mark), it will, of course, be rehydrated? I think rehydration always occurs when moving/copying deduped data from disk to tape; a few companies played around with preserving deduped data on tape, but I think it's generally never done. Is that right?
In a legacy pool (FILE or DISK pool), if the client sends file1.zip, it's already a compressed file. When that file is written to tape, then yes, the tape drive will, I'd assume, not perform any compression on it and will write the bitstream it was handed to tape. What I'm saying is that if you have compression and deduplication on the client side, or are running identify duplicates on the storage pool, all those chunks it found and discarded are reassembled by the server before being written to tape. Also, it is my understanding that the LZ4 or LZO compression used by the Spectrum Protect server is reinflated as well, so the tape drive can then do its native compression. So no, the product will not inflate compressed objects that it didn't create.
Now, with the directory container pools, the data is written to tape in a different format (the container format, I'd guess), which is a compressed and deduplicated bitstream. So in this case, yes, it is preserving deduped data on tape. The limitation is that you have to restore the entire directory container pool to disk if you need to do client restores from this 'copy'. That's why IBM really recommends having replication for these storage pools: the restore from tape to disk could take a good while, assuming you even have the disk space to restore it to. It's an all-or-nothing deal.
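If it helps to see it, driving that tape protection looks roughly like the commands below; the pool names are placeholders, and you should check the exact parameter names against your server level:

```
/* CONTPOOL and TAPEPROT are hypothetical pool names */
/* Point the directory container pool at a tape pool for local protection */
update stgpool CONTPOOL protectlocalstgpool=TAPEPROT
/* Write the container extents (still deduped/compressed) out to tape */
protect stgpool CONTPOOL type=local
/* Recovery goes the other way: repair the container pool from the local tape copy */
repair stgpool CONTPOOL srclocation=local
```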
Yes, I recently added the 'compressalways no' option to my client user-options file, and now I see a lot of messages about files having increased in size (.gz in particular), usually with an accompanying line saying the file grew. I wasn't seeing those before. I plan to add some exclude.compression statements so it doesn't even bother trying.
I have a few specific exclusions for things like .zip, .tgz, .gz, etc., but they're not applied everywhere. For the most part I just let the client duke it out. And if vendor A is using some other random compression algorithm, I don't care enough to figure them all out.
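For reference, the statements I'm talking about look something like this in the Unix client options (the patterns are just examples; Windows patterns would use drive syntax instead):

```
* Client-side compression, but resend uncompressed if the file grows
compression      yes
compressalways   no

* Don't even attempt to compress data that's already compressed
exclude.compression /.../*.gz
exclude.compression /.../*.tgz
exclude.compression /.../*.zip
```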
Also, compressed and encrypted MSSQL flat-file backups are the worst. Uncompressed, unencrypted MSSQL flat-file backups and their cousins are all called... '*.bak'. Yeah, just let the client sort them out.
And yes, LTO6 drives. The non-dedup pool is getting about 2.6 to 2.7 TB on a 2.5 TB tape, with several tapes 100% full at 2.4 TB. Yes, there is data in those filespaces that does compress, but on the whole it's not worth the CPU cycles to do that on the client or server side, plus the issues with time (see the next paragraph). My legacy dedup pool's copy volumes to tape run 2.8 TB or higher; I've never seen anything above 4.0 TB. Sure, that may not be much, but when budgets are denied year after year for infrastructure upgrades, you work with what you've got. And at the time, I was getting better performance/results by not spending server cycles trying to rehydrate that data. Also, writing to a disk pool and then sending that data to tape lets me better control tape resource utilization.
I'm not ashamed to say it: a lot of the decisions about why my environment is set up the way it is come down to budgets and current infrastructure. We've tried to cram 100 lb of sand into a 1-gallon bucket filled by a 1/4" pipe that was designed 8 years ago and fully implemented 2 years after that. I spent months tuning AIX parameters for volume groups, jfs2, and HBA settings, trying to eke out every last bit of performance I could. It wasn't uncommon to see so much drive hitching that the write and read speed was 16 Mb/sec; after tuning a lot of parameters, seeing it go up to 20 Mb/s was amazing. Pretty sure some of my old posts here reflect those speeds. No matter how many drives I would write to, I'd top out at a max of 256 or so Mb/sec (reported from fcstat/nmon). It would take 8+ hours to copy 1.4 TB or so to tape.

Since then, some improvements have been made as far as SAN connectivity, but very few 10 Gb Ethernet links. What took 8 hours is now being accomplished in 2 or thereabouts. In the previous paragraph I mentioned tape resource utilization; that was a huge pain point. 1.4 TB took 8 hours to write, the database backup took 4 hours, and other copies were being made for my other storage pools that could take 8+ hours... I was cutting that 24-hour window very, very tight and leaving little to no room for reclamation of tape volumes! Things are much better now. It's not uncommon to see my admin tasks start around 4am and be completed by 4pm, which, in my mind, having previously struggled with the fabled 'Wheel of Life', is amazing. After several years, only recently have I been able to push our SAN infrastructure to where our SAN team is starting to get worried that I'm using all their bandwidth.
We run disk storage and tape resources over the same fabrics, so there could be times when performance isn't the best due to switches queuing up frames.
That said, I was an early adopter of directory container pools as soon as they added the ability to protect them via tape resources. I needed the more compact compression they provide to meet retention. I just don't have enough disk storage at this time to store everything, so I'm still leveraging tape resources. Also, our auditors like to know that tape is in use; they describe it as 'slow media, not affected by ransomware' versus online storage. So, if/when I get to the point of a replication environment, I will very likely keep a tertiary copy on tape.
I don't know everything, and I don't claim to know everything. I still ask the fine folks here for help and 'what am I missing' type questions. My setup is fairly limited in scope compared to others using this product, and if I'm wrong, call me out on it and help me improve! I've had a few health checks with the IBM team, and I've had to explain some of the design decisions that were made and the reasons why, and yes, they go against best practices.
So there's a little backstory as well. Not sure if all of that is useful, but hopefully it gives you an idea of where I'm coming from.