TSM and IBM ProtecTIER TS7650 compression discussion

jharris

ADSM.ORG Member
Joined: May 24, 2004
Location: Victoria, Australia
Hi guys,

For those of you with TSM and Protectier deployment experience, rather than a de-dupe question, I have a compression question.

First of all, we have had the TS7650 running on a TSM 5.5 3-way clustered environment with 7 TSM servers and a TSM library manager for about a month. We've tested direct writing to tape, including LAN-free, and also the migration of quite a bit of our 3592 media, all without trouble.

Now, IBM ProtecTIER (previously a Diligent product) recommends no client compression/encryption when storing TSM data into the VTL, as this maximises the amount of de-duplication possible via the appliance.

I've enabled compression on the single virtual tape library I've created via the PTManager software, to further minimise the back-end disk usage for virtual tape cartridges. As the ProtecTIER is currently emulating LTO3 tape drives, I've defined our TSM device classes with type 'Ultrium3'.
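For reference, this is roughly what the two device class variants look like in TSM admin syntax (the library and device class names here are illustrative, not from our environment):

```
define devclass PT_LTO3  library=PTLIB devtype=lto format=ultrium3
define devclass PT_LTO3C library=PTLIB devtype=lto format=ultrium3c
```

The only difference is the FORMAT value: ULTRIUM3 leaves drive-level compression off, ULTRIUM3C requests it from the (real or virtual) drive hardware.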

My questions are:
1. Traditionally, although real tape drives are compression capable, the data was not actually compressed on the tape unless the device class was defined as 'Ultrium3c' within TSM. So even though I've enabled compression on the ProtecTIER library through the PTManager software, will the uncompressed client data be getting maximum compression on the virtual LTO3 drives if I have defined my device classes non-compressed?
The ProtecTIER redbooks don't specifically cover TSM at this level; they just state that to enable ProtecTIER compression you turn it on for the virtual library you created.

2. Also, I can see the current level of deduplication (HyperFactor ratio), but where do I see the current levels of compression being achieved?
 
It is my understanding that.....

1) As long as you don't compress data at the client end, you are giving the PT box a chance to dedupe the data. If you do compress the data, it is likely that you will see little if any deduplication being done.
2) PT will just receive a data stream, try to dedupe it and then compress it, regardless of the TSM parms you set. I have seen compressed data being sent to a PT box and it isn't a pretty sight!
3) From PT Manager I don't think you can see compression ratios, although they are extractable from the internal PT logs.
 
I had a response from IBM, stating that the hyperfactor ratio shown via the PT Manager summary screen is the result of deduplication and compression by the appliance. They still could not directly answer my question above, regarding device class definitions.

If you think about it, when using type 'Ultrium3C' for a real LTO3 drive, the data is still not compressed by TSM before it gets to the tape drive. It just tells the tape code to allow the data to be compressed by the tape hardware when it receives it. The same logic should apply to a virtual LTO drive. That is, enable compression on the virtual drives using the PT Manager software, but actual compression may or may not be applied by each virtual drive depending on whether the 'Ultrium3' or 'Ultrium3C' type is used on the device class.

Anyway, I will assume at this stage that the TSM device class definition has nothing to do with improving the hyperfactor ratio.
 
Anyway, I will assume at this stage that the TSM device class definition has nothing to do with improving the hyperfactor ratio.

Correct, and anything you specify won't improve the performance of the PT factoring, with the exception of turning compression/encryption off. Typical behaviour for VTL hardware is just to acknowledge the request so that the calling software gets an appropriate response and thinks that a real tape drive is out there.

You can assume that all data sent to PT will first be de-duped and then compressed, regardless of what commands you send from the software mounting the virtual drive.
 
If you think about it, when using type 'Ultrium3C' for a real LTO3 drive, the data is still not compressed by TSM before it gets to the tape drive. It just tells the tape code to allow the data to be compressed by the tape hardware when it receives it. The same logic should apply to a virtual LTO drive. That is, enable compression on the virtual drives using the PT Manager software, but actual compression may or may not be applied by each virtual drive depending on whether the 'Ultrium3' or 'Ultrium3C' type is used on the device class.

Makes sense, so then the question becomes: how should it be set? Ultrium3 or Ultrium3C? DataDomain also says to turn off TSM client-side compression, but I don't see any mention of how to set the format for the VTL drives. Have you found out whether Ultrium3C will compress the same way a physical drive does? What's IBM's answer on that for ProtecTIER VTL?
 
Makes sense, so then the question becomes: how should it be set? Ultrium3 or Ultrium3C? DataDomain also says to turn off TSM client-side compression, but I don't see any mention of how to set the format for the VTL drives. Have you found out whether Ultrium3C will compress the same way a physical drive does? What's IBM's answer on that for ProtecTier VTL?

You don't need to set it anywhere for any virtual drive.

As with ANY deduplication product if you send it compressed data it will not be able to deduplicate it.
 
Maverick, you're missing the point... I realise the TSM clients should not compress data. However, with real LTO drives, setting Ultrium3 or Ultrium3C does not change the amount of data sent by the TSM server to the tape drive, i.e. the TSM server never compresses the data prior to sending it.
It just allows the tape drive to compress the data internally when it receives it.

Anyway, we're getting about a 4.2:1 dedupe of non-structured data ... around 24TB in 6TB of storage.
 
De-dupe and compression

Hi,
As you mentioned, with physical drives like LTO3, and going back to 3480/3490, you could tell the control unit whether you'd like to use compression (IDRC) or not.
Compression in ProtecTIER (and I believe also in Data Domain) is enabled/disabled per library, regardless of the device type that is emulated. Data that arrives from TSM is first de-duped, and whatever is left is then compressed on the way to disk, regardless of the device type you mentioned.
Bottom line: you cannot decide that a specific job will not be compressed. You can only enable/disable compression (and also de-dupe) per virtual library.

HTH,
Gil.
 
One thing I do not get with HyperFactor and (pre)compressed data: if I send two identical files called summer.jpg from two different systems, the HyperFactor engine should see that it is the same data and only store it once, giving a dedupe ratio of 2:1.
If I then send this image to a friend in e-mail and back up my .pst file to the VTL, it should not be stored again, achieving a 3:1 ratio.
Even if the data is a compressed file with small changes between the versions, deduplication should still be possible. Deduplication is only impossible if the file is encrypted each time it is changed, because then it should be unique data every time.

So my impression is that we should send uncompressed data to get good numbers. In the end the same amount of data will be stored, but the ProtecTIER will report a lower number for the jpg- or png-loving people out there, just because the file will only be deduped and not compressed as well.
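The whole-file case in this reasoning can be sketched with a toy content-addressed store. This is a deliberate simplification: HyperFactor does similarity matching on the backup stream, not whole-file hashing, and the file contents here are made up.

```python
import hashlib

store = {}               # content hash -> data (the "repository")
logical = physical = 0   # bytes sent vs bytes actually stored

def backup(data: bytes):
    """Store data; identical content is kept only once."""
    global logical, physical
    logical += len(data)
    h = hashlib.sha256(data).hexdigest()
    if h not in store:
        store[h] = data
        physical += len(data)

jpg = b"\xff\xd8" + b"pixels" * 1000   # stand-in for summer.jpg

backup(jpg)   # first system
backup(jpg)   # second system: duplicate, stored once -> 2:1
backup(jpg)   # same image again inside the .pst backup -> 3:1
print(f"dedupe ratio {logical / physical:.0f}:1")   # prints "dedupe ratio 3:1"
```

Because the image bytes are identical every time, each resend only improves the reported ratio, which matches the 2:1 and 3:1 figures above.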
 
compressed files

Hi,
If you send the same compressed file, like a JPEG or MP3, several times, then HyperFactor will identify it and de-dupe it. However, if the backup stream is compressed, then HyperFactor will not find similarity between the sequence of bytes in the stream and the existing repository data.
If you take a changing database, or a file, and compress it, then even with small changes between versions the compressed output will look completely different and will not be de-duped by any de-duplication engine.
That's why it is recommended not to compress changing data before sending it to be de-duped. Your example talks about data that is not changing, i.e. the same JPEG.

Gil.
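Gil's point about compressed changing data can be demonstrated with a small sketch, using Python's zlib as a stand-in (ProtecTIER's actual compression and matching algorithms are proprietary, so this only illustrates the general effect):

```python
import os.path
import zlib

# Two versions of the same file, identical except for one byte
# (a small edit between backups).
v1 = b"A" * 4096 + b"X" + b"B" * 4096
v2 = b"A" * 4096 + b"Y" + b"B" * 4096

# Uncompressed, the versions share a long identical run that a
# dedup engine can match against its repository.
raw_shared = len(os.path.commonprefix([v1, v2]))   # 4096 bytes

# Compress each version independently, as a backup client would.
c1, c2 = zlib.compress(v1), zlib.compress(v2)

# The compressed streams differ from the point of the change
# onward, so a dedup engine sees mostly unique data.
comp_shared = len(os.path.commonprefix([c1, c2]))
print(raw_shared, comp_shared, len(c1))
```

The raw versions share thousands of identical bytes, while the compressed versions stop matching at the edit: one changed input byte alters the rest of the compressed stream, which is exactly why compress-then-dedupe performs so poorly.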
 
Thank you for your answer Gil.

I usually do not compress databases, for the reasons you stated.
Logfiles (not database logfiles) and large presentations where changes are just additions or removals of data should also perform pretty well even when compressed?
What are the smallest and largest repeating data sequences that ProtecTIER will detect and compress?
 
Hi,
Log files are good candidates for compression. If these are archive/redo logs, then there is nothing much to de-dupe, as they are re-created all the time. So you can compress a log file before it arrives at ProtecTIER, or let ProtecTIER do the compression. If the log files are appended to daily and you back up the same file with the small change of the extra lines, then it is better not to compress it and to let ProtecTIER do both de-dupe and compression.
Compressed files will only be de-duped if there was zero change in the file between versions. If you change a byte in the presentation and compress it, the whole presentation file looks different and cannot be compared to the previous compressed presentation file.
If you have the capacity and the bandwidth, I would recommend not compressing anything before or during the backup. Let ProtecTIER see the real data so it can find all the de-dupe and compression potential and use the minimum space on the backup repository.

Gil
 