Compression and Deduplication options

jkillebrew

I'm new to TSM 7.1.1.100, but a long-time 5.5 user.

What is the proper combination of compression and deduplication on the server and client sides? I've read that I should never use client-side compression with server-side deduplication because the process is inefficient, and yet it seems to be working very well, finding 1.6 TB of duplicate bytes in 3.4 TB of total data. Must I enable client-side deduplication as well? I was unaware that I needed client-side deduplication until today. The TSM documentation is terribly unclear to me.
 
You can use either client or server side deduplication. The end result is the same, the difference is who does the work.

With client-side deduplication the client does the work; with server-side deduplication the server does. Client-side deduplication puts more load on the client, but sends less data over the network and leaves less work for the server. Server-side deduplication is the opposite.
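
To make the "who does the work" point concrete, here is a toy Python model. It is only a sketch and not how TSM actually talks to its server: the ToyServer class, the fixed 4 KiB chunk size and the use of SHA-256 as the chunk fingerprint are all assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products often use variable-size chunks


def split(data, size=CHUNK_SIZE):
    """Cut a byte string into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(chunk):
    """Hash used to recognise identical chunks (an assumption for this sketch)."""
    return hashlib.sha256(chunk).digest()


class ToyServer:
    """Hypothetical server that keeps one copy of every distinct chunk."""

    def __init__(self):
        self.store = {}
        self.bytes_received = 0  # chunk payload that crossed the "network"

    def has(self, digest):
        return digest in self.store

    def receive(self, chunk):
        self.bytes_received += len(chunk)
        # Server-side work: hash the chunk and keep it only if it is new.
        self.store.setdefault(fingerprint(chunk), chunk)


def backup_server_side(server, data):
    # The client sends everything; the server finds the duplicates.
    for chunk in split(data):
        server.receive(chunk)


def backup_client_side(server, data):
    # The client hashes each chunk and sends only what the server lacks.
    for chunk in split(data):
        if not server.has(fingerprint(chunk)):
            server.receive(chunk)


# Four chunks, but only two distinct ones.
data = b"A" * (3 * CHUNK_SIZE) + b"B" * CHUNK_SIZE

server_side = ToyServer()
backup_server_side(server_side, data)

client_side = ToyServer()
backup_client_side(client_side, data)

print(server_side.bytes_received, client_side.bytes_received)
```

Both toy servers end up storing the same two chunks, but in the client-side case only half of the chunk data is sent. The sketch ignores the cost of the fingerprint lookups themselves.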

Your environment is what determines which one works best for you.
 
Thanks. My question is more about the combination of compression and deduplication. I'd prefer to deduplicate on the server side. We have compression enabled on the client, so the question is: does that have a negative effect on deduplication? I can't seem to find a straight answer, and the results I'm seeing are mind-blowing. I expected deduplication to do very little to help, and on top of that I expected client-side compression to hurt the efficiency of server-side deduplication, unless compression is being handled more intelligently somehow.

One document I read seems to indicate that with compression on, extents are compressed and sent to the server with information that would help with server-side deduplication. Other documents seem to indicate that I should never use compression on the client if I use deduplication.
 
Compression makes the file smaller.
Deduplication removes duplicate chunks.
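
A small Python sketch of that difference, under stated assumptions: zlib stands in for whatever compressor the product uses, and chunks are a fixed 4 KiB identified by SHA-256. Neither is claimed to match TSM internals. The first data set is random bytes stored twice (nothing for a compressor to work with, but every chunk occurs twice); the second is a log-like file that compresses well but contains no repeated chunks.

```python
import hashlib
import random
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size for the sketch


def compressed_size(data):
    """Size after compressing the whole stream with zlib."""
    return len(zlib.compress(data))


def deduplicated_size(data, size=CHUNK_SIZE):
    """Size after keeping one copy of each distinct fixed-size chunk."""
    unique = {}
    for i in range(0, len(data), size):
        chunk = data[i:i + size]
        unique[hashlib.sha256(chunk).digest()] = len(chunk)
    return sum(unique.values())


rng = random.Random(0)

# 256 KiB of random bytes, stored twice back to back.
block = bytes(rng.getrandbits(8) for _ in range(64 * CHUNK_SIZE))
two_copies = block * 2

# 256 KiB of numbered records: very regular, but no chunk repeats.
records = b"".join(f"record {i:08d}\n".encode() for i in range(16384))

for name, data in [("two_copies", two_copies), ("records", records)]:
    print(name, len(data), compressed_size(data), deduplicated_size(data))
```

On the duplicated random data, deduplication halves the size while compression gains nothing. On the record file it is the other way around: compression shrinks it a lot and deduplication finds nothing to remove.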

You can find more detailed information in section 4.3 of this document:
https://www.ibm.com/developerworks/...f IBM Tivoli Storage Manager V6 Deduplication

Here's an extract:
In general, deduplication technologies are not very effective when applied to data that is previously
compressed. However, by compressing data after it is already deduplicated, additional savings can be
gained. When deduplication and compression are both performed by the TSM client, the operations are
sequenced in the desirable order of first applying deduplication, followed by compression.
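
The ordering effect described in the extract can be reproduced with a toy experiment in Python. This is a sketch under assumptions, not a model of TSM: chunks are a fixed 4 KiB, SHA-256 identifies them, zlib is the compressor, and the test data is four made-up "nightly versions" of a 128 KiB text file in which one chunk changes per night.

```python
import hashlib
import random
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size for the sketch


def split(data, size=CHUNK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]


def unique_chunks(streams):
    """Keep one copy of every distinct chunk across all streams."""
    store = {}
    for stream in streams:
        for chunk in split(stream):
            store.setdefault(hashlib.sha256(chunk).digest(), chunk)
    return list(store.values())


def dedup_then_compress(files):
    # Deduplicate the raw data first, then compress each surviving chunk.
    return sum(len(zlib.compress(chunk)) for chunk in unique_chunks(files))


def compress_then_dedup(files):
    # Compress each file as a whole first, then deduplicate the compressed streams.
    compressed = [zlib.compress(f) for f in files]
    return sum(len(chunk) for chunk in unique_chunks(compressed))


rng = random.Random(0)
words = [f"word{i:02d}" for i in range(64)]
text = " ".join(rng.choice(words) for _ in range(40000)).encode()
base = text[:32 * CHUNK_SIZE]  # version 0: 32 chunks of compressible text

# Versions 1..3: same file with one chunk overwritten by new random content.
files = [base]
for night in (1, 2, 3):
    changed = bytes(rng.getrandbits(8) for _ in range(CHUNK_SIZE))
    start = night * CHUNK_SIZE
    files.append(base[:start] + changed + base[start + CHUNK_SIZE:])

total_chunks = sum(len(split(f)) for f in files)
raw_duplicates = total_chunks - len(unique_chunks(files))
compressed_streams = [zlib.compress(f) for f in files]
compressed_duplicates = (sum(len(split(s)) for s in compressed_streams)
                         - len(unique_chunks(compressed_streams)))

print("duplicate chunks found in raw data:       ", raw_duplicates)
print("duplicate chunks found in compressed data:", compressed_duplicates)
print("stored bytes, dedup then compress:", dedup_then_compress(files))
print("stored bytes, compress then dedup:", compress_then_dedup(files))
```

A small change in the input changes the compressed stream from that point on, so chunks that were identical before compression no longer match afterwards. Real data will show a less extreme effect than this toy case, which fits the thread's observation that server-side deduplication still finds some duplicates in client-compressed data.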
 
The short answer is yes: by compressing your data on the client and deduplicating it on the server, you are reducing the effectiveness of the server-side deduplication. That is not to say that you will not get any deduplication, only that it will be reduced. Encrypting the data will reduce the effectiveness further; if you must encrypt the data, deduplicate it on the client side.

Both compression and encryption will reduce the effectiveness of any deduplication, regardless of whether it is done by TSM or by a device such as Data Domain.
 
Thanks rdwsdw1. We're going to start over with client side compression off.
 
If you look at the link I shared, there's a graph comparing the options on page 40.

From best to least effective:
- client side dedup + client compression
- dedup only (client or server)
- client compression + server dedup
- client compression only

So if you want to stick with server-side dedup, then yes it's best to turn off client compression. If you can switch to client side dedup, there's a small gain to be made.
 