That assumes that the compression occurs file by file. Is that true, or is it
done per transaction? I suppose it is done on the files themselves, so all
clients would compress the same file into the same set of bits. If it doesn't
work that way, though, then your high dedup rates won't be realized.
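
To illustrate the concern, here is a small sketch in Python. It is not how the
TSM client or server actually works: zlib stands in for the client compressor,
a SHA-256 digest of the whole object stands in for dedup matching, and the
sample data and all variable names are made up for the example.

```python
import hashlib
import zlib


def digest(data: bytes) -> str:
    """Fingerprint a blob, as a stand-in for dedup matching (assumption)."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical data: one file that both clients have, plus private data.
shared = b"identical system file contents " * 1000
unique_a = b"client A private data " * 50
unique_b = b"client B private data, different " * 50

# Case 1: each file is compressed on its own. Both clients produce the same
# bytes for the shared file, so the fingerprints match.
per_file_a = digest(zlib.compress(shared))
per_file_b = digest(zlib.compress(shared))

# Case 2: the whole transaction is compressed as one stream. The shared file
# is now mixed in with different neighbouring data on each client, so the
# compressed streams differ as a whole.
per_txn_a = digest(zlib.compress(unique_a + shared))
per_txn_b = digest(zlib.compress(unique_b + shared))

print("per-file match:       ", per_file_a == per_file_b)  # True
print("per-transaction match:", per_txn_a == per_txn_b)    # False
```

A real dedup engine matches on chunks rather than on whole objects, so this
only shows the direction of the problem, not its size.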
Kelly Lipp
Chief Technical Officer
www.storserver.com
719-266-8777 x7105
STORServer solves your data backup challenges.
Once and for all.
-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Grigori Solonovitch
Sent: Saturday, November 07, 2009 9:16 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] de-duplicating compressed data
>>>What is the effect of compression on de-duplication? Does it help to reach a
>>>higher level of de-duplication?
This is my opinion (please correct, if something is wrong):
1) note that we are talking about client compression (compression=yes for the
node or in dsm.opt). Hardware compression at the drive level is totally
independent of the dedup process;
2) client compression can be used with any primary storage pool (device type
DISK, FILE or any tape). In this case, the compressed data goes to the copy
pools as well, so you need fewer tapes in the copy pools;
3) client compression takes time during backups (backups take much longer), but
the amount of data sent to the TSM server over the network is much smaller
(the average compression rate is 2-4 times);
4) deduplication works only with primary sequential disk storage pools
(device class FILE) and can give a compression rate of 10-20 or more. The
deduplication process works with data from all nodes (not only from one) and
compares ALL to ALL. So just imagine which compression rate you can reach in
some cases, when there are a lot of similar Windows servers (like a server in
each bank branch) with the same level of Windows and the same applications.
For 50 branches you can have a compression rate of 40;
5) I see only one reason why deduplication works only with FILE and not with
DISK: after software deduplication you need to run reclamation to release
space, and reclamation is not applicable to DISK with random access. By the
way, this question is still open and only IBM can answer what the real reason
is;
6) there is special protection for data in the TSM server. Deduplication does
not work on data if there are fewer than 2 copies on tape. So the sequence of
actions is: back up data to DISK, make at least 2 copies of the data to tape
(without deduplication!!), start deduplication and start reclamation.
Deduplication will never reduce data in copy pools;
7) deduplication and compression work together, and the overall compression
rate will be more than with only compression, but much less than with only
deduplication. For example, you will have compression rate N for compression
only (backups and all copies), M for deduplication only (only backups, copies
have full size) and K for compression/deduplication (K for backups and N for
copies).
In general, N is much less than M, and K is more than N and less than M. Real
values for N, M and K depend on the type of data.
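
The arithmetic in point 7 can be sketched as below (Python). The numbers are
purely hypothetical: 100 TB of raw backup data, N = 2, M = 20, K = 10 and two
copy pool copies (as in point 6). The function name stored_size is made up for
the example; only the relationships between N, M and K come from the text
above.

```python
def stored_size(raw_tb, backup_rate, copy_rate, copies=2):
    """Return (primary pool TB, copy pool TB) for the given reduction rates.

    backup_rate applies to the primary (backup) pool, copy_rate to each copy.
    """
    backup_tb = raw_tb / backup_rate
    copy_tb = copies * raw_tb / copy_rate
    return backup_tb, copy_tb


raw_tb = 100.0               # hypothetical amount of raw data
N, M, K = 2.0, 20.0, 10.0    # hypothetical rates, with N < K < M

scenarios = {
    # compression only: rate N on backups and on all copies
    "compression only": stored_size(raw_tb, N, N),
    # deduplication only: rate M on backups, copies keep full size
    "deduplication only": stored_size(raw_tb, M, 1.0),
    # both: rate K on backups, rate N on copies
    "compression + deduplication": stored_size(raw_tb, K, N),
}

for name, (backup_tb, copy_tb) in scenarios.items():
    total = backup_tb + copy_tb
    print(f"{name:28s} backups {backup_tb:6.1f} TB  "
          f"copies {copy_tb:6.1f} TB  total {total:6.1f} TB")
```

With these made-up numbers, deduplication alone gives the smallest primary
pool, but the copies stay at full size, so compression plus deduplication
gives the smallest total.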
Regards,
Grigori