ADSM-L

Re: [ADSM-L] de-duplicating compressed data

2009-11-08 10:12:45
Subject: Re: [ADSM-L] de-duplicating compressed data
From: Kelly Lipp <lipp AT STORSERVER DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Sun, 8 Nov 2009 08:11:06 -0700
That assumes that the compression occurs file by file.  Is that true, or is it 
done per transaction?  I suppose it is done on the files themselves, so all 
clients would compress the same file into the same set of bits.  If it doesn't 
work that way, though, then your high dedup rates won't be realized.
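
A minimal sketch of that concern, not of how TSM actually works: it assumes a 
toy store that keeps one copy of each object per SHA-256 digest, with zlib 
standing in for client compression. The file contents and the names 
stored_bytes, per_file and per_txn are made up for illustration. Whether the 
real client compresses per file or per transaction is exactly the open question.

```python
import hashlib
import zlib


def stored_bytes(objects):
    """Bytes kept by a toy dedup store that holds one copy per SHA-256 digest."""
    unique = {hashlib.sha256(obj).digest(): len(obj) for obj in objects}
    return sum(unique.values())


# Hypothetical data: two clients each back up one shared file and one of their own.
shared = b"common operating system file contents\n" * 2000
own_a = b"client A application log line\n" * 2000
own_b = b"client B database export row\n" * 2000

# File-by-file compression: the shared file becomes identical bytes on both clients.
per_file = [zlib.compress(f) for f in (shared, own_a, shared, own_b)]

# Per-transaction compression: each client compresses its files as one stream,
# so the shared file ends up inside two different compressed objects.
per_txn = [zlib.compress(shared + own_a), zlib.compress(shared + own_b)]

per_file_saved = sum(map(len, per_file)) - stored_bytes(per_file)
per_txn_saved = sum(map(len, per_txn)) - stored_bytes(per_txn)

print("saved with per-file compression:", per_file_saved)
print("saved with per-transaction compression:", per_txn_saved)
```

In this toy model the per-file case saves one compressed copy of the shared 
file and the per-transaction case saves nothing. A real deduplicating pool 
works on chunks rather than whole objects, so the numbers will differ, but the 
dependence on identical compressed bits is the same.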

Kelly Lipp
Chief Technical Officer
www.storserver.com
719-266-8777 x7105
STORServer solves your data backup challenges. 
Once and for all.


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of 
Grigori Solonovitch
Sent: Saturday, November 07, 2009 9:16 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] de-duplicating compressed data

>>>What is the effect of compression on de-duplication? Does it help to reach a 
>>>higher de-duplication level?

This is my opinion (please correct me if something is wrong):

1) Note that we are talking about client compression (compression=yes for the 
node or in dsm.opt). Hardware compression at the drive level is totally 
independent of the dedup process;

2) Client compression can be used with any primary storage pool (device type 
DISK, FILE or any tape). In this case, the compressed data goes to the copy 
pools as well, so you need fewer tapes in the copy pools;

3) Client compression takes time during backups (backups run much longer), but 
the amount of data sent to the TSM server over the network is much smaller 
(the average compression rate is 2-4 times);

4) Deduplication works only with a primary sequential disk storage pool 
(device class FILE) and can give a compression rate of 10-20 or more. The 
deduplication process works with data from all nodes (not only from one) and 
compares ALL to ALL. So just imagine what compression rate you can reach in 
some cases, when there are a lot of similar Windows servers (like a server in 
each bank branch) with the same level of Windows and the same applications. 
For 50 branches you can have a compression rate of 40;

5) I see only one reason why deduplication works only with FILE and not with 
DISK: after software deduplication you need to run reclamation to release 
space, and reclamation is not applicable to random-access DISK. By the way, 
this question is still open, and only IBM can answer what the real reason is;

6) There is special protection for data in the TSM server. Deduplication does 
not work on data if there are fewer than 2 copies on tape. So the sequence of 
actions is: back up data to DISK, make at least 2 copies of the data to tape 
(without deduplication!!), start deduplication and start reclamation. 
Deduplication will never reduce data in copy pools;

7) Deduplication and compression work together, but the overall compression 
rate will be more than with compression only, and much less than with 
deduplication only. For example, you will have compression rate N for 
compression only (backups and all copies), M for deduplication only (backups 
only; copies have full size) and K for compression plus deduplication (K for 
backups and N for copies).

In general, N is much less than M, and K is more than N and less than M. Real 
values for N, M and K depend on the type of data.
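
To put rough numbers on points 4 and 7, here is a small arithmetic sketch. 
Everything in it is an assumption made for illustration: the rates N=3, M=15 
and K=6 are invented so that N < K < M, the 2 copy pools follow point 6, and 
the helper names total_footprint and cross_node_dedup_rate are hypothetical. 
It only restates the accounting described above and is not a measurement of 
any TSM server.

```python
def total_footprint(primary_rate, copy_rate, copies):
    """Stored size per unit of original data: one primary pool plus `copies` copy pools."""
    return 1 / primary_rate + copies / copy_rate


def cross_node_dedup_rate(nodes, shared_fraction):
    """Dedup rate if `shared_fraction` of each node's data is identical on all nodes."""
    unique_fraction = 1 - shared_fraction
    # The shared part is stored once; the unique part is stored once per node.
    return nodes / (shared_fraction + nodes * unique_fraction)


# Hypothetical rates, chosen only so that N < K < M.
N, M, K = 3, 15, 6
COPIES = 2

compression_only = total_footprint(N, N, COPIES)  # backups and copies all at rate N
dedup_only = total_footprint(M, 1, COPIES)        # copies stay at full size
both = total_footprint(K, N, COPIES)              # K for backups, N for copies

print("compression only:", round(compression_only, 3))
print("deduplication only:", round(dedup_only, 3))
print("compression + deduplication:", round(both, 3))

# Point 4: 50 similar branch servers with an assumed 99.5% of data in common.
print("50-branch dedup rate:", round(cross_node_dedup_rate(50, 0.995), 1))
```

With these made-up values the total footprint is smallest when both are used, 
because the copy pools dominate and they only benefit from compression. The 
second helper suggests that a rate of about 40 across 50 branches needs the 
servers to share roughly 99.5% of their data.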

Regards,

Grigori

