Subject: Re: [ADSM-L] De-dup ratio's
From: Josh Davis <xaminmo AT OMNITECH DOT NET>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 12 Nov 2010 11:24:26 -0600
I vote +1 for low dedupe ratios being due to precompressed data (a quick demonstration follows this list):
* Even MS Office files are actually ZIP files now.
* Windows keeps gigs of installers, which are mostly precompressed cabinet files.
* Many application data dumps are precompressed.
* All practical media files are precompressed.
* Many file servers contain a large amount of the above data types, plus tgz
or zip or rar or 7z or whatever kept as snapshots.
* Many TSM environments enable client-side compression, and a few enable
client-side encryption.
* TSM already does basic deduplication by using an incremental strategy at the
file level.
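
A quick way to see the precompressed-data problem is to recompress something
that is already compressed. A minimal Python sketch (the log line is made-up
filler):

import zlib

text = b"backup client log entry: file scanned, no change detected\n" * 20000
once = zlib.compress(text, 9)
twice = zlib.compress(once, 9)

print(f"original:         {len(text):>9} bytes")
print(f"compressed once:  {len(once):>9} bytes")
print(f"compressed twice: {len(twice):>9} bytes")  # barely shrinks, may even grow

Compression removed the redundancy the first time through; a dedupe engine
scanning the compressed copies hits the same wall.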

If all of your OS images are clones of a golden image, then dedupe helps a
little, even with non-compressible data.
Using gzip's --rsyncable option, or any other content-aware or dedupe-aware
option, can sometimes help a little.
If you use TSM client-side compression (for bandwidth reasons), TSM
client-side dedupe can see through that, because the client identifies
duplicate extents before it compresses them.
As the others have already stated, the best option is to separate out your
non-compressible data.
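
For that last point, one rough way to find the non-compressible files is to
test-compress a sample of each one and route the poorly-compressing files
elsewhere. A sketch of the idea (the 64 KiB sample, the 0.9 ratio threshold,
and the pool labels are my assumptions, not TSM settings):

import zlib
from pathlib import Path

def looks_precompressed(path: Path, sample: int = 64 * 1024,
                        threshold: float = 0.9) -> bool:
    with path.open("rb") as fh:
        data = fh.read(sample)
    if not data:
        return False
    # Already-compressed (or encrypted) data stays near its original size.
    return len(zlib.compress(data, 6)) / len(data) > threshold

for f in sorted(Path(".").iterdir()):
    if f.is_file():
        pool = "non-dedupe pool" if looks_precompressed(f) else "dedupe pool"
        print(f"{pool:>16}  {f.name}")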

Dedupe is just compression with a very large dictionary.  Recompressing
doesn't work very well most of the time.  Even deduplicating multiple
versions of a document is tough with compressed XML formats.  You change the
file, recompress it, and the compressor's dictionary changes; because of
that, the end payload is vastly different.  For example, make a Word .docx
that's a couple of megs, modify it in several places, and re-save it.  Then
try to zip the two versions together: you won't get anywhere near the
45-50% savings you would expect from two nearly identical files.
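
You can approximate that experiment without Word. The sketch below uses a toy
rolling-hash chunker as a stand-in for a real dedupe engine's content-defined
chunking; the window, mask, minimum chunk size, synthetic "document", and edit
positions are all my own assumptions. Stored plain, the two versions should
share nearly every chunk; compressed first, the streams diverge from the first
edit onward and almost nothing matches.

import hashlib
import random
import zlib

def chunk_hashes(data: bytes, window: int = 48,
                 mask: int = 0x07FF, min_size: int = 256) -> set:
    """Toy content-defined chunking: cut wherever a rolling hash of the
    last `window` bytes matches `mask`, so boundaries follow content,
    not byte offsets. Real engines use Rabin fingerprints; same idea."""
    base = 257
    drop = pow(base, window, 1 << 32)      # factor for expiring old bytes
    hashes, start, h = set(), 0, 0
    for i, byte in enumerate(data):
        h = (h * base + byte) & 0xFFFFFFFF
        if i >= window:
            h = (h - data[i - window] * drop) & 0xFFFFFFFF
        if i + 1 - start >= min_size and (h & mask) == mask:
            hashes.add(hashlib.sha1(data[start:i + 1]).digest())
            start = i + 1
    if start < len(data):
        hashes.add(hashlib.sha1(data[start:]).digest())
    return hashes

random.seed(42)
v1 = bytes(random.randrange(32, 127) for _ in range(1_000_000))  # the "document"
v2 = bytearray(v1)
for pos in (50_000, 400_000, 800_000):     # "modify it in several places"
    v2[pos:pos + 20] = b"EDITED-EDITED-EDITE!"
v2 = bytes(v2)

for label, a, b in (("plain", v1, v2),
                    ("compressed", zlib.compress(v1, 6), zlib.compress(v2, 6))):
    ca, cb = chunk_hashes(a), chunk_hashes(b)
    print(f"{label:>10}: {len(ca & cb)} of {len(cb)} chunks shared")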
