Subject: Re: [ADSM-L] De-dup ratio's
From: Josh Davis <xaminmo AT OMNITECH DOT NET>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 12 Nov 2010 11:24:26 -0600
I vote +1 for low dedupe ratios being due to precompressed data (a quick demonstration follows this list):
* Even MS Office files are actually ZIP files now.
* Windows keeps gigs of installers, which are mostly precompressed cabinet files.
* Many application data dumps are precompressed.
* All practical media files are precompressed.
* Many file servers contain a large amount of the above data types, plus tgz
or zip or rar or 7z or whatever kept as snapshots.
* Many TSM environments enable client-side compression, and a few enable
client-side encryption.
* TSM already does basic deduplication by using an incremental strategy at the
file level.
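
A quick way to see the precompressed-data problem is to recompress something
that is already compressed. A minimal Python sketch (the log line is made-up
filler):

import zlib

text = b"backup client log entry: file scanned, no change detected\n" * 20000
once = zlib.compress(text, 9)
twice = zlib.compress(once, 9)

print(f"original:         {len(text):>9} bytes")
print(f"compressed once:  {len(once):>9} bytes")
print(f"compressed twice: {len(twice):>9} bytes")  # barely shrinks, may even grow

Compression removed the redundancy the first time through; a dedupe engine
scanning the compressed copies hits the same wall.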

If all of your OS images are clones of a golden image, then dedupe helps a
little, even with non-compressible data.
Using gzip's --rsyncable option, or any other content-aware or dedupe-aware
option, can sometimes help a little.
If you use TSM client-side compression (for bandwidth reasons), TSM
client-side dedupe can see through that, because the client identifies
duplicate extents before it compresses them.
As the others have already stated, the best option is to separate out your
non-compressible data.
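
For that last point, one rough way to find the non-compressible files is to
test-compress a sample of each one and route the poorly-compressing files
elsewhere. A sketch of the idea (the 64 KiB sample, the 0.9 ratio threshold,
and the pool labels are my assumptions, not TSM settings):

import zlib
from pathlib import Path

def looks_precompressed(path: Path, sample: int = 64 * 1024,
                        threshold: float = 0.9) -> bool:
    with path.open("rb") as fh:
        data = fh.read(sample)
    if not data:
        return False
    # Already-compressed (or encrypted) data stays near its original size.
    return len(zlib.compress(data, 6)) / len(data) > threshold

for f in sorted(Path(".").iterdir()):
    if f.is_file():
        pool = "non-dedupe pool" if looks_precompressed(f) else "dedupe pool"
        print(f"{pool:>16}  {f.name}")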

Dedupe is just compression with a very large dictionary.  Recompressing
doesn't work very well most of the time.  Even deduplicating multiple
versions of a document is tough with compressed XML formats.  You change the
file, recompress it, and the compressor's dictionary changes; because of
that, the end payload is vastly different.  For example, make a Word .docx
that's a couple of megs, modify it in several places, and re-save it.  Then
try to zip the two versions together: you won't get anywhere near the
45-50% savings you would expect from two nearly identical files.
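
You can approximate that experiment without Word. The sketch below uses a toy
rolling-hash chunker as a stand-in for a real dedupe engine's content-defined
chunking; the window, mask, minimum chunk size, synthetic "document", and edit
positions are all my own assumptions. Stored plain, the two versions should
share nearly every chunk; compressed first, the streams diverge from the first
edit onward and almost nothing matches.

import hashlib
import random
import zlib

def chunk_hashes(data: bytes, window: int = 48,
                 mask: int = 0x07FF, min_size: int = 256) -> set:
    """Toy content-defined chunking: cut wherever a rolling hash of the
    last `window` bytes matches `mask`, so boundaries follow content,
    not byte offsets. Real engines use Rabin fingerprints; same idea."""
    base = 257
    drop = pow(base, window, 1 << 32)      # factor for expiring old bytes
    hashes, start, h = set(), 0, 0
    for i, byte in enumerate(data):
        h = (h * base + byte) & 0xFFFFFFFF
        if i >= window:
            h = (h - data[i - window] * drop) & 0xFFFFFFFF
        if i + 1 - start >= min_size and (h & mask) == mask:
            hashes.add(hashlib.sha1(data[start:i + 1]).digest())
            start = i + 1
    if start < len(data):
        hashes.add(hashlib.sha1(data[start:]).digest())
    return hashes

random.seed(42)
v1 = bytes(random.randrange(32, 127) for _ in range(1_000_000))  # the "document"
v2 = bytearray(v1)
for pos in (50_000, 400_000, 800_000):     # "modify it in several places"
    v2[pos:pos + 20] = b"EDITED-EDITED-EDITE!"
v2 = bytes(v2)

for label, a, b in (("plain", v1, v2),
                    ("compressed", zlib.compress(v1, 6), zlib.compress(v2, 6))):
    ca, cb = chunk_hashes(a), chunk_hashes(b)
    print(f"{label:>10}: {len(ca & cb)} of {len(cb)} chunks shared")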
