Re: [ADSM-L] Data Deduplication

From: Ben Bullock <bbullock AT MICRON DOT COM>
Date: Mon, 27 Aug 2007 10:56:41 -0600
Preston, I believe it depends on the de-dupe technology being used. We
have started to play with the NetApp iSIS (dedupe product) and at least
in their case they don't look at every block coming into the host. 

        Their documentation is lacking, but from what we have been able
to deduce, it seems to take a hash of the first chunk of every file,
then compares those hashes and tries to de-dupe only when the hashes
match. We saw that 400GB of 5GB files took about 3 minutes to de-dupe,
while 400GB of 1MB files took over 23 hours. In this case the number of
files seems to dictate how long a de-dupe will take. To me, that doesn't
sound like it is looking at every block, because the number of data
blocks on the filer was actually the same between the two attempts.
        Like I said, this is my interpretation of the results of my
testing, not anything I saw documented.
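
The behavior described above can be sketched in Python. This is purely a
hypothetical illustration of a "hash the first chunk of each file"
heuristic; the chunk size, hash algorithm, and grouping logic are my
assumptions, not anything NetApp documents. Note that the cost is one
hash per file regardless of file size, which would match the observation
that file count, not data volume, drives the run time:

```python
import hashlib

CHUNK_SIZE = 8  # assumed first-chunk size; tiny here purely for the demo


def first_chunk_hash(data: bytes) -> str:
    """Hash only the leading chunk of a file's contents."""
    return hashlib.sha256(data[:CHUNK_SIZE]).hexdigest()


def find_dedupe_candidates(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Group file names by first-chunk hash; groups of 2+ are candidates
    for a full de-dupe comparison."""
    groups: dict[str, list[str]] = {}
    for name, data in files.items():
        groups.setdefault(first_chunk_hash(data), []).append(name)
    return {h: names for h, names in groups.items() if len(names) > 1}


files = {
    "a.dat": b"same pre" + b"\x00" * 100,  # same first 8 bytes as b.dat
    "b.dat": b"same pre" + b"\xff" * 100,  # differs after the first chunk
    "c.dat": b"different" * 10,
}
print(find_dedupe_candidates(files))
```

Here a.dat and b.dat land in the same candidate group even though their
contents diverge after the first chunk, which is why a scheme like this
only *narrows* the search and still has to verify matches afterward.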


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Curtis Preston
Sent: Monday, August 27, 2007 10:40 AM
Subject: Re: Data Deduplication

>        3) Oracle Specific
>                Do not use RMAN's multiplexing (RMAN will combine 4
>channels together, and the backup data will then be unique every time,
>not allowing for de-duping)
>                Use the File Seq=1 (then run multiple channels)

I don't see how this would affect de-duplication if your de-dupe product
knows what it's doing.  Every block coming into the device should be
compared to every other block ever seen by the device.  So combining
multiple files together using Oracle multiplexing shouldn't affect
de-duplication.
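
The "compare every block to every block ever seen" model can be sketched
as a toy content-addressable block store. This is a minimal Python
illustration of the general technique, not any vendor's implementation;
the block size and the dict-based hash index are assumptions for the demo:

```python
import hashlib

BLOCK_SIZE = 8  # tiny block size for the demo; real systems use KB-scale blocks


class BlockStore:
    """Toy content-addressable store: each unique block is stored once,
    and incoming blocks are matched against every block seen so far."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}  # block hash -> block data

    def write(self, data: bytes) -> list[str]:
        """Split data into fixed-size blocks; store only new blocks and
        return the list of block references (hashes)."""
        refs = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            h = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(h, block)  # duplicates are not stored again
            refs.append(h)
        return refs


store = BlockStore()
store.write(b"AAAAAAAA" * 4)            # four identical blocks -> 1 stored
store.write(b"AAAAAAAA" + b"BBBBBBBB")  # one duplicate + one new block
print(len(store.blocks))  # 2 unique blocks stored
```

In this model it doesn't matter which file a block arrived in, which is
the basis of the argument above; whether real products behave this way
under RMAN multiplexing is exactly the open question in this thread.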

Did you test this, or see it in the docs somewhere?  Was this true for
multiple de-dupe vendors, or just the one you chose?
