Subject: Re: [Networker] in-line dedup datadomain
From: "N.J.Tustain" <n.j.tustain AT OPEN.AC DOT UK>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Fri, 13 May 2011 19:10:13 +0000
Thanks for the responses.

As good as that process may be, it's still performing the dedupe process on 
pieces of files from all the clients backing up at a particular time, right? 

C3,C2,C1----> /DD/--

So it's effectively performing dedupe on a random set of data, because bits of 
clients 1, 2 and 3 are all mixed together?

Thanks
Nick Tustain

-----Original Message-----
From: Preston de Guise [mailto:enterprise.backup AT gmail DOT com] 
Sent: 12 May 2011 22:40
To: EMC NetWorker discussion; N.J.Tustain
Subject: Re: [Networker] in-line dedup datadomain

Nick,

On 13/05/2011, at 01:11, N.J.Tustain wrote:

> I'm trying to figure something out.
> 
> Data from a multiple-client backup is stored in separate files (according 
> to SSID) on an AFTD device, which means it is effectively 
> de-multiplexed.
> 
> One benefit
> of the simple rule that all data in a file belongs to the same save 
> set is that it enables clean staging of save sets to another medium.
> 
> If a Data Domain device is being used in non-VTL mode, is the data stream it 
> de-duplicates effectively a multiplexed data stream from all the clients 
> backing up at any moment?
> 
> If so, how can any decent de-dupe ratio be achieved, given that the data 
> stream it's de-duping is a random mix of each client's data, which will 
> change each time the backups run?

Further to Brett's answer that you shouldn't multiplex when using a dedupe VTL 
(I've witnessed first-hand the difference between the two lots of dedupe 
ratios), the Data Domain systems have quite a substantial amount of RAM. As DD 
say they're not I/O bound in their dedupe performance, just CPU/RAM bound - 
without taking the time to read up here, I would have to assume that the DD 
maintains very good hash tables for its form of variable-sized-block dedupe. So 
as each chunk of data comes in, it would be scanned against those hash tables 
to evaluate for dedupe.

Dedupe is always global within the DD - it's not just deduping one client's 
backup, or even one type of file - it's all data being written to it, deduped 
against everything else that has been written to it. (And there's compression 
as well.)
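
And that global index is why the interleaving you describe shouldn't, by 
itself, wreck the ratio: on an AFTD-style (non-VTL) device each save set is 
written as its own file, so each save set's data is chunked as its own stream 
rather than as a mixed-together blob. A toy illustration of the 
order-independence - again an assumed model rather than the real appliance, 
with fixed-size chunks purely for brevity:

import hashlib
from itertools import zip_longest

CHUNK = 4096

def chunks(save_set):
    """Fixed-size chunks of one save set, as (fingerprint, length) pairs."""
    out = []
    for i in range(0, len(save_set), CHUNK):
        piece = save_set[i:i + CHUNK]
        out.append((hashlib.sha256(piece).digest(), len(piece)))
    return out

def physical_bytes(arrival_order):
    """Bytes stored when chunks arrive in the given order, sharing one global index."""
    seen, stored = set(), 0
    for fp, size in arrival_order:
        if fp not in seen:
            seen.add(fp)
            stored += size
    return stored

# Three clients whose filesystems share a lot of common data.
common = b"shared OS and application files " * 4096
c1 = common + b"client one unique data " * 512
c2 = common + b"client two unique data " * 512
c3 = common + b"client three unique data " * 512

streams = [chunks(c) for c in (c1, c2, c3)]

# Sequential: C1, then C2, then C3.
sequential = [x for s in streams for x in s]
# Interleaved: one chunk from each client in turn.
interleaved = [x for group in zip_longest(*streams) for x in group if x]

print(physical_bytes(sequential) == physical_bytes(interleaved))   # True

Same set of fingerprints either way, so the same bytes end up on disk.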

Cheers,
Preston.

--
Preston de Guise

http://nsrd.info/blog                           NetWorker Blog
http://www.enterprisesystemsbackup.com          "Enterprise Systems Backup and 
Recovery: A corporate insurance policy"





