Re: [Networker] NetWorker + DD Boost vs. NetWorker & Avamar

2012-11-07 17:06:02
Subject: Re: [Networker] NetWorker + DD Boost vs. NetWorker & Avamar
From: Mathew Harvest <Mathew.HARVEST AT COMMUNITIES.QLD.GOV DOT AU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 8 Nov 2012 08:05:28 +1000
Hi Tim,

Another difference between Avamar and DD: if you want to clone the data out
to tape after it has been de-duped, you can pretty much rule Avamar out of
the picture. With Data Domain it's workable, but only as pretty as tape will
allow it to be ...

The architectures of a DD and an Avamar grid are substantially different. A
DD is a single processing head with multiple SAS disk trays attached, whereas
an Avamar grid is a collection of up to 16 nodes (pizza box servers with
their own internal storage). The DDs are able to scale to a larger capacity
within a single unit than an Avamar grid.

However, I do wonder whether the architecture of an Avamar grid (i.e.
multiple processing nodes) would allow higher throughput under higher loads.
I'll have a hunt around and see if anyone has done a bake-off between the
two ...


Mat

-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On 
Behalf Of Tim Mooney
Sent: Thursday, 8 November 2012 7:18 AM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: Re: [Networker] NetWorker + DD Boost vs. NetWorker & Avamar

In regard to: Re: [Networker] NetWorker + DD Boost vs. NetWorker & Avamar,...:

> Just to answer a small part of one of your questions about the
> difference between NetWorker 8.x + DD Boost and Avamar: DD Boost is
> still very chatty compared to Avamar, which means that Avamar works a
> lot better (and I believe is certified) over slowish/high-latency WAN
> links, whereas DD Boost isn't certified and pretty much doesn't work
> in this circumstance.

I hadn't considered that potential difference.

> As for a definition of slowish and high latency, you'd need to talk
> to your EMC rep to find out what sort of links they will qualify DD
> Boost over. As an example, we have a 50Mb/s link to a site with a 28ms
> ping response time (the site is about 1,800 km away), so a reasonably
> fat pipe but with high latency. A backup to NetWorker with DD Boost
> (this was NetWorker 7.6.x to a DD running DDOS 5.0.x) ran at a crawl,
> whereas Avamar backups work fine. It's also possible that NetWorker 8
> and DDOS 5.1/5.2 have improved performance over this type of link.

We mostly have 10 Gigabit between our main campus and satellite campus
locations, but since we're a land grant institution we do have things like
regional agricultural extension offices and other outlying state agencies
that we may someday want to back up, so your point about how each product
performs over slower links is very helpful.
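
To put rough numbers on the 50Mb/s, 28ms link described above, here is a
minimal sketch of the usual bandwidth-delay arithmetic. It does not model DD
Boost or Avamar; the amounts of data in flight (8 KiB and 64 KiB) are purely
illustrative assumptions, and the function names are made up for this example.

```python
def bandwidth_delay_product_bytes(link_mbps: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to keep the link full."""
    return link_mbps * 1_000_000 / 8 * (rtt_ms / 1000.0)


def max_throughput_mbps(bytes_in_flight: int, rtt_ms: float,
                        link_mbps: float) -> float:
    """Upper bound on throughput when a sender waits for an acknowledgement
    after putting bytes_in_flight on the wire (capped at the link speed)."""
    window_limited = bytes_in_flight * 8 / (rtt_ms / 1000.0) / 1_000_000
    return min(window_limited, link_mbps)


if __name__ == "__main__":
    link_mbps, rtt_ms = 50.0, 28.0  # figures quoted in the thread

    # About 175,000 bytes need to be outstanding to fill this pipe.
    print(bandwidth_delay_product_bytes(link_mbps, rtt_ms))

    # Hypothetical protocols that stop and wait after 8 KiB or 64 KiB.
    for in_flight in (8 * 1024, 64 * 1024):
        print(in_flight, round(max_throughput_mbps(in_flight, rtt_ms,
                                                   link_mbps), 2))
```

Under these assumptions a protocol that waits on a round trip after every
8 KiB tops out at roughly 2.3 Mb/s on this link no matter how fat the pipe
is, which is the general reason a chatty protocol suffers on high-latency
links.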

> As for recovery times, it's going to depend on your situation. If you
> are performing a single recovery, or a small number of concurrent
> recoveries (how many will vary with the model of dedupe appliance you
> are considering), then I'd say performance would be similar to
> traditional disk-based backup solutions and faster than tape-based
> solutions. But with a larger number of concurrent recoveries, several
> variables in your environment will contribute to the speed of
> recovery: backup server / storage node specs and load, backend network
> infrastructure, and target client specs and load.

I was mainly thinking about a complete restore of a big data volume after some 
type of catastrophic issue, where we have to go back to a point in time before 
some type of corruption event happened.  I'm much less worried about one-off 
and small batch file recoveries, and wasn't really considering multiple 
simultaneous complete recoveries (though that's also a consideration, 
obviously, it's not specifically what I would aim our RTO at).
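
For a first-pass RTO estimate on a full-volume restore, a back-of-the-envelope
model like the one below may help. It is only a sketch: the volume size and
file count are taken from the original post (2 TB, 5 million files), while
the sustained throughput and the per-file overhead are assumptions that would
have to be measured in a real environment. The function name is hypothetical.

```python
def estimate_restore_hours(volume_tb: float, file_count: int,
                           throughput_mb_s: float,
                           per_file_overhead_ms: float) -> float:
    """Restore time = time to stream the data + fixed cost per file.

    Uses decimal units (1 TB = 1,000,000 MB).
    """
    data_seconds = volume_tb * 1_000_000 / throughput_mb_s
    file_seconds = file_count * per_file_overhead_ms / 1000.0
    return (data_seconds + file_seconds) / 3600.0


if __name__ == "__main__":
    # Assumed: 100 MB/s sustained restore rate, 5 ms of overhead per file
    # (file create, metadata, index lookup). Both are placeholders.
    hours = estimate_restore_hours(volume_tb=2, file_count=5_000_000,
                                   throughput_mb_s=100,
                                   per_file_overhead_ms=5)
    print(f"{hours:.1f} hours")  # 12.5 hours under these assumptions
```

With these placeholder figures the per-file overhead accounts for more than
half of the total, so on volumes with millions of small files the file count
can matter as much as the raw capacity.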

Thanks for the response!  It's been very helpful.

Tim

> -----Original Message-----
> From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] 
> On Behalf Of Tim Mooney
> Sent: Tuesday, 6 November 2012 8:11 AM
> To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
> Subject: [Networker] NetWorker + DD Boost vs. NetWorker & Avamar
>
> All-
>
> We're currently in the throes of re-evaluating the antiquated way we do 
> backups (everything is still straight to tape in our environment).
>
> I've been doing quite a bit of reading about Data Domain + Boost and Avamar, 
> but even EMC's most recent documentation doesn't seem to be very clear.  
> There was some good info (as usual) on Preston's nsrd.info blog, but I'm 
> still confused.
>
> As I understand things:
>
> - prior to NetWorker 7.6.1, the DD integration was minimal, and all
>   dedupe happened on the appliance (target), so you still always
>   transferred all data over the network.
>
> - at NetWorker 7.6.1 and later, DD + Boost integration allows dedupe to
>   happen at the storage node, so traffic between the node and the DD
>   box is reduced, but it's still not source dedupe.
>
> - I've since read reports that at NetWorker 8.x, in addition to this
>   "client direct" bit that I'm not quite clear on, the NetWorker
>   *client* software can actually do the dedupe, so you have true
>   source-side dedupe with full NetWorker integration.
>
> Is all of that correct?
>
> If it is, and a NetWorker 8.x + DD Boost environment can do true source 
> dedupe, where does Avamar fit?  Is that still better for VMware source 
> dedupe?  Is the NetWorker client dedupe not variable block?  Does only Avamar 
> do global source dedupe, and NetWorker+DD Boost is perhaps only per-client 
> dedupe?
>
> If anyone can point me to some good publicly available or Powerlink 
> documentation that explains this, it would be much appreciated.
>
> Also, for those of you that are using source dedupe now, I've read reports 
> that although the backup window will shrink dramatically after the first 
> full, the restore times may actually get worse, as data "rehydration"
> takes longer than recovering from a traditional full.  Is that just outdated 
> information?
>
> Either way, source dedupe seems to be a fantastic way to shrink the backup 
> window, but what strategies are people currently using to also shrink the 
> recovery window?  We geographically mirror (at the block level, via Linux 
> software RAID) our largest SAN volumes on many of our servers, but that 
> doesn't protect from file removal or things like filesystem corruption or 
> application induced data corruption.  As part of the complete overhaul of how 
> we're doing backups, we would like to be able to confidently establish 
> recovery time objectives for our big volumes, and I would love to hear how 
> other sites are meeting their RTOs on 2+ TB volumes with 5+ million files.
>
> Thanks,
>
> Tim
>
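
The target-side versus source-side distinction in the original post can be
illustrated with a toy model. This is not how DD Boost or Avamar are
implemented: it assumes fixed 4 KiB chunks and SHA-256 fingerprints purely
for simplicity, and the DedupeStore class and both backup functions are
hypothetical names invented for the sketch. It only counts how many bytes
would cross the network in each case.

```python
import hashlib

CHUNK_SIZE = 4096  # assumption: fixed-size chunks, chosen for simplicity


def chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Split data into consecutive fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupeStore:
    """Toy dedupe target: keeps one copy of each chunk, keyed by fingerprint."""

    def __init__(self):
        self.segments = {}

    def missing(self, fingerprints) -> set:
        return {fp for fp in fingerprints if fp not in self.segments}

    def put(self, fingerprint: bytes, chunk: bytes) -> None:
        self.segments[fingerprint] = chunk


def target_side_backup(data: bytes, store: DedupeStore) -> int:
    """Send every chunk; the store dedupes on arrival. Returns bytes sent."""
    sent = 0
    for chunk in chunks(data):
        sent += len(chunk)
        store.put(hashlib.sha256(chunk).digest(), chunk)
    return sent


def source_side_backup(data: bytes, store: DedupeStore) -> int:
    """Send all fingerprints, but only the chunks the store lacks."""
    fingerprinted = [(hashlib.sha256(c).digest(), c) for c in chunks(data)]
    needed = store.missing(fp for fp, _ in fingerprinted)
    sent = 0
    for fp, chunk in fingerprinted:
        sent += len(fp)  # the fingerprint always crosses the wire
        if fp in needed:
            store.put(fp, chunk)
            sent += len(chunk)
            needed.discard(fp)  # do not resend duplicates within this backup
    return sent


if __name__ == "__main__":
    first = b"".join(i.to_bytes(4, "big") * 1024 for i in range(100))
    second = first[:CHUNK_SIZE * 99] + (999).to_bytes(4, "big") * 1024

    store = DedupeStore()
    print(source_side_backup(first, store))   # 412800: everything is new
    print(source_side_backup(second, store))  # 7296: one changed chunk
    print(target_side_backup(second, DedupeStore()))  # 409600 every time
```

In both cases the store ends up holding the same unique chunks; the
difference is only in how much data has to travel to get there, which is why
the backup window shrinks after the first full.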

-- 
Tim Mooney                                             Tim.Mooney AT ndsu DOT edu
Enterprise Computing & Infrastructure                  701-231-1076 (Voice)
Room 242-J6, IACC Building                             701-231-8541 (Fax)
North Dakota State University, Fargo, ND 58105-5164