Re: [Networker] NetWorker + DD Boost vs. NetWorker & Avamar

2012-11-07 17:48:52
Subject: Re: [Networker] NetWorker + DD Boost vs. NetWorker & Avamar
From: Tim Mooney <Tim.Mooney AT NDSU DOT EDU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Wed, 7 Nov 2012 16:47:58 -0600
In regard to: RE: [Networker] NetWorker + DD Boost vs. NetWorker & Avamar,...:

Another difference between Avamar and DD is that if you want to
clone the data out to tape after it has been de-duped, then you can
pretty much rule Avamar out of the picture. With the Data Domains in the
picture it's workable, but only as pretty as tape will allow it to be
...

Thanks again for your thoughts, Mat!  They're definitely appreciated.

I did know that tapeout with Avamar has been a big problem for years.
It's my understanding that up until recently, tapeout with Avamar meant
doing a restore to a temporary spot and then a backup - yuck.  However,
the reading I've been doing indicates that the product has improved
somewhat.  I'm not certain I'll be able to find the link (it might have
been a StorageZilla blog post, not sure), but I recall reading that
current Avamar + NetWorker 8.x now allows direct tapeout via NetWorker
cloning.

Of course, I could have totally misunderstood.

Because we have geographically separate data centers, we're in a better
position than most sites to avoid offsiting and tape completely.  I'm
not certain we'll be able to afford to cut tape out, but there's no
technical or regulatory requirement for us to have tape.

The architectures of a DD and an Avamar grid are substantially
different: the DD is a single processing head with multiple SAS disk
trays attached, while the Avamar grid is a collection of up to 16 nodes
(pizza box servers with their own internal storage). The DDs are able
to scale to a larger capacity within a single unit than an Avamar
grid.

However, I do wonder if the architecture of an Avamar grid (i.e. multiple
processing nodes) would allow higher throughput under higher loads.
I'll have a hunt around and see if I can find if anyone has done a
bake-off between the two ...

If you do find something like that, please do share.

Ultimately, it may be that we decide to have traditional backup to tape
for some clients, NetWorker + DD Boost for others, and Avamar for yet
a third type of clients, but I think we'll have to have a pointed
conversation with EMC to understand what advantages (other than the
bandwidth one you pointed out) Avamar might have over NetWorker 8.x +
DD Boost.

Thanks again,

Tim

-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On 
Behalf Of Tim Mooney
Sent: Thursday, 8 November 2012 7:18 AM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: Re: [Networker] NetWorker + DD Boost vs. NetWorker & Avamar

In regard to: Re: [Networker] NetWorker + DD Boost vs. NetWorker & Avamar,...:

Just to answer a small part of one of your questions about the
difference between NetWorker 8.x + DD Boost and Avamar: DD Boost is
still very chatty when compared to Avamar, which means that Avamar
works a lot better (and I believe is certified) over slowish/high
latency WAN links, whereas DD Boost isn't certified and pretty much
doesn't work in this circumstance.

I hadn't considered that potential difference.

As far as a definition of slowish and high latency goes, you'd need to
talk to your EMC rep to find out what sort of links they will qualify
DD Boost over. As an example, we have a 50 Mb/s link to a site that has
a 28 ms response time to pings (the site is about 1800 km away), so a
reasonably fat pipe, but with high latency. A backup to NetWorker with
DD Boost (this was NetWorker 7.6.x to a DD running DDOS 5.0.x) ran at a
crawl, whereas Avamar backups work fine. It's also possible that
performance over this type of link has improved with NetWorker 8 and
DDOS 5.1/5.2.
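
To put rough numbers on why a chatty protocol suffers on that link, here is a minimal Python sketch. The 50 Mb/s and 28 ms figures come from the message above. The per-round-trip request sizes are purely hypothetical and are not a description of how DD Boost or Avamar actually behave on the wire. It only illustrates the general point that if a sender waits one round trip for every fixed-size request, throughput is capped at request size divided by RTT, no matter how fat the pipe is.

```python
# Back-of-the-envelope numbers for the WAN link described above.
LINK_MBPS = 50   # link speed from the message, megabits per second
RTT_MS = 28      # ping round-trip time from the message, milliseconds


def bdp_bytes(link_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return link_mbps * 1_000_000 / 8 * (rtt_ms / 1000)


def stop_and_wait_mbps(bytes_per_round_trip: int, rtt_ms: float,
                       link_mbps: float) -> float:
    """Throughput ceiling (Mb/s) when each round trip carries a fixed payload."""
    ceiling = bytes_per_round_trip * 8 / (rtt_ms / 1000) / 1_000_000
    return min(ceiling, link_mbps)  # can never exceed the link itself


if __name__ == "__main__":
    print(f"BDP: {bdp_bytes(LINK_MBPS, RTT_MS):,.0f} bytes")
    # Hypothetical request sizes, chosen only to show the trend.
    for size in (8 * 1024, 64 * 1024, 256 * 1024):
        mbps = stop_and_wait_mbps(size, RTT_MS, LINK_MBPS)
        print(f"{size // 1024:>4} KiB per round trip -> {mbps:.1f} Mb/s")
```

With these assumptions the pipe needs about 175,000 bytes in flight to stay full, so any exchange that moves much less than that per round trip leaves most of the 50 Mb/s unused.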

We mostly have 10 Gigabit between our main campus and satellite campus 
locations, but since we're a land grant institution we do have things like 
regional agricultural extension offices and other outlying state agencies that 
we may want to someday back up, so your point about how performant each product 
is is very helpful.

As for recovery times, it's going to depend on your situation. If you
are performing a single recovery or a small number of concurrent
recoveries (this number is going to vary depending on the model of
dedupe appliance that you are considering), then I'd say that
performance would be similar to traditional disk based backup
solutions, and faster than tape based solutions. But if you are talking
about a larger number of concurrent recoveries, then a number of
variables in your environment are going to contribute to the speed of
recovery: backup server / storage node specs and load, backend
network infrastructure, and target client specs and load.

I was mainly thinking about a complete restore of a big data volume after some 
type of catastrophic issue, where we have to go back to a point in time before 
some type of corruption event happened.  I'm much less worried about one-off 
and small batch file recoveries, and wasn't really considering multiple 
simultaneous complete recoveries (though that's also a consideration, 
obviously, it's not specifically what I would aim our RTO at).

Thanks for the response!  It's been very helpful.

Tim

-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU]
On Behalf Of Tim Mooney
Sent: Tuesday, 6 November 2012 8:11 AM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: [Networker] NetWorker + DD Boost vs. NetWorker & Avamar

All-

We're currently in the throes of re-evaluating the antiquated way we do backups 
(everything is still straight to tape in our environment).

I've been doing quite a bit of reading about Data Domain + Boost and Avamar, 
but even EMC's most recent documentation doesn't seem to be very clear.  There 
was some good info (as usual) on Preston's nsrd.info blog, but I'm still 
confused.

As I understand things:

- prior to NetWorker 7.6.1, the DD integration was minimal, and all
  dedupe happened on the appliance (target), so you still always
  transferred all data over the network.

- at NetWorker 7.6.1 and later, DD + Boost integration allows dedupe to
  happen at the storage node, so traffic between the node and the DD
  box is reduced, but it's still not source dedupe.

- I've since read reports that at NetWorker 8.x, in addition to this
  "client direct" bit that I'm not quite clear on, the NetWorker
  *client* software can actually do the dedupe, so you have true
  source-side dedupe with full NetWorker integration.

Is all of that correct?
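
To make the distinction in the three bullets concrete, here is a toy Python sketch of target-side versus source-side dedupe. It is only an illustration under loud assumptions: fixed 4 KiB chunks, SHA-256 fingerprints, and a hypothetical in-memory `Target` class. It is not how NetWorker, DD Boost, or Avamar segment or fingerprint data, and it ignores the fingerprint lookups that the sender has to exchange with the target.

```python
import hashlib

CHUNK = 4096  # assumption: fixed-size chunks, chosen only for simplicity


def chunks(data: bytes, size: int = CHUNK):
    """Yield consecutive fixed-size pieces of data."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


class Target:
    """Hypothetical dedupe store: one copy of each chunk, keyed by fingerprint."""

    def __init__(self):
        self.store = {}

    def has(self, fingerprint: bytes) -> bool:
        return fingerprint in self.store

    def put(self, fingerprint: bytes, chunk: bytes) -> None:
        self.store[fingerprint] = chunk


def target_side_backup(data: bytes, target: Target) -> int:
    """Ship every chunk; the target discards duplicates. Returns bytes sent."""
    sent = 0
    for c in chunks(data):
        sent += len(c)  # the chunk crosses the network regardless
        fp = hashlib.sha256(c).digest()
        if not target.has(fp):
            target.put(fp, c)
    return sent


def source_side_backup(data: bytes, target: Target) -> int:
    """Fingerprint at the sender; ship only chunks the target lacks."""
    sent = 0
    for c in chunks(data):
        fp = hashlib.sha256(c).digest()
        if not target.has(fp):
            target.put(fp, c)
            sent += len(c)  # only new chunks cross the network
    return sent
```

In this model both approaches store the same chunks; the difference is where the fingerprinting runs and therefore which network hop carries the full data. Running `source_side_backup` on a storage node corresponds roughly to the second bullet, and running it on the client to the third. Sharing one `Target` across clients would model global dedupe, while one per client would model per-client dedupe.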

If it is, and a NetWorker 8.x + DD Boost environment can do true source dedupe, 
where does Avamar fit?  Is that still better for VMware source dedupe?  Is the 
NetWorker client dedupe not variable block?  Does only Avamar do global source 
dedupe, and NetWorker+DD Boost is perhaps only per-client dedupe?

If anyone can point me to some good publicly available or Powerlink 
documentation that explains this, it would be much appreciated.

Also, for those of you that are using source dedupe now, I've read reports that although 
the backup window will shrink dramatically after the first full, the restore times may 
actually get worse, as data "rehydration" takes longer than recovering from a traditional 
full.  Is that just outdated information?

Either way, source dedupe seems to be a fantastic way to shrink the backup 
window, but what strategies are people currently using to also shrink the 
recovery window?  We geographically mirror (at the block level, via Linux 
software RAID) our largest SAN volumes on many of our servers, but that doesn't 
protect from file removal or things like filesystem corruption or application 
induced data corruption.  As part of the complete overhaul of how we're doing 
backups, we would like to be able to confidently establish recovery time 
objectives for our big volumes, and I would love to hear how other sites are 
meeting their RTOs on 2+ TB volumes with 5+ million files.
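
As a starting point for that kind of RTO discussion, here is a rough Python sketch of a restore-time estimate. The model (bulk transfer time plus a fixed per-file cost) and every number passed to it are assumptions for illustration, not measurements from any product. The per-file overhead in particular is a placeholder that would have to be measured with a trial restore on the real filesystem.

```python
def estimate_restore_hours(volume_tb: float, file_count: int,
                           throughput_mb_s: float,
                           per_file_overhead_ms: float) -> float:
    """Crude restore-time model: bulk data time plus a fixed cost per file.

    volume_tb uses decimal units (1 TB = 1,000,000 MB).
    """
    data_seconds = volume_tb * 1_000_000 / throughput_mb_s
    file_seconds = file_count * per_file_overhead_ms / 1000
    return (data_seconds + file_seconds) / 3600


if __name__ == "__main__":
    # Hypothetical inputs: 2 TB, 5 million files, 100 MB/s sustained,
    # 2 ms of create/metadata overhead per file.
    hours = estimate_restore_hours(2, 5_000_000, 100, 2)
    print(f"Estimated full restore: {hours:.1f} hours")
```

With those placeholder inputs the per-file term adds half as much time again as the raw data transfer, which is why file count matters as much as volume size when setting an RTO for this kind of volume.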

Thanks,

Tim




--
Tim Mooney                                             Tim.Mooney AT ndsu DOT edu
Enterprise Computing & Infrastructure                  701-231-1076 (Voice)
Room 242-J6, IACC Building                             701-231-8541 (Fax)
North Dakota State University, Fargo, ND 58105-5164