Subject: [Networker] NetWorker + DD Boost vs. NetWorker & Avamar
From: Tim Mooney <Tim.Mooney AT NDSU DOT EDU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Mon, 5 Nov 2012 16:11:13 -0600
All-

We're currently in the throes of re-evaluating the antiquated way we do
backups (everything is still straight to tape in our environment).

I've been doing quite a bit of reading about Data Domain + Boost and
Avamar, but even EMC's most recent documentation doesn't seem to be
very clear.  There was some good info (as usual) on Preston's nsrd.info
blog, but I'm still confused.

As I understand things:

- prior to NetWorker 7.6.1, the DD integration was minimal, and all
  dedupe happened on the appliance (target), so you still always
  transferred all data over the network.

- at NetWorker 7.6.1 and later, DD + Boost integration allows dedupe to
  happen at the storage node, so traffic between the node and the DD
  box is reduced, but it's still not source dedupe.

- I've since read reports that at NetWorker 8.x, in addition to this
  "client direct" bit that I'm not quite clear on, the NetWorker
  *client* software can actually do the dedupe, so you have true
  source-side dedupe with full NetWorker integration.

Is all of that correct?
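To make sure I'm asking the right question, here's the mental model I have of the "send only what's missing" idea behind Boost-style source dedupe, as a toy sketch. All of the names and the protocol here are mine for illustration, not the actual DD Boost API:

```python
# Toy model of source-side dedupe: the client fingerprints each chunk,
# asks the target which fingerprints it already stores, and ships only
# the missing chunks. Names/protocol are illustrative, not DD Boost.

import hashlib

class Target:
    """Stands in for the dedupe appliance's chunk store."""
    def __init__(self):
        self.store = {}

    def missing(self, fingerprints):
        # Which of these fingerprints do I not already have?
        return [fp for fp in fingerprints if fp not in self.store]

    def put(self, fp, chunk):
        self.store[fp] = chunk

def backup(target, chunks):
    """Send only chunks the target doesn't have; return (recipe, bytes sent)."""
    fps = [hashlib.sha256(c).hexdigest() for c in chunks]
    needed = set(target.missing(fps))
    sent = 0
    for fp, c in zip(fps, chunks):
        if fp in needed:
            target.put(fp, c)
            sent += len(c)
            needed.discard(fp)  # don't resend a duplicate within this backup
    # the saved "backup" is just the ordered fingerprint list
    return fps, sent
```

In this model the first full pays nearly the whole transfer cost, and a second full of unchanged data sends almost nothing over the wire, which is presumably why the backup window shrinks so much after the first full.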

If it is, and a NetWorker 8.x + DD Boost environment can do true source
dedupe, where does Avamar fit?  Is that still better for VMware source
dedupe?  Is the NetWorker client dedupe not variable block?  Does only
Avamar do global source dedupe, and NetWorker+DD Boost is perhaps only
per-client dedupe?
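For reference, what I mean by "variable block" is content-defined chunking: a rolling hash over the data stream picks the chunk boundaries, so an insert early in a file only disturbs nearby chunks instead of shifting every fixed-size block. A toy sketch of the technique (the hash and parameters are illustrative only; real systems use Rabin fingerprints or similar, and I have no idea what the NetWorker client actually does):

```python
# Toy content-defined ("variable block") chunking. A rolling hash is
# computed byte by byte; a chunk boundary is declared when the low
# bits of the hash are zero, subject to min/max chunk sizes.

import hashlib

MASK = 0x3FF                     # boundary when low 10 bits are zero (~1 KiB avg)
MIN_CHUNK, MAX_CHUNK = 256, 4096 # illustrative size bounds

def chunks(data: bytes):
    """Yield content-defined chunks of *data*."""
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # toy rolling hash; real systems use Rabin fingerprints
        h = ((h << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]

def dedupe(data: bytes):
    """Chunk *data*, hash each chunk; return (unique_chunks, total_chunks)."""
    seen = {}
    total = 0
    for c in chunks(data):
        total += 1
        seen.setdefault(hashlib.sha256(c).hexdigest(), c)
    return seen, total
```

Part of my question is whether the NetWorker client dedupe works like this, or only at some coarser fixed-block or per-client granularity.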

If anyone can point me to some good publicly available or Powerlink
documentation that explains this, it would be much appreciated.

Also, for those of you that are using source dedupe now, I've read reports
that although the backup window will shrink dramatically after the first
full, the restore times may actually get worse, as data "rehydration"
takes longer than recovering from a traditional full.  Is that just
outdated information?

Either way, source dedupe seems to be a fantastic way to shrink the backup
window, but what strategies are people currently using to also shrink the
recovery window?  We geographically mirror (at the block level, via Linux
software RAID) our largest SAN volumes on many of our servers, but that
doesn't protect against file removal, filesystem corruption, or
application-induced data corruption.  As part of the complete overhaul
of how we're doing backups, we would like to be able to confidently
establish recovery time objectives for our big volumes, and I would love
to hear how other sites are meeting their RTOs on 2+ TB volumes with 5+
million files.

Thanks,

Tim
--
Tim Mooney                                             Tim.Mooney AT ndsu DOT edu
Enterprise Computing & Infrastructure                  701-231-1076 (Voice)
Room 242-J6, IACC Building                             701-231-8541 (Fax)
North Dakota State University, Fargo, ND 58105-5164