
Re: [Networker] Multiple drives for recovery?

2010-03-11 15:45:13
From: Tim Mooney <Tim.Mooney AT NDSU DOT EDU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 11 Mar 2010 14:42:36 -0600
In regard to: Re: [Networker] Multiple drives for recovery?, George...:

>> Yes, we've seen NetWorker parallelize multi-volume recovers.  Most of the
>> time it works pretty well.  IIRC, this is something that was added in the
>> 7.x series (earlier versions would always serialize volume access).  It
>> used to be configurable by creating a file in /nsr/debug (do a substring
>> search of the mailing list archives for striped_recover for more info).

> I find it hard to believe that NW can utilize multiple drives.  How does it
> merge and/or munge everything properly?  What if you're instead
> recovering from multiple fulls?

Huh?  Recovering from multiple fulls?  When (and why?!) would you recover
from multiple fulls, simultaneously, and restore all of the data into the
original location?  I have to be misunderstanding something, because I'm
not following.

Think about the browse process.  Let's say you need to recover three files
from a directory.  fileA never changes, so it only gets backed up when you
do a full backup.  fileB changes once a week, so it only gets backed up on
your Saturday backups.  fileC changes daily, so it gets backed up every
day.

Now you browse the index, find that directory, do a

        add fileA fileB fileC

and then run

        volumes

and it shows you that it will require 3 tape volumes.

NetWorker knows that only one copy of fileA is coming back (from the
full), only one copy of fileB is coming back (from the tape from Saturday)
and only one copy of fileC is coming back (from the most recent daily).

It can parallelize the recovery because there are no conflicts.
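To make the "no conflicts" point concrete, here is a minimal Python sketch of the selection described above. It is an illustration under stated assumptions, not NetWorker's actual code: the `BackupCopy` record, the helper names, the volume labels (`FULL001`, `SAT006`, `DAY010`) and the dates are all hypothetical. Each requested file resolves to exactly one saved copy, so the per-volume work lists never overlap and could be read from separate drives at the same time.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BackupCopy:
    path: str      # file name as browsed in the index (hypothetical record)
    saved: date    # when this copy was written
    volume: str    # hypothetical tape label holding this copy


def select_copies(catalog, wanted):
    """For each requested path, keep only the most recently saved copy."""
    chosen = {}
    for copy in catalog:
        if copy.path not in wanted:
            continue
        best = chosen.get(copy.path)
        if best is None or copy.saved > best.saved:
            chosen[copy.path] = copy
    return chosen


def volumes_needed(chosen):
    """Group the chosen copies by volume; every path lands on exactly one volume."""
    by_volume = {}
    for copy in chosen.values():
        by_volume.setdefault(copy.volume, []).append(copy.path)
    return {volume: sorted(paths) for volume, paths in by_volume.items()}


catalog = [
    BackupCopy("fileA", date(2010, 2, 28), "FULL001"),  # full: fileA never changes
    BackupCopy("fileB", date(2010, 2, 28), "FULL001"),
    BackupCopy("fileC", date(2010, 2, 28), "FULL001"),
    BackupCopy("fileB", date(2010, 3, 6), "SAT006"),    # Saturday backup
    BackupCopy("fileC", date(2010, 3, 6), "SAT006"),
    BackupCopy("fileC", date(2010, 3, 10), "DAY010"),   # most recent daily
]

plan = volumes_needed(select_copies(catalog, {"fileA", "fileB", "fileC"}))
print(plan)  # {'FULL001': ['fileA'], 'SAT006': ['fileB'], 'DAY010': ['fileC']}
```

Older copies of fileB and fileC also sit on FULL001, but they are simply never selected, which is why three tapes can be read independently without anything needing to be merged.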

Now, I can see that it would make the coding much simpler if NetWorker
always recovered from the lowest-level backups first, so your full
would be the first one that got recovered.  But if fileB and fileC were
both backed up at level 'incr', there's no reason why they couldn't be
recovered in either order, or completely in parallel.
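The "lowest level first, then the rest in any order" idea can be sketched as recovery in waves. This is a hypothetical illustration, not how NetWorker schedules tape reads: the two-entry `LEVEL_RANK` table, `read_volume`, and the volume labels are assumptions, and a real schedule has more levels than just full and incr. Volumes of the same level share a wave and are read in parallel; the next wave starts only after the current one finishes.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed ordering for the sketch: lower rank = lower backup level.
LEVEL_RANK = {"full": 0, "incr": 1}


def plan_waves(volume_levels):
    """Group volumes by level rank and order the groups from full upward."""
    waves = {}
    for volume, level in volume_levels.items():
        waves.setdefault(LEVEL_RANK[level], []).append(volume)
    return [sorted(waves[rank]) for rank in sorted(waves)]


def read_volume(volume):
    """Hypothetical stand-in for mounting a tape and streaming its files back."""
    return f"read {volume}"


def run_recover(volume_levels, drives=2):
    log = []
    with ThreadPoolExecutor(max_workers=drives) as pool:
        for wave in plan_waves(volume_levels):
            # Volumes in one wave have no ordering constraint, so read them
            # concurrently; extend() blocks until the whole wave has finished.
            log.extend(pool.map(read_volume, wave))
    return log


levels = {"FULL001": "full", "SAT006": "incr", "DAY010": "incr"}
print(plan_waves(levels))   # [['FULL001'], ['DAY010', 'SAT006']]
print(run_recover(levels))  # ['read FULL001', 'read DAY010', 'read SAT006']
```

Ordering by level is the conservative choice: it only matters when the same file could come from more than one level, and within a level the reads stay fully parallel.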

> How can it temporarily store all that data, since disk space could become
> a problem at some point?  How does it organize and/or re-conglomerate all
> that later?

I'm totally not following.

>> We have, however, seen a few instances where recover apparently deadlocks
>> in the striped recovery code.  This happened to us a couple of times
>> under 7.2.x or 7.4.x, but we upgraded to 7.5.2 last week and the first big
>> recover we had to do triggered a deadlock in recovery.  We've had a case
>> open with EMC about this issue since last Friday.

> What do you mean by 'deadlocks'?

I mean it pauses indefinitely and can't seem to be "prodded" into
continuing.  The recovery process stops after one or more tapes have been
read (so it's part way through recovering the files that were requested)
and never proceeds to the subsequent tapes.

There are reasons why this can happen, such as media database corruption
(NetWorker knows the ssids it needs but can't figure out which tapes
they're on) or problems with the jukebox resource (it's partially corrupt,
so NetWorker knows the tape is nearline but doesn't know where it is),
but both of those causes were ruled out in the case we had open.

> I was thinking of trying to upgrade the client software on client A, but I
> doubt that has any effect on what the server decides to do on its end in
> terms of loading those tapes.

That's exactly what I thought, but it turns out to be incorrect, much to
my great surprise.  I had assumed that other than its role in selecting
the files to be recovered, the client couldn't possibly have any influence
on how the server goes about the process of actually finding the data and
feeding it back to the client, but it looks like there's more going on
there than I understand.

EMC got back to us this morning with a fix for the deadlock issue we were
seeing: upgrading the client software from 7.4.2 to 7.5.2 (to match the
server) resolved it.  I still don't understand how the client could be
influencing this, but it appears it was.

Tim
--
Tim Mooney                                             Tim.Mooney AT ndsu DOT edu
Enterprise Computing & Infrastructure                  701-231-1076 (Voice)
Room 242-J6, IACC Building                             701-231-8541 (Fax)
North Dakota State University, Fargo, ND 58105-5164

To sign off this list, send email to listserv AT listserv.temple DOT edu and type 
"signoff networker" in the body of the email. Please write to networker-request 
AT listserv.temple DOT edu if you have any problems with this list. You can access the 
archives at http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
