Re: [Networker] Problem with mmrecov after /nsr array failure

2007-10-17 22:05:16
Subject: Re: [Networker] Problem with mmrecov after /nsr array failure
From: Stan Horwitz <stan AT TEMPLE DOT EDU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Wed, 17 Oct 2007 22:00:56 -0400
How do I delete the devices? I tried deleting the library, but not the devices, mainly because I didn't see that option in the NWCM GUI.

On Oct 17, 2007, at 9:45 PM, Rob Sterba wrote:

Have you tried deleting the library and all devices and re-adding them?

-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On
Behalf Of Stan Horwitz
Sent: Wednesday, October 17, 2007 7:36 PM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: [Networker] Problem with mmrecov after /nsr array failure

This past Saturday, my NetWorker 7.4 server's /nsr storage array
failed. The server runs Solaris 9 and the array is an old Sun A1000,
one of two that were connected to the server. I placed a
service call to have the problem fixed. On Saturday morning, I
found myself in a computer room while a hardware fixer upper guy
fixed the array ... so we thought. To make a long story short, fsck
ran for 24 hours and by Sunday night, I was up and running again with
the fixed array, but NetWorker crashed within seconds of restarting,
so I decided to hold off until Monday morning to address the problem
so I could get some sleep.

So on Monday, rebooting the server didn't produce any SCSI errors at
all and it came back up fine, so I did an mmrecov from Friday's
bootstrap tape. I restarted NetWorker and all was fine. Later that
evening, the same array died on me again. Sigh! On Tuesday, we did
more array repairs, but nothing we tried worked. The broken A1000
disk array is one of two we had sitting on my backup server. The
second one is /nsr2 (which contains some CFI data for a few large
clients), but it was only 20% full and the /nsr array only
contained 21GB worth of data. Since the /nsr2 array had something
like 150GB free on it, my boss and I decided to create a directory
called nsr on the /nsr2 array and we disconnected the faulty /nsr
array from the SCSI chain and powered it off. So /nsr now sits on
the /nsr2 array and all the /nsr2 array's CFI data is still visible
to NetWorker as /nsr2. I hope this makes sense.

This all works and I got no SCSI errors at all when I rebooted the
server twice. Since this scheme wiped out the entire contents of
/nsr, I used jbconfig to configure a tape library resource so I could
read the bootstrap tape. Then I used mmrecov to recover the same
bootstrap saveset from the same tape I used on Monday. This worked,
except for one problem. When I did the mmrecov, instead of recovering
to /nsr/res.R it recovered the data to /nsr/res and when I restarted
NSR, the tape library that's connected to our server appeared twice
in the NetWorker management console window and each instance of the
tape library had two device resources for every physical device on
the library (14 physical devices), except for the five devices that
we use for NDMP which only had one device resource each. This server
also has a Linux storage node connected to a totally different
library, and that library's resource information is fine. I spent two
hours tonight trying to fix this issue, including doing another
mmrecov, which also dumped its data into /nsr/res instead of /nsr/res.R.

I tried deleting the second tape library resource, but this did not
help. As a result, tape mount requests are not being satisfied for
the main tape library, but they are for the tape library on my
storage node. I don't know if it's relevant, but the tape library is a
Sony PetaSite with 14 S-AIT1 drives and it's Fibre Channel connected
to my NetWorker server. We do not do drive or tape library sharing.
The inquire command also shows exactly the same thing it showed
before we disconnected the broken A1000 array (except of course, for
the missing array).

If anyone has any idea how to correct this problem, please let me
know; otherwise, I intend to open up a support case with EMC in the
morning (since I am too exhausted to do it now).

--
Stan Horwitz
stan AT temple DOT edu

CONFIDENTIALITY STATEMENT: The information contained in this e-mail,
including attachments, is the confidential information of, and/or is
the property of, Temple University. The information is intended for
use solely by the individual or entity named in the e-mail. If you
are not an intended recipient or you received this in error, then any
review, printing, copying, or distribution of any such information is
prohibited. Please notify the sender immediately by reply e-mail and
then delete this e-mail from your system.

To sign off this list, send email to listserv AT listserv.temple DOT edu and
type "signoff networker" in the body of the email. Please write to
networker-request AT listserv.temple DOT edu if you have any problems with this
list. You can access the archives at
http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER