Networker

Re: [Networker] Problem with mmrecov after /nsr array failure

2007-10-18 05:59:42
Subject: Re: [Networker] Problem with mmrecov after /nsr array failure
From: "Macina, Conrad" <Conrad.Macina AT PFIZER DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 18 Oct 2007 05:54:51 -0400
We've seen a similar problem in disaster recovery testing. The
workaround is to shut down NetWorker and, while it is down, delete the
devices and jukebox resources using nsradmin.

# nsr_shutdown
# cd /nsr2
# nsradmin -d res/nsrdb
NetWorker administration program.
Use the "help" command for help, "visual" for full-screen mode.
nsradmin> . type:nsr device
Current query set
nsradmin> show name
nsradmin> print
[The program displays a list of all defined devices]
nsradmin> delete
                        name: /dev/rmt/Xcbn;
Delete? y
deleted resource id 0.170.99.85.63.30.111.108.170.116.34.57(23383)
[Repeat for each device]
nsradmin> . type:nsr jukebox
Current query set
nsradmin> show name
nsradmin> print
[The program displays a list of all defined jukeboxes]
nsradmin> delete
                        name: "rd=oldstoragenode:oldjukebox";
Delete? y
deleted resource id 1.41.66.32.0.0.0.0.0.0.0.0.65.88.118.224.170.116.34.57(5485)
[This repeats for each jukebox]
nsradmin> quit

After completing this, start up NetWorker and run jbconfig.
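
Before deleting anything, it can help to see which resource names appear more than once. The sketch below is only an illustration, not part of the procedure above: it assumes you have saved the output of the "print" step to a text file yourself, and that the duplicated resources share the same name (which may not be true on your server). The function names list_names and duplicate_names and the file name devices.txt are hypothetical.

```shell
# list_names FILE
# Print every resource name found in saved nsradmin "print" output.
# Expects lines of the form shown in the session above:
#     name: /dev/rmt/Xcbn;
list_names() {
    sed -n 's/^[[:space:]]*name:[[:space:]]*\(.*\);[[:space:]]*$/\1/p' "$1" |
        tr -d '"'    # strip the quotes nsradmin puts around some names
}

# duplicate_names FILE
# Print each name that occurs more than once, one copy per name.
duplicate_names() {
    list_names "$1" | sort | uniq -d
}

# Hypothetical usage, with devices.txt holding the saved "print" output:
#   duplicate_names devices.txt
```

This only reads a text file and never touches the resource database, so it is safe to run while NetWorker is up or down.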

Conrad Macina
Pfizer, Inc.



-----Original Message-----
From: Stan Horwitz [mailto:stan AT TEMPLE DOT EDU] 
Sent: Wednesday, October 17, 2007 10:01 PM
Subject: Re: Problem with mmrecov after /nsr array failure

How do I delete the devices? I tried deleting the library, but not  
the devices, mainly because I didn't see that option in the NWCM GUI.

On Oct 17, 2007, at 9:45 PM, Rob Sterba wrote:

> Have you tried deleting the library and all devices and re-adding  
> them?
>
> -----Original Message-----
> From: EMC NetWorker discussion  
> [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On
> Behalf Of Stan Horwitz
> Sent: Wednesday, October 17, 2007 7:36 PM
> To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
> Subject: [Networker] Problem with mmrecov after /nsr array failure
>
> This past Saturday, my NetWorker 7.4 server's /nsr storage array
> failed. The server runs Solaris 9 and the array is an old Sun A1000,
> one of two that were connected to the server. I placed a
> service call to have the problem fixed. On Saturday morning, I
> found myself in a computer room while a hardware fixer upper guy
> fixed the array ... so we thought. To make a long story short, fsck
> ran for 24 hours and by Sunday night, I was up and running again with
> the fixed array, but NetWorker crashed within seconds of restarting,
> so I decided to hold off until Monday morning to address the problem
> so I could get some sleep.
>
> So on Monday, rebooting the server didn't produce any SCSI errors at
> all and it came back up fine, so I did an mmrecov from Friday's
> bootstrap tape. I restarted NetWorker and all was fine. Later that
> evening, the same array died on me again. Sigh! On Tuesday, we did
> more array repairs, but nothing we tried worked. The broken A1000
> disk array is one of two we had sitting on my backup server. The
> second one is /nsr2 (which contains some CFI data for a few large
> clients), but it was only 20% full and the /nsr array only
> contained 21GB worth of data. Since the /nsr2 array had something
> like 150GB free on it, my boss and I decided to create a directory
> called nsr on the /nsr2 array and we disconnected the faulty /nsr
> array from the SCSI chain and powered it off. So /nsr now sits on
> the /nsr2 array and all the /nsr2 array's cfi data is still visible
> to NetWorker as /nsr2. I hope this makes sense.
>
> This all works, and I got no SCSI errors at all when I rebooted the
> server twice. Since this scheme wiped out the entire contents of
> /nsr, I used jbconfig to configure a tape library resource so I could
> read the bootstrap tape. Then I used mmrecov to recover the same
> bootstrap saveset from the same tape I used on Monday. This worked,
> except for one problem. When I did the mmrecov, instead of recovering
> to /nsr/res.R it recovered the data to /nsr/res and when I restarted
> NSR, the tape library that's connected to our server appeared twice
> in the NetWorker management console window and each instance of the
> tape library had two device resources for every physical device on
> the library (14 physical devices), except for the five devices that
> we use for NDMP which only had one device resource each. This server
> also has a Linux storage node connected to a totally different
> library, and that library's resource information is fine. I spent two
> hours tonight trying to fix this issue, including doing another
> mmrecov, which also dumped its data into /nsr/res instead of
> /nsr/res.R.
>
> I tried deleting the second tape library resource, but this did not
> help. As a result, tape mount requests are not being satisfied for
> the main tape library, but they are for the tape library on my
> storage node. I don't know if it's relevant, but the tape library is a
> Sony PetaSite with 14 S-AIT1 drives and it's Fibre Channel connected
> to my NetWorker server. We do not do drive or tape library sharing.
> The inquire command also shows exactly the same thing it showed
> before we disconnected the broken A1000 array (except of course, for
> the missing array).
>
> If anyone has any idea how to correct this problem, please let me
> know; otherwise, I intend to open up a support case with EMC in the
> morning (since I am too exhausted to do it now).
>
> --
> Stan Horwitz
> stan AT temple DOT edu
>
> CONFIDENTIALITY STATEMENT: The information contained in this e-mail,
> including attachments, is the confidential information of, and/or is
> the property of, Temple University. The information is intended for
> use solely by the individual or entity named in the e-mail. If you
> are not an intended recipient or you received this in error, then any
> review, printing, copying, or distribution of any such information is
> prohibited. Please notify the sender immediately by reply e-mail and
> then delete this e-mail from your system.
>
> To sign off this list, send email to listserv AT listserv.temple DOT edu and
> type "signoff networker" in the body of the email. Please write to
> networker-request AT listserv.temple DOT edu if you have any problems with
> this list. You can access the archives at
> http://listserv.temple.edu/archives/networker.html or
> via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER

--
Stan Horwitz
stan AT temple DOT edu

