Networker

Re: [Networker] tape drives getting "stuck" on storage node

2008-06-10 12:32:59
Subject: Re: [Networker] tape drives getting "stuck" on storage node
From: Alex Alexiou <AAlexiou AT TARGETSITE DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Tue, 10 Jun 2008 12:19:32 -0400
Thanks for all the suggestions. I'll try some of the methods out and see
if it helps. I'm also bugging HP like crazy to see if they can look at
our EML more; I suspect there is something wrong with it, since our ADIC
has been working fine. I also may update the kernel on our storage node
to match our backup server, just in case that helps.

~Alex

-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On
Behalf Of Rachel Polanskis
Sent: Tuesday, June 10, 2008 8:27 AM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: Re: [Networker] tape drives getting "stuck" on storage node

On Tue, 10 Jun 2008, Hans de Vries wrote:

> On one of our sites we have had a lot of similar problems running
> NetWorker 7.3.3 with a STL L700 library using HP LTO3 tapedrives. The
> same type of library with IBM LTO2 drives did not cause any trouble.
> Several times a week tapes were stuck in the drives (sometimes only
> one, sometimes more). Eject was not possible without powering off the
> tapedrive.
> In our opinion this should be a hardware problem, and maybe it is. But
> after updating the server and storagenodes to Networker 7.4 SP2 the
> problems have disappeared.

We are on 7.2.1 and we get a "stuck" volume about once a month,
using LTO-2 FC drives.

I have a procedure that appears to release the volume in most cases;
in the rest, I have to go and physically reset the drive to eject the
media.   Also, I note it happens when the L700 or Legato is already
quite busy.   We do not have fast load enabled, but I thought I might
actually try it, as it does all the hardware ops and then resolves the
Legato commands.

Here is my procedure:

You may have one of 2 faults - a "DRIVE_STATUS_INIT" or just a plain 
old unmountable volume.

If you have DRIVE_STATUS_INIT, do the following:

1) Disable the device that has the issue by setting the drive to 
Service Mode.

2) Wait about 5 minutes.  After about 5 minutes of blocking,
the tape volume should appear to stop rewinding or attempting dismount
and the drive commands will quiesce.

3) Re-enable the device.   Wait about 5 more minutes. 
If the device ejects/dismounts correctly, then you're done.

4) Otherwise the device may become stuck again.   Just repeat steps 
1-3.   If the server is very busy, wait until all nsrjb commands
complete before doing anything.

5) If you are still stuck, instead of re-enabling the Device, 
change the status from Service Mode to Disabled (No).

6) This will kill the associated nsrmmd process.   Log in to the
storage node and manually run (on Solaris) mt -f <device> rewoffl, where
<device> is the affected device.  It may report no volume in the drive.

7) After this step, re-enable the device.  This will restart nsrmmd on 
the device and it should dismount normally.   You may want to 
help it along with nsrjb -u.
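
As a rough illustration of the command-line part of steps 6-7, here is
a minimal Python sketch that only builds and prints the commands; it
runs nothing.  The device path /dev/rmt/0cbn and the function names are
hypothetical examples, not taken from the procedure above.  Substitute
the path of the affected drive on your storage node.

```python
import shlex


def recovery_commands(tape_device, unload=True):
    """Return the manual commands from steps 6-7 as argument lists.

    Nothing is executed here; the caller decides whether to run them.
    tape_device is the tape device path on the storage node (assumed).
    """
    if not tape_device:
        raise ValueError("tape_device must be a non-empty path")
    commands = [
        # Step 6: rewind the tape and take the drive offline (Solaris mt).
        ["mt", "-f", tape_device, "rewoffl"],
    ]
    if unload:
        # Step 7: optional nudge after the device has been re-enabled.
        commands.append(["nsrjb", "-u"])
    return commands


def render(commands):
    """Format argument lists as shell-safe command lines for review."""
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands]


if __name__ == "__main__":
    # Hypothetical device path, for illustration only.
    for line in render(recovery_commands("/dev/rmt/0cbn")):
        print(line)
```

Printing the commands first makes it easy to check the device path
before typing anything on a busy storage node.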

After all this, it still may not dismount.   Usually it works for me 
the first or second time.   But the server can't be too busy.

If you have a volume that gets out of synch and doesn't want to 
dismount,  try steps 5-7 as the device is probably quiesced but 
the media index might be messed up.  In this case, I also try 
to run nsrjb -IE, or nsrjb -HH with the device disabled.
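
The choice between the two paths above can be summarised as a small
decision sketch.  This is only a paraphrase of the procedure in Python,
not anything NetWorker provides: the fault name "out_of_sync", the
function name plan, and the cut-off of two soft attempts are all
assumptions made for illustration.

```python
# Step lists paraphrase the procedure in this message.
# Nothing here talks to NetWorker or touches a device.
SOFT_RESET = [
    "set the device to Service Mode",
    "wait about 5 minutes",
    "re-enable the device",
    "wait about 5 more minutes",
]
HARD_RESET = [
    "set the device to Disabled (No), which kills nsrmmd",
    "run mt rewoffl against the device on the storage node",
    "re-enable the device",
    "optionally run nsrjb -u",
]


def plan(fault, soft_attempts=0, max_soft_attempts=2):
    """Pick the next step list for a stuck drive.

    max_soft_attempts is an assumed cut-off, not a NetWorker setting.
    """
    if fault == "DRIVE_STATUS_INIT":
        # Steps 1-3 first; escalate to steps 5-7 after repeated failures.
        if soft_attempts < max_soft_attempts:
            return {"steps": SOFT_RESET, "extras": []}
        return {"steps": HARD_RESET, "extras": []}
    if fault == "out_of_sync":
        # Volume will not dismount: go straight to steps 5-7.
        return {
            "steps": HARD_RESET,
            "extras": ["nsrjb -IE", "nsrjb -HH (device disabled)"],
        }
    raise ValueError(f"unknown fault: {fault!r}")
```

For example, plan("DRIVE_STATUS_INIT", soft_attempts=2) returns the
steps 5-7 list, matching the "if you are still stuck" case.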

I have noted stuck vols most often when running nsrjb commands
such as label or inventory when the system is quite busy.

We had a case a few weeks ago where volid 105155 got "stuck" and 
I had to physically reset the drive.  About 2 weeks later, 105155 got 
stuck again and I actually got StorageTek to come and remove it 
for me as I could do nothing at all.   The tech decided to replace 
the drive.   105155 had indeed been chewed along one edge of the 
tape and it was obvious the tape or drive was out of alignment.

So, not in all cases will the above work, but it's saved me a bit
when I really need that drive running.

Looking forward to our move to 7.4.x now....


rachel

-- 
Rachel Polanskis                Systems Admin, University of Western Sydney
ADD Werrington North Campus     (+61 2) 9678 7291
<r.polanskis AT uws.edu DOT au>
                The price of greatness is responsibility.

To sign off this list, send email to listserv AT listserv.temple DOT edu and
type "signoff networker" in the body of the email. Please write to
networker-request AT listserv.temple DOT edu if you have any problems with this
list. You can access the archives at
http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
