Networker

Re: [Networker] tape drives getting "stuck" on storage node

2008-06-10 08:36:57
From: Rachel Polanskis <r.polanskis AT UWS.EDU DOT AU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Tue, 10 Jun 2008 22:26:43 +1000
On Tue, 10 Jun 2008, Hans de Vries wrote:

> On one of our sites we have had a lot of similar problems running NetWorker
> 7.3.3 with an STL L700 library using HP LTO3 tape drives. The same type of
> library with IBM LTO2 drives did not cause any trouble.
> Several times a week tapes were stuck in the drives (sometimes only one,
> sometimes more). Eject was not possible without powering off the tape drive.
> In our opinion this should be a hardware problem, and maybe it is. But
> after updating the server and storage nodes to NetWorker 7.4 SP2 the
> problems have disappeared.

We are on 7.2.1 and we get a "stuck" volume about once a month,
using LTO-2 FC drives.

I have a procedure that appears to release the volume in most cases; in the rest, I have to go and physically reset the drive to eject the media. I also note that it happens when the L700 or Legato is already quite busy. We do not have fast load enabled, but I might try it, as it completes all the hardware operations first and then
resolves the Legato commands.

Here is my procedure:

You may have one of two faults: a drive stuck in "DRIVE_STATUS_INIT", or just a plain old unmountable volume.

If you have DRIVE_STATUS_INIT, do the following:

1) Disable the device that has the issue: set the drive to Service Mode.

2) Wait about 5 minutes. By then the tape volume should appear
to stop rewinding or attempting to dismount, and the drive
commands will quiesce.

3) Re-enable the device. Wait about 5 more minutes. If the device ejects/dismounts correctly, then you're done.

4) Otherwise the device may become stuck again. Just repeat steps 1-3. If the server is very busy, wait until all nsrjb commands
complete before doing anything.

5) If you are still stuck, instead of re-enabling the device, change its status from Service Mode to Disabled (No).

6) This will kill the associated nsrmmd process. Log in to the
storage node and manually (on Solaris) run mt -f <device> rewoffl
against the affected device. It may report that there is no volume in the drive.

7) After this step, re-enable the device. This will restart nsrmmd on the device and it should dismount normally. You may want to help it along with nsrjb -u.

After all this, it may still not dismount. Usually it works for me the first or second time, but only if the server isn't too busy.
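Purely as an illustration, the escalation in steps 1-7 can be written down as a dry-run checklist. This is a hypothetical Python sketch, not a NetWorker tool: the function name, the step labels and the device path /dev/rmt/0cbn are made up, and "set_enabled" simply stands for changing the device's enabled status (Yes / No / Service) however you normally do it. Reading step 5 as "go back into Service Mode, then choose No instead of Yes" is my interpretation. It builds the list and executes nothing.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    action: str  # "set_enabled", "wait" or "run" (labels invented for this sketch)
    detail: str


def drive_status_init_plan(device: str, retries: int = 2,
                           wait_seconds: int = 300) -> List[Step]:
    """Build (but do not execute) a DRIVE_STATUS_INIT recovery checklist."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    plan: List[Step] = []
    # Steps 1-3, repeated as step 4 suggests: Service Mode, wait, re-enable, wait.
    for _ in range(retries):
        plan.append(Step("set_enabled", "Service"))
        plan.append(Step("wait", str(wait_seconds)))
        plan.append(Step("set_enabled", "Yes"))
        plan.append(Step("wait", str(wait_seconds)))
    # Step 5: back into Service Mode, then go to No instead of Yes.
    plan.append(Step("set_enabled", "Service"))
    plan.append(Step("wait", str(wait_seconds)))
    plan.append(Step("set_enabled", "No"))  # the device's nsrmmd exits
    # Step 6: run on the storage node itself (hypothetical device path).
    plan.append(Step("run", f"mt -f {device} rewoffl"))
    # Step 7: re-enable (nsrmmd restarts), optionally nudge with nsrjb -u.
    plan.append(Step("set_enabled", "Yes"))
    plan.append(Step("run", "nsrjb -u"))
    return plan
```

Printing drive_status_init_plan("/dev/rmt/0cbn") gives a numbered checklist to work through by hand; keeping it a dry run leaves the "is the server too busy?" judgement with the operator.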

If you have a volume that gets out of sync and won't dismount, try steps 5-7, as the device is probably quiesced but the media index might be messed up. In this case, I also try running nsrjb -IE or nsrjb -HH with the device disabled.
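The out-of-sync case can be written down as a short dry-run checklist too. This is a hypothetical Python sketch, not anything NetWorker provides, and the ordering is an assumption: the post only says nsrjb -IE or nsrjb -HH is run "with the device disabled", so the sketch places it after the rewind/offline and before re-enabling. The function name and device path are placeholders.

```python
from typing import List, Tuple


def out_of_sync_plan(device: str,
                     inventory_cmd: str = "nsrjb -IE") -> List[Tuple[str, str]]:
    """Build (but do not execute) a checklist for an out-of-sync volume."""
    if inventory_cmd not in ("nsrjb -IE", "nsrjb -HH"):
        raise ValueError("expected 'nsrjb -IE' or 'nsrjb -HH'")
    return [
        ("set_enabled", "No"),               # step 5: disable; nsrmmd exits
        ("run", f"mt -f {device} rewoffl"),  # step 6: on the storage node
        ("run", inventory_cmd),              # assumed placement: device still disabled
        ("set_enabled", "Yes"),              # step 7: nsrmmd restarts
        ("run", "nsrjb -u"),                 # optional: ask for the unload
    ]
```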

I have noted stuck vols most often when running nsrjb commands
such as label or inventory when the system is quite busy.
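Since stuck volumes show up most often when nsrjb label or inventory jobs overlap with a busy server, and step 4 advises waiting for all nsrjb commands to finish, a small guard can hold off manual recovery until no nsrjb process is running. This Python sketch is only an illustration under assumptions: it polls ps -e -o comm= (which should work on both Solaris and Linux) and treats any process named nsrjb as outstanding jukebox work. The function names, poll interval and timeout are made up.

```python
import subprocess
import time
from typing import Callable


def nsrjb_running(ps_output: str) -> bool:
    """Return True if any line of `ps -e -o comm=` output names nsrjb."""
    for line in ps_output.splitlines():
        # Solaris may print the full path, so compare only the basename.
        name = line.strip().rsplit("/", 1)[-1]
        if name == "nsrjb":
            return True
    return False


def list_commands() -> str:
    """Return one command name per line for every running process."""
    result = subprocess.run(["ps", "-e", "-o", "comm="],
                            capture_output=True, text=True, check=True)
    return result.stdout


def wait_for_idle_jukebox(list_procs: Callable[[], str] = list_commands,
                          poll_seconds: float = 30.0,
                          timeout_seconds: float = 3600.0) -> bool:
    """Poll until no nsrjb process is seen; return False on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if not nsrjb_running(list_procs()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

Passing the process lister in as a parameter keeps the polling logic testable without a real jukebox; in practice you would call wait_for_idle_jukebox() on the server before starting step 1.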

We had a case a few weeks ago where volid 105155 got "stuck" and I had to physically reset the drive. About 2 weeks later, 105155 got stuck again, and this time I got StorageTek to come and remove it for me, as I could do nothing at all. The tech decided to replace the drive. 105155 did indeed show chewing along the edge of one side of the tape, and it was obvious the tape or drive was out of alignment.

So, not in all cases will the above work, but it's saved me a bit
when I really need that drive running.

Looking forward to our move to 7.4.x now....


rachel

--
Rachel Polanskis                Systems Admin, University of Western Sydney
ADD Werrington North Campus     (+61 2) 9678 7291  <r.polanskis AT uws.edu DOT au>
                The price of greatness is responsibility.
