On Tue, 10 Jun 2008, Hans de Vries wrote:
> On one of our sites we have had a lot of similar problems running NetWorker
> 7.3.3 with an STK L700 library using HP LTO3 tape drives. The same type of
> library with IBM LTO2 drives did not cause any trouble.
> Several times a week tapes were stuck in the drives (sometimes only one,
> sometimes more). Eject was not possible without powering off the tape drive.
> In our opinion this must be a hardware problem, and maybe it is. But
> after updating the server and storage nodes to NetWorker 7.4 SP2, the
> problems disappeared.
We are on 7.2.1 and we get a "stuck" volume about once a month,
using LTO-2 FC drives.
I have a procedure that appears to release the volume in most cases;
in the rest, I have to go and physically reset the drive to eject the
media. I also note that it happens when the L700 or Legato is already
quite busy. We do not have fast load enabled, but I thought I might
try it, as it does all the hardware operations first and then resolves
the Legato commands.
Here is my procedure:
You may have one of two faults: a "DRIVE_STATUS_INIT" or just a plain
old unmountable volume.
If you have DRIVE_STATUS_INIT, do the following:
1) Disable the device that has the issue by setting the drive to
Service Mode.
2) Wait about 5 minutes. After about 5 minutes of blocking, the tape
volume should appear to stop rewinding or attempting to dismount, and
the drive commands will quiesce.
3) Re-enable the device. Wait about 5 more minutes.
If the device ejects/dismounts correctly, then you're done.
4) Otherwise, the device may become stuck again; just repeat steps
1-3. If the server is very busy, wait until all nsrjb commands have
completed before doing anything.
5) If you are still stuck, instead of re-enabling the device,
change the status from Service Mode to Disabled (No).
6) This will kill the associated nsrmmd process. Log in to the
storage node and manually (on Solaris) run mt -f <device> rewoffl against
the affected device. It may report that there is no volume in the drive.
7) After this step, re-enable the device. This will restart nsrmmd on
the device and it should dismount normally. You may want to
help it along with nsrjb -u.
After all this, it may still not dismount. Usually it works for me
the first or second time, but the server can't be too busy.
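For anyone who wants to script the shell-side parts of the procedure, here is a minimal sketch, assuming a POSIX shell on the storage node. It only wraps the commands named above (mt -f ... rewoffl and nsrjb -u), plus a pgrep wait for the "wait until all nsrjb commands complete" advice in step 4. The function names, the DRY_RUN switch and the device path are hypothetical, and the Service Mode / Disabled / re-enable changes in steps 1-3, 5 and 7 are still made by hand in the console.

```shell
#!/bin/sh
# Hypothetical helpers for steps 4, 6 and 7 of the procedure above.
# Assumptions: run on the storage node; Service Mode / Disabled /
# re-enable changes are still made by hand in the NetWorker console;
# the device argument is a placeholder for the affected tape device.
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 4: on a busy server, wait until no nsrjb process is running.
# pgrep exits non-zero when nothing matches, which ends the loop.
wait_for_nsrjb() {
    while pgrep -x nsrjb >/dev/null 2>&1; do
        sleep 30
    done
}

# Step 6: with the device Disabled (nsrmmd gone), rewind and take the
# drive offline. This may report that there is no volume in the drive.
force_offline() {
    run mt -f "$1" rewoffl
}

# Step 7: after re-enabling the device in the console, optionally
# help the dismount along.
help_unload() {
    run nsrjb -u
}
```

For example, DRY_RUN=1 force_offline /dev/rmt/0 only prints the mt command it would run; /dev/rmt/0 is just an illustrative Solaris path, so substitute the drive that is actually stuck.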
If you have a volume that gets out of sync and doesn't want to
dismount, try steps 5-7, as the device is probably quiesced but
the media index might be messed up. In this case, I also try
running nsrjb -IE or nsrjb -HH with the device disabled.
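The out-of-sync case can be sketched the same way, again assuming a POSIX shell on the storage node and a device that is already Disabled (steps 5-6) and will be re-enabled by hand afterwards (step 7). The function name, the inventory/reset switch and the device path are hypothetical; the nsrjb options are exactly the ones given above, so check them against your own NetWorker version before relying on them.

```shell
#!/bin/sh
# Hypothetical sketch for a volume that is out of sync and will not
# dismount. Assumption: the device is already Disabled in the console.
# The device path and the "inventory"/"reset" choice are placeholders.
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

resync_disabled_device() {
    device="$1"
    mode="$2"
    # mt may complain that there is no volume in the drive; carry on.
    run mt -f "$device" rewoffl || true
    case "$mode" in
        inventory) run nsrjb -IE ;;   # inventory command as given in the text
        reset)     run nsrjb -HH ;;   # reset command as given in the text
        *)
            echo "usage: resync_disabled_device DEVICE inventory|reset" >&2
            return 2
            ;;
    esac
}
```

Re-enable the device in the console afterwards so that nsrmmd restarts, as in step 7.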
I have noted stuck vols most often when running nsrjb commands
such as label or inventory when the system is quite busy.
We had a case a few weeks ago where volid 105155 got "stuck" and
I had to physically reset the drive. About 2 weeks later, 105155 got
stuck again, and I actually got StorageTek to come and remove it
for me, as I could do nothing at all. The tech decided to replace
the drive. 105155 did indeed show chewing along one edge of the
tape, and it was obvious that the tape or drive was out of alignment.
So the above will not work in every case, but it has saved me a fair
bit when I really needed that drive running.
Looking forward to our move to 7.4.x now....
rachel
--
Rachel Polanskis Systems Admin, University of Western Sydney
ADD Werrington North Campus (+61 2) 9678 7291 <r.polanskis AT uws.edu DOT au>
The price of greatness is responsibility.
To sign off this list, send email to listserv AT listserv.temple DOT edu and type
"signoff networker" in the body of the email. Please write to networker-request
AT listserv.temple DOT edu if you have any problems with this list. You can access the
archives at http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER