ADSM-L

Re: AIT drive I/O problems

2004-06-25 15:10:03
Subject: Re: AIT drive I/O problems
From: Robert R Price <rprice28 AT CSC DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 25 Jun 2004 15:09:50 -0400
Same story here with AIT-3 drives. I see lots of read errors that usually
go away on a retry, and a good number of write errors as well. Some of
these are repeatable on any drive, and I chalk those up to defective media.
We clean the drives twice a week, which is twice as often as Sony
recommends.

I also see drives hanging with dismount failures after an
ASC/ASCQ=44/00. The drive shows all three lights blinking and needs a power
cycle to clear it. Good to hear that Sony is looking for a firmware fix.

Robert R. Price
ADSM/TSM Administrator
Computer Sciences Corporation
Phone: 412-374-3247
Fax: 412-374-6371
rprice28 AT csc DOT com







From: "Riley, Craig" <Riley.Craig@TCHDEN.ORG>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L
To: ADSM-L AT VM.MARIST DOT EDU
cc:
Subject: Re: AIT drive I/O problems
Date: 06/25/04 01:26 PM
Please respond to "ADSM: Dist Stor Manager"

We have been seeing something similar with our AIT3 drives running in a
Spectralogic 64K, also almost exclusively during reclamation processing.
The difference in our situation is that, in addition to read errors,
the volume will occasionally fail to unmount and remain stuck in the drive.
We then have to power cycle the drive to eject the tape. Running SCSI
traces on the drives when this issue comes up, we have found that the event
is always preceded by a SCSI forward command asking the drive firmware to
move the tape position forward by some increment. Sony has identified this
as a firmware problem and is testing a fixed version of the code right now.

Craig Riley
The Children's Hospital in Denver


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU]On Behalf Of
Steven Bridge
Sent: Friday, June 25, 2004 8:02 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: AIT drive I/O problems


Currently running TSM server version 5.1.8 on an AIX 5.2 machine.
We have a Qualstar TLS-412600 library with three AIT-2 drives.

We are having continual problems with I/O errors, almost exclusively
during the reclamation of tapes. Most often 98-99% of the reclaim works,
but in perhaps a quarter to a half of our reclaims we see a
number of read errors ( < 100 ). In almost all cases, when we then
perform a 'move data' on the errant tape, it reads the remaining data
off without any problems. Errors have been seen on two of the drives
over the past month, but I don't know the relative frequency
of use of all 3 drives well enough to say whether the other drive is
error free or just lucky.

What is most frustrating about this problem is that a drive
experiencing read errors then hangs. The reclaim process is cancelled
when the volume has no reads 'logged' for some time, but the process
usually takes between 4 and 12 hours to stop, presumably waiting on some
I/O timeout. The drive can be observed performing some activity during
this time (continual retries, perhaps?). If we can't wait 12 hours
for the drives to become available again, the whole AIX box has to be
reloaded to clear the situation.

Drives have been replaced following tape jams but the replacement
drives still exhibit the same problems.

We have set the drives up with a cleaning frequency of 1000 GB - so
they are being cleaned every now and then.

I would be interested to hear if anyone else has experienced the
same problems with these drives, assuming anyone else uses AIT drives.
I wonder whether the problem is inherent to these drives or
whether there are any firmware upgrades that might fix it.
How do you find out what firmware version is on the drive?

I would also be very interested in any suggestions for preventing the
interminable hangs. Is there anywhere that this timeout can be reduced ?

Examples of errors logged :

2004-06-24 15:12:14 ANR8302E I/O error on drive DRIVE0 (/dev/mt0) (OP=READ,
Error Number=78, CC=205, KEY=FF, ASC=FF, ASCQ=FF, SENSE=**NONE**,
Description=SCSI adapter failure). Refer to Appendix D in the 'Messages'
manual for recommended action.

Then, eventually, when the cancel process completes:

2004-06-24 23:17:49 ANR8302E I/O error on drive DRIVE0 (/dev/mt0) (OP=FSR,
Error Number=78, CC=205, KEY=FF, ASC=FF, ASCQ=FF, SENSE=**NONE**,
Description=SCSI adapter failure). Refer to Appendix D in the 'Messages'
manual for recommended action.
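
For anyone wanting to tally messages like these per drive, or to measure how
long a hang lasted, here is a minimal Python sketch. It is not a TSM tool:
the names parse_errors, count_by_drive and error_span are made up for
illustration, and it assumes the activity log has been saved as text with
each message's timestamp, drive name and OP= field intact. The sample text
is just the two messages quoted in this post.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the start of an ANR8302E message as quoted in this post.
# Only the timestamp, drive name, device and OP= field are captured.
ANR8302E = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ANR8302E I/O error on drive "
    r"(?P<drive>\S+) \((?P<device>[^)]+)\) \(OP=(?P<op>\w+),"
)

# The two messages from this post, each rejoined onto one line.
SAMPLE = (
    "2004-06-24 15:12:14 ANR8302E I/O error on drive DRIVE0 (/dev/mt0) "
    "(OP=READ, Error Number=78, CC=205, KEY=FF, ASC=FF, ASCQ=FF, "
    "SENSE=**NONE**, Description=SCSI adapter failure).\n"
    "2004-06-24 23:17:49 ANR8302E I/O error on drive DRIVE0 (/dev/mt0) "
    "(OP=FSR, Error Number=78, CC=205, KEY=FF, ASC=FF, ASCQ=FF, "
    "SENSE=**NONE**, Description=SCSI adapter failure).\n"
)


def parse_errors(text):
    """Return one dict per ANR8302E message found in text."""
    # Collapse whitespace so a message wrapped across lines still matches.
    flat = " ".join(text.split())
    errors = []
    for m in ANR8302E.finditer(flat):
        errors.append({
            "time": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S"),
            "drive": m.group("drive"),
            "device": m.group("device"),
            "op": m.group("op"),
        })
    return errors


def count_by_drive(errors):
    """Count errors per (drive, operation) pair."""
    return Counter((e["drive"], e["op"]) for e in errors)


def error_span(errors, drive):
    """Time between the first and last logged error on a drive, or None."""
    times = sorted(e["time"] for e in errors if e["drive"] == drive)
    if len(times) < 2:
        return None
    return times[-1] - times[0]


if __name__ == "__main__":
    errors = parse_errors(SAMPLE)
    for (drive, op), n in sorted(count_by_drive(errors).items()):
        print(drive, op, n)
    print("DRIVE0 span:", error_span(errors, "DRIVE0"))
```

On the two messages above, the span between the READ error and the final
FSR error comes out at 8:05:35, which is in line with the 4 to 12 hours
mentioned earlier. Comparing the per-drive counts against the number of
mounts each drive has had would give a rough relative error rate.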

+----------------------------------------------------------------------+
 Steven Bridge     Systems Group, Information Systems, EISD
                          University College London


