ADSM-L

Re: HELP NEEDED: I/O ERRORS WITH DLT IV-TAPES, AFTER UPDATING ADSM 3.1.2.1 to 3.1.2.40

1999-10-19 10:52:48
Subject: Re: HELP NEEDED: I/O ERRORS WITH DLT IV-TAPES, AFTER UPDATING ADSM 3.1.2.1 to 3.1.2.40
From: "Prather, Wanda" <PrathW1 AT CENTRAL.SSD.JHUAPL DOT EDU>
Date: Tue, 19 Oct 1999 10:52:48 -0400
We run STK9710's (very similar to 9740) with DLT7000's, under AIX instead of
Solaris.
We are running ADSM 3.1.2.20 on AIX 4.3.2.

This problem probably has nothing to do with your upgrade to 3.1.2.40.
I see these all the time, and with different levels of ADSM, even up through
the V85 drive microcode.

USUALLY (not always, but USUALLY), the WRITE error with CC=306 is a cleaning
error or drive error.
This can happen if your cleaning tape is used up.
The drive doesn't get cleaned, so it fails during the next write.
Then it dismounts the tape, mounts another, and fails during a write on that
tape, too.
It can also happen when a drive is going bad, and no amount of cleaning will
fix it.

Unfortunately, ADSM doesn't handle it very well.
Some of the data may actually be missing because of that bad WRITE.
So when you try to read the tapes back on a different drive, you get I/O
errors on the read as well.
In other words, a 306 WRITE error will generate READ errors on that tape
later on.
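
To tell the WRITE failures apart from the later READ or LOCATE failures, it can
help to decode the KEY/ASC/ASCQ fields of each ANR8302E message. Below is a
minimal Python sketch, assuming the message layout shown in the quoted
activity-log excerpt (with any "> " mail-quoting prefix removed). The function
name and the two lookup tables are hypothetical helpers, not part of ADSM; the
tables carry only the standard SCSI meanings of the codes that appear in that
excerpt.

```python
import re

# Standard SCSI meanings for the codes seen in the quoted excerpt.
# These tables are deliberately tiny; extend them for other codes.
SENSE_KEYS = {0x03: "MEDIUM ERROR", 0x04: "HARDWARE ERROR"}
ADDITIONAL_SENSE = {
    (0x0C, 0x00): "WRITE ERROR",
    (0x11, 0x00): "UNRECOVERED READ ERROR",
}

# Assumed field order, taken from the ANR8302E messages in the excerpt.
FIELDS_RE = re.compile(
    r"OP=(\w+),\s*CC=(\d+),\s*KEY=([0-9A-Fa-f]{2}),"
    r"\s*ASC=([0-9A-Fa-f]{2}),\s*ASCQ=([0-9A-Fa-f]{2})"
)


def describe_io_error(message):
    """Summarize OP/CC/KEY/ASC/ASCQ of one ANR8302E message, or return None."""
    found = FIELDS_RE.search(message)
    if not found:
        return None
    op, cc = found.group(1), found.group(2)
    key, asc, ascq = (int(h, 16) for h in found.group(3, 4, 5))
    key_text = SENSE_KEYS.get(key, "sense key %02X" % key)
    asc_text = ADDITIONAL_SENSE.get((asc, ascq), "ASC %02X ASCQ %02X" % (asc, ascq))
    return "%s CC=%s: %s / %s" % (op, cc, key_text, asc_text)
```

Codes that are not in the tables are reported in raw hex rather than guessed.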

And the next thing you know, you have a LOT of bad tapes in your library,
and you can't tell where they are coming from, even though they were
actually due to a problem with just one drive.

First thing you should do is to make sure your drives are getting cleaned.
Make sure cleaning is enabled, make sure your operators know when to change
the cleaning tape.
Change the cleaning tape, just to be sure.

Also, if your 9740 has a glass door and you can see the drives, do a visual
check several times a day to see if any drives have the cleaning light on
(it's the next-to-bottom light on the right; it turns yellow when a cleaning
is required). In general, you should not see cleaning lights on except when
there is a BAD tape in the drive.

Next thing you should do is check the activity log, find all the tapes that
got a 306 error on WRITE, and MOVE DATA to another tape.  If all the data
moves, fine.  But you may find there is some data on the tape that won't
read back because of that failed write.  You need to find out ASAP what is
missing from your backups, and clean up the mess.
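
If the activity log is long, picking out the affected volumes by eye is
error-prone. Here is a minimal Python sketch, assuming the log has been saved
as plain text in the layout of the quoted excerpt (with any "> " mail-quoting
prefix removed), where each ANR8302E I/O error is followed by an ANR8359E
message naming the volume. The function name is hypothetical, and the patterns
only cover what that excerpt shows.

```python
import re

# One message runs from its ANRnnnnX prefix up to the next prefix.
MESSAGE_RE = re.compile(r"ANR\d{4}[A-Z]\b.*?(?=ANR\d{4}[A-Z]\b|$)")
IO_ERROR_RE = re.compile(
    r"ANR8302E I/O error on drive (\S+) .*?\(OP=(\w+), CC=(\d+)"
)
MEDIA_FAULT_RE = re.compile(
    r"ANR8359E Media fault detected on \S+ volume (\S+) in drive (\S+)"
)


def find_failed_writes(log_text, op="WRITE", cc="306"):
    """Return (volume, drive) pairs whose I/O error matches op and cc."""
    flat = " ".join(log_text.split())   # undo the console line wrapping
    pending = {}                        # drive -> (op, cc) of its last I/O error
    hits = []
    for message in MESSAGE_RE.findall(flat):
        io_error = IO_ERROR_RE.match(message)
        if io_error:
            drive, found_op, found_cc = io_error.groups()
            pending[drive] = (found_op, found_cc)
            continue
        fault = MEDIA_FAULT_RE.match(message)
        if fault:
            volume, drive = fault.groups()
            # Pair the media fault with the preceding I/O error on that drive.
            if pending.pop(drive, None) == (op, cc):
                hits.append((volume, drive))
    return hits
```

Each volume it returns is a candidate for MOVE DATA. The pairing of the two
messages by drive name is an assumption based on the excerpt, so spot-check the
result against the log itself.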

After that, if you continue to get 306 WRITE errors on different tapes but
the same drive, have STK replace the drive.
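
Spotting "different tapes, same drive" is easier with a per-drive tally. Here
is a minimal Python sketch, assuming you already have (volume, drive) pairs for
the failed writes, collected by hand or by script from the activity log. The
function name and the threshold of two distinct volumes are assumptions for
illustration, not an ADSM or STK rule.

```python
from collections import defaultdict


def suspect_drives(failed_writes, min_volumes=2):
    """Return {drive: sorted volumes} for drives that failed writes on at
    least min_volumes different tapes (the threshold is an assumption)."""
    volumes_by_drive = defaultdict(set)
    for volume, drive in failed_writes:
        volumes_by_drive[drive].add(volume)   # a set ignores repeat failures
    return {
        drive: sorted(volumes)
        for drive, volumes in volumes_by_drive.items()
        if len(volumes) >= min_volumes
    }


# Hypothetical volume labels and drive names, for illustration only.
example = [("VOL001", "DRV0"), ("VOL002", "DRV0"), ("VOL003", "DRV1")]
print(suspect_drives(example))
```

A drive that shows up here with several different volumes is the one to raise
with STK. A single volume failing repeatedly on one drive points more toward
the tape.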

Hope this helps,
************************************************************************
Wanda Prather
The Johns Hopkins Applied Physics Lab
443-778-8769
wanda_prather AT jhuapl DOT edu

"Intelligence has much less practical application than you'd think" -
Scott Adams/Dilbert
************************************************************************

> -----Original Message-----
> From: Pentti Härkönen [SMTP:pentti.x.harkonen AT NOKIA DOT COM]
> Sent: Tuesday, October 19, 1999 2:33 AM
> To:   ADSM-L AT VM.MARIST DOT EDU
> Subject:      HELP NEEDED: I/O ERRORS WITH DLT IV-TAPES, AFTER UPDATING
> ADSM 3.1.2.1 to 3.1.2.40
>
> Hello all!
>
>  After updating ADSM 3.1.2.1 to 3.1.2.40 we have had
>  several I/O errors with DLT IV tapes. These errors change the tapes' access
> to readonly or unavailable. Now almost 5 per cent of all our tapes are
> unavailable.
>
>  Our configuration:
>  Sun UE5000 + A5000 + Solaris 2.6 + STK9740 library with
>  10 x DLT7K tapedrives.
>
>  Has anyone had the same problems? Do you know what the right microcode
> levels are for those tape drives? How can we correct the problem?
>
>  Here are a few messages from the activity log:
>
>  ANR0986I Process 19 for MOVE DATA running
>  in the BACKGROUND processed 266 items for
>  a total of 6,888,398,513 bytes with a
>  completion state of FAILURE at 10:38:40.
>  ANR8302E I/O error on drive DRV2
>  (/dev/rmt/9mt) (OP=LOCATE, CC=306, KEY=03,
>  ASC=11, ASCQ=00,
>  SENSE=F0.00.03.00.00.00.63.16.00.00.E8.3B-
>  .11.00.00.00.00.00.85.01.00.00.00.00.00.0-
>  0.00.00.00.,
>  Description=Drive or media failure).
>  Refer to Appendix B in the 'Messages'
>  manual for recommended action.
>  ANR8359E Media fault detected on DLT
>  volume 001005 in drive DRV2 (/dev/rmt/9mt)
>  of library STK9740.
>  ANR8302E I/O error on drive DRV0
>  (/dev/rmt/1mt) (OP=WRITE, CC=306, KEY=03,
>  ASC=0C, ASCQ=00,
>  SENSE=F1.00.03.00.00.00.00.16.00.00.29.6B-
>  .0C.00.00.00.00.00.80.06.00.00.00.00.00.0-
>  0.00.00.00.,
>  Description=Drive or media failure).
>  Refer to Appendix B in the 'Messages'
>  manual for recommended action.
>  ANR8359E Media fault detected on DLT
>  volume 001353 in drive DRV0 (/dev/rmt/1mt)
>  of library STK9740.
>
>  Any help would be greatly appreciated.
>
> == Pentti ==
> ---------------------------------------------------------------
>  Pentti Härkönen
>  Email. Pentti.X.Harkonen AT nokia DOT com
> ---------------------------------------------------------------