[ADSM-L] Inconsistent behavior of mount errors on scratch tapes.

2009-02-27 03:03:47
Subject: [ADSM-L] Inconsistent behavior of mount errors on scratch tapes.
From: Steven Harris <sjharris AT AU1.IBM DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 27 Feb 2009 18:44:29 +1100
Hi Gang

I've had two cases of a bad scratch tape causing me issues this week, but
the behavior of TSM was different in each case and, to my mind, inconsistent.

Case 1.

Full backup of a TDP for Exchange node.  TSM 5.4.4.0 for Windows Server,
TSM client 5.3.6, Storage Agent 5.3.6, TDP for Exchange 5.3.3.1.  TS3100
library with LTO3 drives.

The LAN-free TDP backup is partway through and goes to mount the next tape.
The tape mount fails:

02/26/2009 04:16:48  ANR8944E Hardware or media error on drive DRIVE1
                      (\\.\Tape2) with volume DIA142L3(OP=TESTREADY, Error
                      Number= 23, CC=0, KEY=03, ASC=53, ASCQ=00,
                      SENSE=70.00.03.00.00.00.00.58.00.00.00.00.53.00.36.00.2E-
                      .07.00.02.00.02.20.20.20.20.20.20.20.00.00.00.24.98.01.7-
                      4.00.00.00.00.00.00.00.00.00.00.00.00.00.00.12.02.00.00.-
                      00.00.00.00.00.60.00.00.00.00.70.00.03.00.00.00.00.58.00-
                      .00.00.00.53.00.36.00.2E.07.00.02.00.02.20.20.20.20.20.2-
                      0.00.00.00, Description=An undetermined error has
                      occurred). Refer to Appendix C in the 'Messages' manual
                      for recommended action. (SESSION: 995)
02/26/2009 04:16:48  ANR8304E Time out error on drive DRIVE1 (\\.\Tape2) in
                      library ATL. (SESSION: 995)
02/26/2009 04:16:48  ANR8945W Scratch volume mount failed DIA142L3. (SESSION:
                      995)
02/26/2009 04:17:17  ANR8381E LTO volume DIA142L3 could not be mounted in drive
                      DRIVE1 (\\.\Tape2). (SESSION: 995)
02/26/2009 04:17:17  ANR9790W Request to mount volume *SCRATCH* for library

This is the only scratch tape in the library, and it is physically damaged.
The TDP aborts the transaction and retries the backup, which writes to the
remaining space at the end of the first tape until it is full. At that point
it tries to mount the scratch again, gets the same error, and the cycle
repeats.
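
For reference, the KEY/ASC/ASCQ fields in the ANR8944E message above are
standard SCSI sense data and can be decoded mechanically. Below is a minimal
Python sketch. The function name decode_sense and the two lookup tables are
illustrative only (they are not part of TSM), and the tables hold just the
codes that appear in this post, not the full SCSI list.

```python
import re

# Excerpt of standard SCSI sense keys and ASC/ASCQ codes.
# Deliberately incomplete: only the values seen in this post are listed.
SENSE_KEYS = {
    0x03: "MEDIUM ERROR",
    0x05: "ILLEGAL REQUEST",
}
ASC_ASCQ = {
    (0x53, 0x00): "MEDIA LOAD OR EJECT FAILED",   # Case 1 (ANR8944E)
    (0x21, 0x01): "INVALID ELEMENT ADDRESS",      # Case 2 (ANR8300E)
}

# Matches the "KEY=xx, ASC=xx, ASCQ=xx" triple as TSM prints it (hex digits).
FIELDS = re.compile(
    r"KEY=([0-9A-Fa-f]{2}),\s*ASC=([0-9A-Fa-f]{2}),\s*ASCQ=([0-9A-Fa-f]{2})"
)

def decode_sense(message):
    """Return (sense key text, ASC/ASCQ text) for the first triple found, or None."""
    match = FIELDS.search(message)
    if match is None:
        return None
    key, asc, ascq = (int(group, 16) for group in match.groups())
    key_text = SENSE_KEYS.get(key, "unknown key %02X" % key)
    asc_text = ASC_ASCQ.get((asc, ascq), "unknown ASC/ASCQ %02X/%02X" % (asc, ascq))
    return key_text, asc_text

case1 = "(OP=TESTREADY, Error Number= 23, CC=0, KEY=03, ASC=53, ASCQ=00,"
print(decode_sense(case1))
```

For the Case 1 message this prints ('MEDIUM ERROR', 'MEDIA LOAD OR EJECT
FAILED'), which fits a physically damaged cartridge.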

Case 2.


Library manager/library client setup on multiple P595 AIX LPARs. One TSM
server is set up as library manager and configuration manager, with 4 library
client servers, all at TSM Server 5.5.1.0.  Big TS3500 library, 30 drives and
2500 LTO4 tapes.  This site is ramping up and has about 2000 scratch
tapes.

For reasons that we haven't quite understood yet, tapes are being left in
drives and not properly dismounted.  That's not the interesting part.
A library client tries to mount a scratch on a drive that cannot accept it,
and this produces an immediate I/O error.  In this case the scratch that had
the I/O error is marked private.  TSM assumes the problem is the *tape* and
attempts to mount the next available scratch in the same drive.  Again this
gets the I/O error and is marked private.  In a minute or two the server
has run through all 2000 scratches and we have none left.

02/24/2009 03:01:22  ANR8300E I/O error on library LTOCV1 (OP=00006C03, CC=207,
                      KEY=05, ASC=21, ASCQ=01, SENSE=70.00.05.00.00.00.00.0A.0-
                      0.00.00.00.21.01.00.C0.00.06., Description=Device is not
                      in a state capable of performing request).  Refer to
                      Appendix C in the 'Messages' manual for recommended
                      action. (SESSION: 7658)
02/24/2009 03:01:22  ANR8779E Unable to open drive , error number=2. (SESSION:
                      7658)
02/24/2009 03:01:22  ANR8300E I/O error on library LTOCV1 (OP=00006C03, CC=207,
                      KEY=05, ASC=21, ASCQ=01, SENSE=70.00.05.00.00.00.00.0A.0-
                      0.00.00.00.21.01.00.C0.00.04., Description=Device is not
                      in a state capable of performing request).  Refer to
                      Appendix C in the 'Messages' manual for recommended
                      action. (SESSION: 7658)
02/24/2009 03:01:22  ANR8778W Scratch volume CV0288L4 changed to Private Status
                      to prevent re-access. (SESSION: 7658)
02/24/2009 03:01:22  ANR8942E Could not move volume CV0288L4 from slot-element
                      1356 to slot-element 65535. (SESSION: 7658)
02/24/2009 03:01:22  ANR8381E LTO volume CV0288L4 could not be mounted in drive
                      . (SESSION: 7658)
02/24/2009 03:01:22  ANR9790W Request to mount volume *SCRATCH* for library
                      client CV01 failed. (SESSION: 7658)
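
Cleaning up after this means finding every volume that ANR8778W flipped to
private and returning it to scratch. A rough Python sketch of that step is
below, assuming the activity log has been saved as text. The function names
are hypothetical, the library name LTOCV1 is taken from the log above, and the
script only prints UPDATE LIBVOLUME commands for review; it does not run
anything against the server. It also assumes the drive problem has been fixed
first and that each listed volume really is an empty scratch.

```python
import re

# ANR8778W names the scratch volume that was changed to private.
# \s+ between words tolerates the line wrapping in activity log output.
MARKED = re.compile(
    r"ANR8778W\s+Scratch\s+volume\s+(\S+)\s+changed\s+to\s+Private"
)

def volumes_marked_private(actlog_text):
    """Return volume names from ANR8778W messages, in log order, no duplicates."""
    return list(dict.fromkeys(MARKED.findall(actlog_text)))

def reset_commands(actlog_text, library):
    """Build one UPDATE LIBVOLUME command per affected volume (for review only)."""
    return [
        "UPDATE LIBVOLUME %s %s STATUS=SCRATCH" % (library, volume)
        for volume in volumes_marked_private(actlog_text)
    ]

sample = """02/24/2009 03:01:22  ANR8778W Scratch volume CV0288L4 changed to Private
Status
                      to prevent re-access. (SESSION: 7658)"""

for command in reset_commands(sample, "LTOCV1"):
    print(command)
```

In a library manager/library client setup like this one, the generated
commands would presumably be issued on the library manager, which owns the
library inventory.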


Questions.

Why did the first case not mark the tape Private?  Why did the second case
retry the mount repeatedly on the same drive rather than moving on to the
next one?

Does anyone understand this behavior?

Thanks

Steve

Steven Harris
TSM Admin, Sydney Australia
