ADSM-L

Re: [ADSM-L] FW: Errno = 23, 1167 and stuck tapes

2009-07-21 09:47:39
Subject: Re: [ADSM-L] FW: Errno = 23, 1167 and stuck tapes
From: Buddy Howeth <BHoweth AT PCOASTP DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 21 Jul 2009 06:41:43 -0700
We have two 3582 libraries and sometimes get the same message.  In our case, 
we learned from IBM that each library needs its own channel or 
controller. We are using one channel with both libraries daisy-chained 
together, and IBM told us that this was the cause of our errors.  You 
didn't say what type of library you have or how it is connected.  
When the error occurs, it takes 10 minutes before the tape will actually 
dismount from the drive, and TSM waits the entire time.  If I manually 
dismount the tape during this time, TSM marks the tape as read-only. 
This only occurs during reclamation, and only when reclamation is running 
on both libraries at the same time.  We avoid the problem by running it on 
one library at a time.

This is our case; your setup may not be the same.  We plan to add a 
third library and will fix the daisy-chain issue later.

Buddy Howeth
Computer Operations Specialist
Information Systems
Pacific Coast Producers
Corporate Offices
631 N. Cluff Ave
Lodi, CA  95240-0756
(209) 367-8800 - Main#
(209) 367-6288 - Computer Room
(209) 366-6240 - Alpha Pager





Henrik Vahlstedt <SHWL AT STATOILHYDRO DOT COM> 
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
07/21/2009 06:37 AM
Please respond to: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] FW: Errno = 23, 1167 and stuck tapes

Hello,

Does anyone have experience with Errno = 23, 1167 and stuck tapes, or a 
suggestion as to what might cause the errors and how to solve them?
I get the errors in all kinds of data movement processes (D2T, T2T). The 
errors occur randomly on all drives, but not all the time.
That is, I can have anywhere from 1 to 30 mounts before I get an error on 
a drive, and the faulty tape mounts OK in another drive.
The OS, switch, and TSM logs do not provide any helpful information.
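
One way to check whether the errors really are spread evenly over the drives is to tally the activity log by drive and errno. The following is only a minimal Python sketch, assuming the activity log has been saved as plain text and that the messages have the same shape as the ANR8311E and ANR8944E entries quoted in this post. The function name tally_drive_errors is made up for illustration and is not part of TSM.

```python
import re
from collections import Counter

# Patterns follow the two message shapes quoted in this post (an assumption;
# other server levels may word these messages differently).
IO_ERROR = re.compile(r"ANR8311E .*?drive (\S+) .*?errno = (\d+)")
HW_ERROR = re.compile(r"ANR8944E .*?drive (\S+) .*?Error Number= ?(\d+)")


def tally_drive_errors(actlog_text):
    """Count (drive, errno) pairs found in saved activity-log text."""
    # Collapse line wrapping so each message can be matched as one string.
    flat = " ".join(actlog_text.split())
    counts = Counter()
    for pattern in (IO_ERROR, HW_ERROR):
        for drive, errno in pattern.findall(flat):
            counts[(drive, int(errno))] += 1
    return counts
```

Calling tally_drive_errors() on a day's worth of log text returns a Counter keyed by (drive, errno), which makes it easier to see whether one drive or one HBA path stands out.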


W2k3 x64 SP2, TSM 5.5.2.1
5 LTO-4 drives, SL500, 3 dual-channel HBAs connected to the SAN with one 
device per channel.
Latest firmware, drivers, etc.


First, errno=1167: space reclamation mounts a tape and the drive 
disappears. Why? After some minutes, however, TSM resurrects the drive and 
continues to use it in new processes.
07/20/2009 05:53:21      ANR8337I LTO volume 4R0245 mounted in drive MT504
                          (mt0.0.0.2). (PROCESS: 59)
07/20/2009 05:53:42      ANR8311E An I/O error occurred while accessing drive MT504
                          (mt0.0.0.2) for WEOF operation, errno = 1167. (PROCESS:
                          59)
07/20/2009 05:53:42      ANR8311E An I/O error occurred while accessing drive MT504
                          (mt0.0.0.2) for OFFL operation, errno = 1167. (PROCESS:
                          59)
07/20/2009 05:53:43      ANR8469E Dismount of LTO volume 4R0245 from drive MT504


C:\>net helpmsg 1167
The device is not connected.


Event Type:     Error
Event Source:   PlugPlayManager
Event Category: None
Event ID:       12
Date:           7/20/2009
Time:           5:53:42 AM
User:           N/A
Computer:
Description:
The device 'IBM ULTRIUM-TD4 SCSI Sequential Device' 
(SCSI\Sequential&Ven_IBM&Prod_ULTRIUM-TD4&Rev_82F0\5&3652500d&0&000000) 
disappeared from the system without first being prepared for removal.
For more information, see Help and Support Center at 
http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 00 00 00 00               ....


                   Volume Name: 4R0051
             Storage Pool Name: LTO4-BCK
             Device Class Name: LTO4
            Estimated Capacity: 1.6 T
       Scaled Capacity Applied:
                      Pct Util: 2.0
                 Volume Status: Filling
                        Access: Read-Only
        Pct. Reclaimable Space: 0.1
               Scratch Volume?: Yes
               In Error State?: No
      Number of Writable Sides: 1
       Number of Times Mounted: 8
             Write Pass Number: 1
     Approx. Date Last Written: 07/20/2009 04:00:47
        Approx. Date Last Read: 07/20/2009 21:26:28
           Date Became Pending:
        Number of Write Errors: 0
         Number of Read Errors: 0
               Volume Location:
Volume is MVS Lanfree Capable : No
Last Update by (administrator):
         Last Update Date/Time: 07/20/2009 05:52:36
          Begin Reclaim Period:
            End Reclaim Period:
  Drive Encryption Key Manager: None



Second, errno=23: a write error generates errno=23 and the tape is stuck. 
Neither TSM nor lbtest can remove the tape.
07/21/2009 02:23:29      ANR8337I LTO volume 4R0254 mounted in drive MT501
                          (mt0.0.0.5). (SESSION: 12497, PROCESS: 70)
07/21/2009 02:23:29      ANR1340I Scratch volume 4R0254 is now defined in storage
                          pool LTO4-BCK. (SESSION: 12497, PROCESS: 70)
07/21/2009 02:23:33      ANR0513I Process 70 opened output volume 4R0254. (SESSION:
                          12497, PROCESS: 70)
07/21/2009 02:24:16      ANR8944E Hardware or media error on drive MT501
                          (mt0.0.0.5) with volume 4R0254(OP=WRITE, Error Number=
                          23, CC=0, KEY=03, ASC=52, ASCQ=00,
                          SENSE=71.00.03.00.00.00.00.58.00.00.00.00.52.00.36.00.78-
                          .D1.23.5D, Description=An undetermined error has
                          occurred). Refer to Appendix C in the 'Messages' manual
                          for recommended action. (SESSION: 12497, PROCESS: 70)
07/21/2009 02:24:16      ANR8359E Media fault detected on LTO volume 4R0254 in
                          drive MT501 (mt0.0.0.5) of library SL500. (SESSION:
                          12497, PROCESS: 70)
07/21/2009 02:24:16      ANR1411W Access mode for volume 4R0254 now set to
                          "read-only" due to write error. (SESSION: 12497, PROCESS:
                          70)
07/21/2009 02:24:16      ANR0515I Process 70 closed volume 4R0254. (SESSION: 12497,
                          PROCESS: 70)
07/21/2009 02:24:37      ANR8944E Hardware or media error on drive MT501
                          (mt0.0.0.5) with volume 4R0254(OP=OFFL, Error Number= 23,
                          CC=0, KEY=03, ASC=53, ASCQ=04,
                          SENSE=70.00.03.00.00.00.00.58.00.00.00.00.53.04.36.00.2E-
                          .05.10.06, Description=An undetermined error has
                          occurred). Refer to Appendix C in the 'Messages' manual
                          for recommended action. (SESSION: 12497, PROCESS: 70)
07/21/2009 02:24:37      ANR8950W Device mt0.0.0.5, volume 4R0254 has issued the
                          following Warning TapeAlert: The operation has stopped
                          because an error has occurred while reading or writing
                          data which the drive cannot correct. (SESSION: 12497,
                          PROCESS: 70)
07/21/2009 02:24:37      ANR8948S Device mt0.0.0.5, volume 4R0254 has issued the
                          following Critical TapeAlert: Your data is at risk: 1.
                          Copy any data you require from this tape. 2. Do not use
                          this tape again.  3. Restart the operation with a
                          different tape. (SESSION: 12497, PROCESS: 70)
07/21/2009 02:24:37      ANR8949E Device mt0.0.0.5, volume 4R0254 has issued the
                          following Critical TapeAlert: The tape drive has a
                          hardware fault:  1. Eject the tape or magazine. 2. Reset
                          the drive.  3. Restart the operation. (SESSION: 12497,
                          PROCESS: 70)
07/21/2009 02:24:37      ANR8949E Device mt0.0.0.5, volume 4R0254 has issued the
                          following Critical TapeAlert: The operation has failed:
                          1. Eject the tape or magazine.  2. Restart the operation.
                          (SESSION: 12497, PROCESS: 70)
07/21/2009 02:24:37      ANR8950W Device mt0.0.0.5, volume 4R0254 has issued the
                          following Warning TapeAlert: The tape drive may have a
                          hardware fault.  Run extended diagnostics to verify and
                          diagnose the problem.  Check the tape drive users manual
                          for device specific instruction on running extended
                          diagnostic tests. (SESSION: 12497, PROCESS: 70)
07/21/2009 02:24:37      ANR8951I Device mt0.0.0.5, volume 4R0254 has issued the
                          following Information TapeAlert: The device has
                          encountered TapeAlert 56. (SESSION: 12497, PROCESS: 70)
07/21/2009 02:24:57      ANR8469E Dismount of LTO volume 4R0254 from drive MT501
                          (mt0.0.0.5) in library SL500 failed. (SESSION: 12497,
                          PROCESS: 70)


C:\>net helpmsg 23
Data error (cyclic redundancy check).
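
The KEY, ASC and ASCQ values that TSM prints in ANR8944E can be cross-checked against the raw SENSE= bytes. The following is a minimal Python sketch, assuming the SENSE= field holds fixed-format SCSI sense data (response code 70h or 71h), where the sense key is the low nibble of byte 2 and ASC/ASCQ are bytes 12 and 13. The function name decode_fixed_sense is hypothetical, and the key table is only a subset.

```python
# Sense key names (subset) as defined for SCSI fixed-format sense data.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}


def decode_fixed_sense(sense_field):
    """Decode a dotted-hex SENSE= string (hypothetical helper, not a TSM API)."""
    # Drop the "-" continuation marker and whitespace left by log wrapping.
    cleaned = "".join(ch for ch in sense_field if ch not in "- \t\r\n")
    data = bytes(int(part, 16) for part in cleaned.split(".") if part)
    if len(data) < 14:
        raise ValueError("need at least 14 bytes of sense data")
    response_code = data[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = data[2] & 0x0F
    return {
        "deferred": response_code == 0x71,  # 71h = deferred, 70h = current
        "key": key,
        "key_name": SENSE_KEYS.get(key, "other"),
        "asc": data[12],
        "ascq": data[13],
    }
```

Fed the two SENSE= strings from the log above, this gives key 03h (MEDIUM ERROR) with ASC/ASCQ 52h/00h for the WRITE and 53h/04h for the OFFL, matching the values TSM printed. If I read the T10 ASC/ASCQ table correctly, 52h/00h is a cartridge fault and 53h/04h a medium thread or unthread failure, but those names should be verified against the drive's SCSI reference.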


                   Volume Name: 4R0254
             Storage Pool Name: LTO4-BCK
             Device Class Name: LTO4
            Estimated Capacity: 1.6 T
       Scaled Capacity Applied:
                      Pct Util: 0.1
                 Volume Status: Filling
                        Access: Read-Only
        Pct. Reclaimable Space: 0.0
               Scratch Volume?: Yes
               In Error State?: Yes
      Number of Writable Sides: 1
       Number of Times Mounted: 1
             Write Pass Number: 1
     Approx. Date Last Written: 07/21/2009 02:23:47
        Approx. Date Last Read: 07/21/2009 02:23:47
           Date Became Pending:
        Number of Write Errors: 1
         Number of Read Errors: 0
               Volume Location:
Volume is MVS Lanfree Capable : No
Last Update by (administrator):
         Last Update Date/Time: 07/21/2009 02:23:29
          Begin Reclaim Period:
            End Reclaim Period:
  Drive Encryption Key Manager: None



Event Type:     Error
Event Source:   tsmscsi
Event Category: None
Event ID:       3
Date:           7/21/2009
Time:           2:24:57 AM
User:           N/A
Computer:
Description:
A check condition error has occured on device \Device\lb0.0.0.7 during 
Move Medium with completion code DD_CHANGER_FAILURE. Refer to the device's 
SCSI reference for appropriate action.

 Dump Data: byte 0x3E=KEY, byte 0x3D=ASC, byte 0x3C=ASCQ
Data:
0000: 0e 00 18 00 03 00 6c 00   ......l.
0008: 00 00 00 00 03 00 00 e0   .......à
0010: 30 01 00 00 85 01 00 c0   0......À
0018: 00 00 00 00 58 c0 01 84   ....XÀ."
0020: 00 00 00 00 00 00 00 00   ........
0028: 00 00 00 00 02 c4 a5 00   .....Ä¥.
0030: 84 03 00 00 80 16 8c 0b   "...?.O.
0038: 00 00 40 00 00 53 04 70   ..@..S.p

Event Type:     Error
Event Source:   tsmscsi
Event Category: None
Event ID:       3
Date:           7/21/2009
Time:           2:24:57 AM
User:           N/A
Computer:
Description:
A check condition error has occured on device \Device\lb0.0.0.7 during 
Move Medium with completion code DD_HARDWARE_MICROCODE. Refer to the 
device's SCSI reference for appropriate action.

 Dump Data: byte 0x3E=KEY, byte 0x3D=ASC, byte 0x3C=ASCQ
Data:
0000: 0e 00 18 00 03 00 6c 00   ......l.
0008: 00 00 00 00 03 00 00 e0   .......à
0010: d1 00 00 00 85 01 00 c0   Ñ......À
0018: 00 00 00 00 58 c0 01 84   ....XÀ."
0020: 00 00 00 00 00 00 00 00   ........
0028: 00 00 00 00 02 c4 a5 00   .....Ä¥.
0030: 84 03 00 00 80 16 8c 0b   "...?.O.
0038: 00 00 40 00 00 44 04 70   ..@..D.p
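
The tsmscsi events state that byte 0x3E of the dump is the sense KEY, 0x3D the ASC and 0x3C the ASCQ. The following is a minimal Python sketch that pulls those three bytes out of a pasted dump, assuming rows of eight bytes that start at offset 0000 and are contiguous, as in the two events above. The names parse_hex_dump and changer_sense are made up for illustration.

```python
import re

# One dump row: a 4-digit hex offset, a colon, then up to eight hex bytes.
# Anything after the eighth byte (the ASCII column) is ignored.
ROW = re.compile(r"^\s*([0-9A-Fa-f]{4}):((?:\s+[0-9A-Fa-f]{2}){1,8})")


def parse_hex_dump(dump_text):
    """Collect the bytes from 'OFFSET: xx xx ...' rows, in order."""
    data = bytearray()
    for line in dump_text.splitlines():
        match = ROW.match(line)
        if match:
            data.extend(int(tok, 16) for tok in match.group(2).split())
    return bytes(data)


def changer_sense(dump_text):
    """Return KEY/ASC/ASCQ using the offsets named in the event text."""
    data = parse_hex_dump(dump_text)
    if len(data) <= 0x3E:
        raise ValueError("dump is shorter than 0x3F bytes")
    return {"key": data[0x3E], "asc": data[0x3D], "ascq": data[0x3C]}
```

For the first event this yields KEY=04h, ASC=53h, ASCQ=00h, and for the second KEY=04h, ASC=44h, ASCQ=00h. Sense key 04h is HARDWARE ERROR; as far as I know, 53h/00h is "media load or eject failed" and 44h/00h is "internal target failure", which would fit the DD_CHANGER_FAILURE and DD_HARDWARE_MICROCODE completion codes, but check the library's SCSI reference to be sure.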


Tia
Henrik




