We have two 3582 libraries and sometimes get the same message. In our case,
we learned from IBM that each library needs its own channel or
controller. We are using one channel with both libraries daisy-chained
together, and IBM told us that this was the cause of our errors. You
didn't say what type of library you have or how it (or they) may be
connected. When the error occurs here, it takes 10 minutes before the tape
actually dismounts from the drive, and TSM waits the entire time. If I
manually dismount the tape during this time, TSM marks the tape as read-only.
This only occurs during reclamation, and only when it is running on both
libraries at the same time. We avoid the problem by running it on one
library at a time.
This is our case; your setup may not be the same. We plan to add a
third library and will fix the daisy-chain issue later.
Buddy Howeth
Computer Operations Specialist
Information Systems
Pacific Coast Producers
Corporate Offices
631 N. Cluff Ave
Lodi, CA 95240-0756
(209) 367-8800 - Main#
(209) 367-6288 - Computer Room
(209) 366-6240 - Alpha Pager
Henrik Vahlstedt <SHWL AT STATOILHYDRO DOT COM>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
07/21/2009 06:37 AM
Please respond to
"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
To
ADSM-L AT VM.MARIST DOT EDU
cc
Subject
[ADSM-L] FW: Errno = 23, 1167 and stuck tapes
Hello,
Does anyone have experience with Errno = 23, 1167 and stuck tapes, or a
suggestion as to what might cause the errors and how to solve them?
I get the errors in all kinds of data movement processes, D2T and T2T. The
errors occur on all drives at random, but not all the time.
That is, I can have 1 or 30 mounts before I get an error on a drive, and
the faulty tape mounts OK in another drive.
The OS, switch and TSM logs etc. do not provide any helpful information.
W2k3 x64 sp2, TSM 5.5.2.1
5 LTO-4 drives, SL500, 3 dual-channel HBAs connected to the SAN with one
device per channel.
Latest firmware, drivers, etc.
First, err=1167: space reclamation mounts a tape and the drive disappears.
Why? After some minutes, however, TSM resurrects the drive and
continues to use it in new processes.
07/20/2009 05:53:21 ANR8337I LTO volume 4R0245 mounted in drive MT504
                    (mt0.0.0.2). (PROCESS: 59)
07/20/2009 05:53:42 ANR8311E An I/O error occurred while accessing drive MT504
                    (mt0.0.0.2) for WEOF operation, errno = 1167. (PROCESS: 59)
07/20/2009 05:53:42 ANR8311E An I/O error occurred while accessing drive MT504
                    (mt0.0.0.2) for OFFL operation, errno = 1167. (PROCESS: 59)
07/20/2009 05:53:43 ANR8469E Dismount of LTO volume 4R0245 from drive MT504
C:\>net helpmsg 1167
The device is not connected.
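
Because the errors hit all drives at random, it can help to tally them per drive from a saved copy of the activity log. The following is only a minimal Python sketch, not a TSM tool. It assumes that the log has been saved as plain text, that every record starts with an MM/DD/YYYY HH:MM:SS timestamp, that wrapped continuation lines do not, and that drive names are upper case like MT504. The function names are made up for illustration.

```python
import re
from collections import Counter

# A record starts with "MM/DD/YYYY HH:MM:SS "; anything else is a continuation.
RECORD_START = re.compile(r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} ")
MESSAGE_ID = re.compile(r"\bANR\d{4}[A-Z]\b")
# Assumption: drive names are upper case letters/digits, e.g. MT504.
DRIVE_NAME = re.compile(r"\bdrive\s+([A-Z][A-Z0-9_]*)\b")


def join_records(lines):
    """Rejoin wrapped activity log lines into one string per record."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def errors_per_drive(records):
    """Count error (E) and severe (S) messages per (drive, message ID)."""
    counts = Counter()
    for record in records:
        message = MESSAGE_ID.search(record)
        drive = DRIVE_NAME.search(record)
        if message and drive and message.group()[-1] in "ES":
            counts[(drive.group(1), message.group())] += 1
    return counts


SAMPLE = """\
07/20/2009 05:53:21 ANR8337I LTO volume 4R0245 mounted in drive MT504
(mt0.0.0.2). (PROCESS: 59)
07/20/2009 05:53:42 ANR8311E An I/O error occurred while accessing
drive MT504
(mt0.0.0.2) for WEOF operation, errno = 1167.
(PROCESS:
59)
07/20/2009 05:53:42 ANR8311E An I/O error occurred while accessing
drive MT504
(mt0.0.0.2) for OFFL operation, errno = 1167.
(PROCESS:
59)
07/20/2009 05:53:43 ANR8469E Dismount of LTO volume 4R0245 from drive
MT504
"""

if __name__ == "__main__":
    for (drive, message_id), n in sorted(errors_per_drive(join_records(SAMPLE.splitlines())).items()):
        print(drive, message_id, n)
```

Run over a few days of log, a tally like this would show whether the errno 1167 events cluster on particular drives or HBA channels, or are really spread evenly.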
Event Type: Error
Event Source: PlugPlayManager
Event Category: None
Event ID: 12
Date: 7/20/2009
Time: 5:53:42 AM
User: N/A
Computer:
Description:
The device 'IBM ULTRIUM-TD4 SCSI Sequential Device'
(SCSI\Sequential&Ven_IBM&Prod_ULTRIUM-TD4&Rev_82F0\5&3652500d&0&000000)
disappeared from the system without first being prepared for removal.
For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 00 00 00 00 ....
Volume Name: 4R0051
Storage Pool Name: LTO4-BCK
Device Class Name: LTO4
Estimated Capacity: 1.6 T
Scaled Capacity Applied:
Pct Util: 2.0
Volume Status: Filling
Access: Read-Only
Pct. Reclaimable Space: 0.1
Scratch Volume?: Yes
In Error State?: No
Number of Writable Sides: 1
Number of Times Mounted: 8
Write Pass Number: 1
Approx. Date Last Written: 07/20/2009 04:00:47
Approx. Date Last Read: 07/20/2009 21:26:28
Date Became Pending:
Number of Write Errors: 0
Number of Read Errors: 0
Volume Location:
Volume is MVS Lanfree Capable : No
Last Update by (administrator):
Last Update Date/Time: 07/20/2009 05:52:36
Begin Reclaim Period:
End Reclaim Period:
Drive Encryption Key Manager: None
Second, err=23: a write error generates error 23 and the tape is stuck.
Neither TSM nor Lbtest can remove the tape.
07/21/2009 02:23:29 ANR8337I LTO volume 4R0254 mounted in drive MT501
                    (mt0.0.0.5). (SESSION: 12497, PROCESS: 70)
07/21/2009 02:23:29 ANR1340I Scratch volume 4R0254 is now defined in storage
                    pool LTO4-BCK. (SESSION: 12497, PROCESS: 70)
07/21/2009 02:23:33 ANR0513I Process 70 opened output volume 4R0254.
                    (SESSION: 12497, PROCESS: 70)
07/21/2009 02:24:16 ANR8944E Hardware or media error on drive MT501
                    (mt0.0.0.5) with volume 4R0254(OP=WRITE, Error Number= 23,
                    CC=0, KEY=03, ASC=52, ASCQ=00,
                    SENSE=71.00.03.00.00.00.00.58.00.00.00.00.52.00.36.00.78.D1.23.5D,
                    Description=An undetermined error has occurred). Refer to
                    Appendix C in the 'Messages' manual for recommended
                    action. (SESSION: 12497, PROCESS: 70)
07/21/2009 02:24:16 ANR8359E Media fault detected on LTO volume 4R0254 in
                    drive MT501 (mt0.0.0.5) of library SL500. (SESSION: 12497,
                    PROCESS: 70)
07/21/2009 02:24:16 ANR1411W Access mode for volume 4R0254 now set to
                    "read-only" due to write error. (SESSION: 12497,
                    PROCESS: 70)
07/21/2009 02:24:16 ANR0515I Process 70 closed volume 4R0254. (SESSION: 12497,
                    PROCESS: 70)
07/21/2009 02:24:37 ANR8944E Hardware or media error on drive MT501
                    (mt0.0.0.5) with volume 4R0254(OP=OFFL, Error Number= 23,
                    CC=0, KEY=03, ASC=53, ASCQ=04,
                    SENSE=70.00.03.00.00.00.00.58.00.00.00.00.53.04.36.00.2E.05.10.06,
                    Description=An undetermined error has occurred). Refer to
                    Appendix C in the 'Messages' manual for recommended
                    action. (SESSION: 12497, PROCESS: 70)
07/21/2009 02:24:37 ANR8950W Device mt0.0.0.5, volume 4R0254 has issued the
                    following Warning TapeAlert: The operation has stopped
                    because an error has occurred while reading or writing
                    data which the drive cannot correct. (SESSION: 12497,
                    PROCESS: 70)
07/21/2009 02:24:37 ANR8948S Device mt0.0.0.5, volume 4R0254 has issued the
                    following Critical TapeAlert: Your data is at risk: 1.
                    Copy any data you require from this tape. 2. Do not use
                    this tape again. 3. Restart the operation with a
                    different tape. (SESSION: 12497, PROCESS: 70)
07/21/2009 02:24:37 ANR8949E Device mt0.0.0.5, volume 4R0254 has issued the
                    following Critical TapeAlert: The tape drive has a
                    hardware fault: 1. Eject the tape or magazine. 2. Reset
                    the drive. 3. Restart the operation. (SESSION: 12497,
                    PROCESS: 70)
07/21/2009 02:24:37 ANR8949E Device mt0.0.0.5, volume 4R0254 has issued the
                    following Critical TapeAlert: The operation has failed:
                    1. Eject the tape or magazine. 2. Restart the operation.
                    (SESSION: 12497, PROCESS: 70)
07/21/2009 02:24:37 ANR8950W Device mt0.0.0.5, volume 4R0254 has issued the
                    following Warning TapeAlert: The tape drive may have a
                    hardware fault. Run extended diagnostics to verify and
                    diagnose the problem. Check the tape drive users manual
                    for device specific instruction on running extended
                    diagnostic tests. (SESSION: 12497, PROCESS: 70)
07/21/2009 02:24:37 ANR8951I Device mt0.0.0.5, volume 4R0254 has issued the
                    following Information TapeAlert: The device has
                    encountered TapeAlert 56. (SESSION: 12497, PROCESS: 70)
07/21/2009 02:24:57 ANR8469E Dismount of LTO volume 4R0254 from drive MT501
                    (mt0.0.0.5) in library SL500 failed. (SESSION: 12497,
                    PROCESS: 70)
C:\>net helpmsg 23
Data error (cyclic redundancy check).
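
The KEY, ASC and ASCQ values in the ANR8944E messages can be cross-checked against the raw SENSE= bytes. In SCSI fixed-format sense data, byte 0 carries the response code (0x70 current, 0x71 deferred), the low four bits of byte 2 are the sense key, and bytes 12 and 13 are the ASC and ASCQ. The following is a minimal Python sketch under those assumptions. The function name is made up for illustration, and the key table lists only the common standard sense keys.

```python
# Common SCSI sense keys (low nibble of byte 2 in fixed-format sense data).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0xB: "ABORTED COMMAND",
}


def decode_fixed_sense(sense_text):
    """Decode a dotted hex SENSE= string as printed in ANR8944E."""
    # Tolerate a line wrap inside the string ("...78-" / ".D1...").
    cleaned = "".join(sense_text.split()).replace("-.", ".")
    data = bytes(int(part, 16) for part in cleaned.split("."))
    response_code = data[0] & 0x7F
    if response_code not in (0x70, 0x71) or len(data) < 14:
        raise ValueError("not fixed-format sense data")
    key = data[2] & 0x0F
    return {
        "deferred": response_code == 0x71,
        "key": key,
        "key_name": SENSE_KEYS.get(key, "UNKNOWN"),
        "asc": data[12],
        "ascq": data[13],
    }


if __name__ == "__main__":
    write_error = "71.00.03.00.00.00.00.58.00.00.00.00.52.00.36.00.78.D1.23.5D"
    unload_error = "70.00.03.00.00.00.00.58.00.00.00.00.53.04.36.00.2E.05.10.06"
    for sense in (write_error, unload_error):
        print(decode_fixed_sense(sense))
```

For the two strings above this gives sense key 0x3 (MEDIUM ERROR) in both cases, with ASC/ASCQ 52/00 for the write and 53/04 for the unload, matching what TSM printed. The write error is reported as a deferred error (response code 0x71).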
Volume Name: 4R0254
Storage Pool Name: LTO4-BCK
Device Class Name: LTO4
Estimated Capacity: 1.6 T
Scaled Capacity Applied:
Pct Util: 0.1
Volume Status: Filling
Access: Read-Only
Pct. Reclaimable Space: 0.0
Scratch Volume?: Yes
In Error State?: Yes
Number of Writable Sides: 1
Number of Times Mounted: 1
Write Pass Number: 1
Approx. Date Last Written: 07/21/2009 02:23:47
Approx. Date Last Read: 07/21/2009 02:23:47
Date Became Pending:
Number of Write Errors: 1
Number of Read Errors: 0
Volume Location:
Volume is MVS Lanfree Capable : No
Last Update by (administrator):
Last Update Date/Time: 07/21/2009 02:23:29
Begin Reclaim Period:
End Reclaim Period:
Drive Encryption Key Manager: None
Event Type: Error
Event Source: tsmscsi
Event Category: None
Event ID: 3
Date: 7/21/2009
Time: 2:24:57 AM
User: N/A
Computer:
Description:
A check condition error has occured on device \Device\lb0.0.0.7 during
Move Medium with completion code DD_CHANGER_FAILURE. Refer to the device's
SCSI reference for appropriate action.
Dump Data: byte 0x3E=KEY, byte 0x3D=ASC, byte 0x3C=ASCQ
Data:
0000: 0e 00 18 00 03 00 6c 00 ......l.
0008: 00 00 00 00 03 00 00 e0 .......à
0010: 30 01 00 00 85 01 00 c0 0......À
0018: 00 00 00 00 58 c0 01 84 ....XÀ."
0020: 00 00 00 00 00 00 00 00 ........
0028: 00 00 00 00 02 c4 a5 00 .....Ä¥.
0030: 84 03 00 00 80 16 8c 0b "...?.O.
0038: 00 00 40 00 00 53 04 70 ..@..S.p
Event Type: Error
Event Source: tsmscsi
Event Category: None
Event ID: 3
Date: 7/21/2009
Time: 2:24:57 AM
User: N/A
Computer:
Description:
A check condition error has occured on device \Device\lb0.0.0.7 during
Move Medium with completion code DD_HARDWARE_MICROCODE. Refer to the
device's SCSI reference for appropriate action.
Dump Data: byte 0x3E=KEY, byte 0x3D=ASC, byte 0x3C=ASCQ
Data:
0000: 0e 00 18 00 03 00 6c 00 ......l.
0008: 00 00 00 00 03 00 00 e0 .......à
0010: d1 00 00 00 85 01 00 c0 Ñ......À
0018: 00 00 00 00 58 c0 01 84 ....XÀ."
0020: 00 00 00 00 00 00 00 00 ........
0028: 00 00 00 00 02 c4 a5 00 .....Ä¥.
0030: 84 03 00 00 80 16 8c 0b "...?.O.
0038: 00 00 40 00 00 44 04 70 ..@..D.p
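
The legend in the two tsmscsi events (byte 0x3E=KEY, byte 0x3D=ASC, byte 0x3C=ASCQ) can be applied to the dump data mechanically. The following is a minimal Python sketch, assuming the dump is pasted as text in the layout shown above: a four-digit hex offset, a colon, up to eight hex bytes, then an ASCII column that is ignored. The function names are made up for illustration.

```python
import re

# Offset, colon, then 1-8 two-digit hex bytes; the ASCII column is ignored.
DUMP_LINE = re.compile(r"^([0-9a-fA-F]{4}):((?: [0-9a-fA-F]{2}(?= |$)){1,8})")


def parse_dump(lines):
    """Return {offset: byte value} for every byte found in the dump lines."""
    data = {}
    for line in lines:
        match = DUMP_LINE.match(line.strip())
        if not match:
            continue
        offset = int(match.group(1), 16)
        for i, token in enumerate(match.group(2).split()):
            data[offset + i] = int(token, 16)
    return data


def key_asc_ascq(data):
    """Apply the legend from the event text: 0x3E=KEY, 0x3D=ASC, 0x3C=ASCQ."""
    return data[0x3E], data[0x3D], data[0x3C]


FIRST_EVENT = """\
0000: 0e 00 18 00 03 00 6c 00
0008: 00 00 00 00 03 00 00 e0
0010: 30 01 00 00 85 01 00 c0
0018: 00 00 00 00 58 c0 01 84
0020: 00 00 00 00 00 00 00 00
0028: 00 00 00 00 02 c4 a5 00
0030: 84 03 00 00 80 16 8c 0b
0038: 00 00 40 00 00 53 04 70
"""

if __name__ == "__main__":
    key, asc, ascq = key_asc_ascq(parse_dump(FIRST_EVENT.splitlines()))
    print(f"KEY={key:02X} ASC={asc:02X} ASCQ={ascq:02X}")
```

Read this way, the first event gives KEY=04, ASC=53, ASCQ=00 and the second gives KEY=04, ASC=44, ASCQ=00. Sense key 04 is HARDWARE ERROR, so the library itself reported a hardware error on both Move Medium attempts, separately from the medium errors the drive reported.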
Tia
Henrik