Neo 4100 drive problems

Hi,

TSM 5.1.9.0 on a Windows 2000 server. The library is a Neo 4100 with
2 x HP LTO-2 drives and 60 tapes.

I *think* I have a drive failure, but I'm not sure because the errors are
so varied and intermittent. Here is what happens.

1) 16 of my tapes got set to unavailable when the server was unable to
read their labels:

06/07/2005 10:49:22   ANR8355E I/O error reading label for volume
ITG051L2 in drive MT1.0.0.3 (mt1.0.0.3).

All 16 tapes were marked within a 3-hour period, and all of them
failed in drive MT1.0.0.3. Through my own fault, I didn't realize the
tapes were being set to unavailable until much later. That part is now
fixed: my reporting script tells me how many tapes are marked unavailable.

2) A few days ago, I started noticing more drive/tape/SCSI errors in the
logs, such as:

06/21/2005 08:44:19   ANR8300E I/O error on library LB6.0.0.3
(OP=8401C058, CC=205, KEY=FF, ASC=FF, ASCQ=FF, SENSE=**NONE**,
Description=SCSI adapter failure).  Refer to Appendix D in the 'Messages'
manual for recommended action.

06/21/2005 09:10:32   ANR8300E I/O error on library LB6.0.0.3
(OP=8401C058, CC=211, KEY=FF, ASC=FF, ASCQ=FF, SENSE=**NONE**,
Description=The SCSI bus was busy).  Refer to Appendix D in the 'Messages'
manual for recommended action.

06/22/2005 16:02:54   ANR8302E I/O error on drive MT1.0.0.3 (mt1.0.0.3)
(OP=TESTREADY, Error Number=1117, CC=305, KEY=06, ASC=29, ASCQ=02,
SENSE=70.00.06.00.00.00.00.0E.00.00.00.00.29.02-.00.00.2C.E4.00.00.00.00.,
Description=Drive failure). Refer to Appendix D in the 'Messages' manual
for recommended action.

3) In addition, tapes get stuck in drive MT1.0.0.3, and I have to power
cycle the library to get the tape out. Since a power cycle frees the
tape, I don't think it is stuck because of a hardware failure in the
drive; rather, the server seems to get so confused by the I/O errors that
it reaches a point where it doesn't know what to do.

4) But the problem isn't only with drive MT1.0.0.3. My second drive,
MT2.0.0.3, has also had this problem, but much less frequently.

5) I also see errors in the Windows event/system logs:

Event Type:     Error
Event Source:   AdsmScsi
Event Category: None
Event ID:       3
Date:           6/27/2005
Time:           9:54:37 AM
User:           N/A
Computer:       TENEDOS
Description:
A check condition error has occurred on device \Device\mt1.0.0.3 during
Rewind with completion code DD_DRIVE_FAILURE. Refer to the device's SCSI
reference for appropriate action.

6) I doubt it's a tape failure: how could 16 tapes all fail at once?
In addition, I've set unavailable tapes back to read/write, and
they have worked for a while, but eventually the I/O errors come back.

7) I've also reseated the SCSI controller in my host, and unplugged and
replugged all SCSI cables. But I still have the problem.
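
For reference, the SENSE= bytes in the ANR8302E message under 2) can be
decoded by hand or with a few lines of script. What follows is only a
minimal sketch, assuming the bytes are standard fixed-format SCSI sense
data (response code 70h). The function name decode_fixed_sense and the
two lookup tables are made up for illustration, and the ASC/ASCQ table is
a small excerpt of the standard one, not a complete list.

```python
import re

# Sense key names from the SCSI standard (low nibble of sense byte 2).
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0x8: "BLANK CHECK",
    0xB: "ABORTED COMMAND",
}

# Small excerpt of the standard ASC/ASCQ table (assumption: extend as needed).
ASC_ASCQ = {
    (0x29, 0x00): "POWER ON, RESET, OR BUS DEVICE RESET OCCURRED",
    (0x29, 0x01): "POWER ON OCCURRED",
    (0x29, 0x02): "SCSI BUS RESET OCCURRED",
    (0x29, 0x03): "BUS DEVICE RESET FUNCTION OCCURRED",
}


def decode_fixed_sense(sense_text):
    """Decode a dotted fixed-format sense string such as '70.00.06...'."""
    # Keep only the two-digit hex bytes; dots and stray hyphens are skipped.
    pairs = re.findall(r"\b[0-9A-Fa-f]{2}\b", sense_text)
    data = bytes(int(pair, 16) for pair in pairs)
    if len(data) < 14 or (data[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data with ASC/ASCQ bytes")
    key = data[2] & 0x0F            # sense key
    asc, ascq = data[12], data[13]  # additional sense code and qualifier
    return {
        "sense_key": SENSE_KEYS.get(key, "key 0x%X" % key),
        "asc": asc,
        "ascq": ascq,
        "description": ASC_ASCQ.get((asc, ascq), "not in this excerpt"),
    }


# Sense bytes copied from the ANR8302E message quoted above.
print(decode_fixed_sense(
    "70.00.06.00.00.00.00.0E.00.00.00.00.29.02-.00.00.2C.E4.00.00.00.00."))
```

For the bytes above this gives sense key 06 (UNIT ATTENTION) with ASC/ASCQ
29/02, which the standard table lists as a SCSI bus reset. That matches the
KEY/ASC/ASCQ fields TSM already printed. It does not by itself say whether
the drive, the cabling, or the adapter caused the reset.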

I've put in a service call with Overland, but I haven't heard back yet.
Right now our backups are down, because as soon as the library tries to
read a tape, the I/O errors pop up, the tape is marked unavailable, and I
have to restart everything. Also, it doesn't help that our tape library
and disk spools are near full capacity.
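
To see whether the errors really cluster on one drive or are spread
across the library and both drives, the ANR I/O error messages can be
tallied per device. This is only a rough sketch, assuming the activity
log has been saved as plain text in the same layout as the messages
quoted above. The function name tally_io_errors and the regular
expression are illustrative, not anything TSM provides.

```python
import re
from collections import Counter

# Matches "ANRnnnnE I/O error ... drive|library <name>" once the text has
# been flattened to single spaces (log lines wrap in the middle of messages).
IO_ERROR_RE = re.compile(
    r"(ANR\d{4}E) I/O error\b.*?\b(?:drive|library) (\S+)")


def tally_io_errors(log_text):
    """Count I/O error messages per (device, message number)."""
    flat = " ".join(log_text.split())
    counts = Counter()
    for msg, device in IO_ERROR_RE.findall(flat):
        counts[(device, msg)] += 1
    return counts


# First lines of the messages quoted earlier in this post.
SAMPLE = """
06/07/2005 10:49:22   ANR8355E I/O error
reading label for volume ITG051L2 in drive MT1.0.0.3 (mt1.0.0.3).
06/21/2005 08:44:19   ANR8300E I/O error on library LB6.0.0.3
06/21/2005 09:10:32   ANR8300E I/O error on library LB6.0.0.3
06/22/2005 16:02:54   ANR8302E I/O error on drive MT1.0.0.3 (mt1.0.0.3)
"""

for (device, msg), n in sorted(tally_io_errors(SAMPLE).items()):
    print(device, msg, n)
```

If the counts pile up on the library device and both drives rather than
on MT1.0.0.3 alone, that would hint at something shared, such as the bus
or the adapter, rather than a single bad drive.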

Anyone have a clue as to the source of my problem?

Thanks!

Alex