[Networker] SCSI bus resets????!!!

2005-12-23 10:44:16
From: George Sinclair <George.Sinclair AT NOAA DOT GOV>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Fri, 23 Dec 2005 10:38:16 -0500
Hi,

First, if someone can suggest a better forum where I could get help with this issue, please let me know. LSI, StorageTek, Legato, and Quantum have been unable to resolve it so far. Everyone seems to think the problem is not on their end.

Our Red Hat Linux storage node (a Dell PowerEdge 6600) logs SCSI bus reset messages almost daily, at varying times, for the library picker, or more specifically for the SCSI ID used by the robot. Could running too many groups at once cause this, or could something be misconfigured in Legato? From the look of the messages, this just doesn't appear to be a Legato NetWorker problem.
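One way to test the "too many groups" theory is to tally the reset messages by hour and compare the result with the group start times. Below is a minimal sketch, assuming the standard syslog line format seen in the log excerpt in this message and the usual Red Hat location /var/log/messages; the function names `tally_resets` and `summarize` are hypothetical, not part of any NetWorker tool.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed syslog format, e.g.
# "Dec 23 15:01:32 snode1 kernel: SCSI bus is being reset for host 2 channel 0."
RESET_RE = re.compile(
    r"^(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+\S+\s+"
    r"kernel:\s+SCSI bus is being reset for host (?P<host>\d+) channel (?P<chan>\d+)"
)

Key = Tuple[str, int, int, int, int]  # (month, day, hour, host, channel)


def tally_resets(lines: Iterable[str]) -> Counter:
    """Count bus-reset messages per (month, day, hour, host, channel)."""
    counts: Counter = Counter()
    for line in lines:
        m = RESET_RE.match(line)
        if m:
            key: Key = (m["mon"], int(m["day"]), int(m["hour"]),
                        int(m["host"]), int(m["chan"]))
            counts[key] += 1
    return counts


def summarize(path: str = "/var/log/messages") -> None:
    """Print one line per hour bucket, in the order first seen in the log."""
    with open(path, errors="replace") as fh:
        # Counter keeps insertion order, so buckets come out chronologically
        # for a single log file.
        for (mon, day, hour, host, chan), n in tally_resets(fh).items():
            print(f"{mon} {day:2d} {hour:02d}:00-{hour:02d}:59  "
                  f"host {host} channel {chan}: {n} reset(s)")
```

Calling `summarize()` on the storage node would list the hours in which resets occurred. If they cluster at group start times, that would support the load theory; resets during otherwise idle periods would point elsewhere.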

I was labeling some tapes last night with nothing else going on, and suddenly everything hung. Sure enough, another SCSI bus reset was reported for the picker. StorageTek has replaced the robotic controller card and the internal SCSI cable in the library, and the resets still occur. We're using LVD tape libraries: an STK L80 with four LTO-1 drives and a P1000 SDLT with two SDLT-1 drives.

We've tried replacing cards and SCSI cables, moving cards to different PCI slots in the hosts, and switching to different servers and drivers, but the problem persists. We had a similar problem with Adaptec 39160 cards, which is why we switched to LSI.

We're using LSI-22320-R dual-channel Ultra320 cards. I've also seen this occur occasionally on our Quantum library. Quantum says that RAID controller cards can cause problems for pickers, but LSI says this card should work fine: it has RAID 0 and 1 (striping and mirroring) capabilities but should otherwise behave like any LVD/SE card. We haven't configured any RAID on these cards, although there is no option to turn RAID off explicitly.

The transfer speed can be set for each device on each channel. The default (and highest) is 320, but lower settings are available: 160, 80, 40, 20, and so on, down to 'ASYNC'. I've tried lowering the picker's speed, down to 80 so far, although I would expect the devices to auto-negotiate anyway. The picker is on its own channel at ID 0; the drives are on their own channels at IDs 2, 3, 4, and 5. Should the picker run at a lower speed, maybe even ASYNC? Should I try moving the picker to a higher ID, such as 6 or 8?

Dec 23 15:01:31 santana kernel: scsi : aborting command due to timeout : pid 78695, scsi2, channel 0, id 0, lun 0 Move medium/play audio(12) 00 00 00 01 f5 04 1c 00 00 00 00
Dec 23 15:01:31 snode1 kernel: mptscsih: ioc0: id=0 OldAbort: scheduling ABORT SCSI IO (sc=f6111200)
Dec 23 15:01:31 snode1 kernel: mptbase: Initiating ioc0 recovery
Dec 23 15:01:32 snode1 kernel: SCSI host 2 abort (pid 78695) timed out - resetting
Dec 23 15:01:32 snode1 kernel: SCSI bus is being reset for host 2 channel 0.
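The command that timed out is opcode A5h, which the kernel labels "Move medium/play audio(12)" because the same value is MOVE MEDIUM in the SCSI media changer (SMC) command set and PLAY AUDIO(12) for CD devices; for a library robot it is MOVE MEDIUM. The sketch below decodes the hex bytes from the first log line, assuming they are CDB bytes 1 through 11 (everything after the opcode) and assuming the SMC field layout: bytes 2-3 transport element, 4-5 source element, 6-7 destination element, byte 10 bit 0 INVERT. The helper names are hypothetical.

```python
from typing import Dict

MOVE_MEDIUM_OPCODE = 0xA5  # SMC MOVE MEDIUM; same value as MMC PLAY AUDIO(12)


def cdb_from_kernel_log(hex_tail: str) -> bytes:
    """Rebuild a 12-byte CDB from the hex bytes printed after the command name.

    Assumption: the kernel printed CDB bytes 1..11, i.e. everything after the opcode.
    """
    return bytes([MOVE_MEDIUM_OPCODE]) + bytes.fromhex(hex_tail)


def decode_move_medium(cdb: bytes) -> Dict[str, int]:
    """Decode a MOVE MEDIUM CDB using the (assumed) SMC field layout."""
    if len(cdb) != 12 or cdb[0] != MOVE_MEDIUM_OPCODE:
        raise ValueError("expected a 12-byte CDB starting with opcode 0xA5")
    return {
        "transport": int.from_bytes(cdb[2:4], "big"),    # picker (medium transport) element
        "source": int.from_bytes(cdb[4:6], "big"),       # element the cartridge comes from
        "destination": int.from_bytes(cdb[6:8], "big"),  # element it should be moved to
        "invert": cdb[10] & 0x01,                        # INVERT bit: flip the cartridge
    }
```

For the logged bytes, `decode_move_medium(cdb_from_kernel_log("00 00 00 01 f5 04 1c 00 00 00 00"))` gives transport 0, source 501 (0x01F5), and destination 1052 (0x041C), with INVERT clear. Whether those element addresses are a slot and a drive depends on the library's element address assignment, so they would need to be checked against the L80's own element map before drawing conclusions.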


Thanks.

George
