Networker

Re: [Networker] SCSI problems -- How many drives to a bus?

2004-01-09 16:27:17
Subject: Re: [Networker] SCSI problems -- How many drives to a bus?
From: Matt Temple <mht AT RESEARCH.DFCI.HARVARD DOT EDU>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Fri, 9 Jan 2004 16:27:11 -0500
Lemons, Terry wrote:
Hi George

According to http://www.pcguide.com/ref/hdd/if/scsi/confIDs-c.html, the SCSI
device priority order is (highest to lowest) 7, 6, 5, 4, 3, 2, 1, 0, 15, 14,
13, 12, 11, 10, 9, 8.  7 is usually the SCSI HBA itself.  Your L80 picker is
set to SCSI address 6, which is the next highest priority.  Then again, your
picker is the only other device on the bus, so I don't think this could be
the problem.
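
The priority order quoted above can be written down as a small Python sketch. The function names here are made up for illustration, and the ordering is simply the one given on the pcguide page for a 16-ID wide bus, not something read from the hardware.

```python
def scsi_priority_order(wide=True):
    """Return SCSI IDs from highest to lowest priority.

    Assumes the ordering quoted above: 7 down to 0, then (on a wide
    bus) 15 down to 8.
    """
    order = list(range(7, -1, -1))
    if wide:
        order += list(range(15, 7, -1))
    return order


def highest_priority(ids):
    """Pick the ID from `ids` that ranks first under that ordering."""
    rank = {scsi_id: pos for pos, scsi_id in enumerate(scsi_priority_order())}
    return min(ids, key=rank.__getitem__)
```

With the HBA at 7 and the picker at 6, `highest_priority([6, 7])` returns 7, so the picker ranks second only to the adapter itself.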

tl



-----Original Message-----
From: Legato NetWorker discussion [mailto:NETWORKER AT LISTMAIL.TEMPLE DOT EDU] On Behalf Of George Sinclair
Sent: Friday, January 09, 2004 3:02 PM
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Subject: Re: [Networker] SCSI problems -- How many drives to a bus?

That's exactly what's happening. At some point the picker is no longer
seen by the software. It seems random. We might go several days before
seeing it again. We get these "aic7xxx_abort returns 0x2002" messages.
They don't always cause a failure, but when a failure does occur, you
can bet that one of these SCSI resets shows up in /var/log/messages
shortly before or around that time.
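
One way to line these messages up against the failure times is to pull them out of the log with their timestamps. This is only a rough Python sketch: the name `find_aborts` is made up, and it assumes classic syslog lines whose first 15 characters are the timestamp (for example "Jan  9 14:02:11"), which may not match every syslog configuration.

```python
import re

# Matches the driver message quoted above, e.g. "aic7xxx_abort returns 0x2002".
ABORT_RE = re.compile(r"aic7xxx_abort returns (0x[0-9a-fA-F]+)")


def find_aborts(lines):
    """Yield (timestamp, return_code) for each abort message found.

    Assumption: each line starts with a 15-character syslog timestamp.
    """
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            yield line[:15], match.group(1)
```

Run against the real log, this would look something like `for stamp, code in find_aborts(open("/var/log/messages")): print(stamp, code)`; reading that file usually needs root.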

How do we determine if the arm has the highest priority? How do we give
it the highest priority?

We had it daisy-chained to drives 1 and 2, which were using channel A on
the Adaptec dual-channel card, while drives 3 and 4 were daisy-chained
on channel B. I have since put the arm on its own bus, so it now
connects to channel A on its own card, and the drives still use channels
A and B on the other card as before. I terminated the spot on the back
of the library where the picker was daisy-chained. Here's the current
output from inquire:

[email protected]:SEAGATE ULTRIUM06242-XXX1522|Tape, /dev/nst0
[email protected]:SEAGATE ULTRIUM06242-XXX1522|Tape, /dev/nst1
[email protected]:SEAGATE ULTRIUM06242-XXX1522|Tape, /dev/nst2
[email protected]:SEAGATE ULTRIUM06242-XXX1522|Tape, /dev/nst3
[email protected]:ATL     P1000    62200502.23|Autochanger (Jukebox), /dev/sg4
[email protected]:QUANTUM SuperDLT1       2323|Tape, /dev/nst4
[email protected]:QUANTUM SuperDLT1       2323|Tape, /dev/nst5
[email protected]:STK     L80             0212|Autochanger (Jukebox), /dev/sg7
[email protected]:MegaRAIDLD 0 RAID1  139G1.92|Disk, /dev/sg8
[email protected]:PE/PV   1x8 SCSI BP     1.1 |Processor, /dev/sg9
[email protected]:HL-DT-STRW/DVD GCC-4240ND110|CD-ROM, /dev/sgk

George


I hate to be the bearer of ugly news, but I'll report what happened to
us with the aic7xxx driver on a Linux server. Originally, I had three
AIT-2 drives and, in general, there were no SCSI problems. After getting
two AIT-3 drives in our Qualstar 120-slot library, we found that about
once a month there would be SCSI resets. They definitely seemed to come
out of nowhere and had a similar result: tape drives went offline and
couldn't be reclaimed without a power-down of the library and a reboot
of the server. The most likely time for this to happen was when labeling
an unlabeled tape. Mounting a pre-labeled tape could also trigger the
problem, but that was less likely. Seven months ago we moved to
temporary quarters and the problems simply magnified: same problems, but
an order of magnitude more often. I spent time changing the order of
drives, ribbon cables, etc. I replaced the SCSI cable. Still, the hangs
seemed completely random. Now we have moved to new quarters and, after
setting up the library and server again, the problems simply vanished
again. (We also replaced the last AIT-2 drive with an AIT-3 drive.)
No one has claimed that AIT-2 and AIT-3 drives shouldn't be able to live
on the same bus.

Previous experience with SCSI showed me that a drive having a problem
reading a tape can have a devastating effect on the SCSI bus.

SCSI buses are prone to difficult-to-diagnose problems that can be
generated by something as trivial as the direction in which the cable
curves. (Some SCSI cables like to curve left, others right.)

In a previous installation, I had a parallel problem with a stack of
external SCSI drives. This Tru64 Alpha would hang on boot about 25% of
the time, with SCSI timeouts. Simply moving the stack and rebooting
would fix the problem. SCSI is nice and fast but has the same built-in
weirdness as thickwire Ethernet. A further problem is that sometimes
people writing to mailing lists are right about what the SCSI problem
is, but other times they simply make claims. And sometimes weird claims
seem to be what fixes your problem. But really, SCSI is clearly magic.
You try something and it does or doesn't work, and you assume that what
you did is what fixed the problem.

Right now I've backed up about 10 terabytes of data without the
slightest glitch. Before moving, I couldn't get through a tenth of that.

                               Matt Temple



--
=============================================================
Matthew Temple                Tel:    617/632-2597
Director, Research Computing  Fax:    617/582-7820
Dana-Farber Cancer Institute  mht AT research.dfci.harvard DOT edu
44 Binney Street, LG300/300   http://research.dfci.harvard.edu
Boston, MA 02115              Choice is the Choice!
