Re: [Networker] SCSI errors

2003-06-04 16:16:38
Subject: Re: [Networker] SCSI errors
From: Matthew Temple <mht AT RESEARCH.DFCI.HARVARD DOT EDU>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Wed, 4 Jun 2003 16:01:03 -0400
Frank,

        I missed the beginning of this thread.   Did it ever work?

        I've had similar problems on two extended occasions with a
Qualstar library and three Sony AIT-2 drives.   The card is an Adaptec
2940U2 (I'm not near the machine to check).   The first time occurred
after a move.   The drives in the library would go into a hung state
when trying to verify the label.   First it was all three drives.
Then I messed around with termination and changed the SCSI cable, and
the problem narrowed to just one drive: SCSI resets and timeouts.
Then I switched SCSI cards (same model, just a different unit).   I took
a DDS tape drive off the internal channel.   Same result.   Then I
reattached the internal tape drive.   Everything worked.   I don't know why.
It was all so ... SCSI.   SCSI cables have always been mysterious to me --
we've had RAID stacks that worked when their cable was curved to the left
but not when curved to the right.

        A month later, a similar event occurred and the fix was equally
random.   Strangely enough, I replaced two AIT-2 drives with AIT-3 drives
last week and expected the worst, but they came up flawlessly.
In SCSI, the name of the game is:

        1. Keep the cables as short as possible.
        2. Check that everything is REALLY terminated correctly (how about
                the jumpers on the devices?  TERM-POWER?)
        3. Try different cable positions.
        4. The following is MOSTLY true -- It's probably NOT your card.
                        (except when it is 8-)   )

The only cables worse than SCSI were very old Thickwire Ethernet.
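
When chasing this kind of thing it helps to know whether the timeouts
follow one drive or a whole bus.   A minimal Python sketch of one way to
tally them per host and target follows.   It assumes the kernel log keeps
each message on a single line (the archive wraps them) and that the log
uses the same "aborting command due to timeout" wording as the messages
quoted in this thread.   The name count_timeouts and the SAMPLE list are
made up for illustration; they are not part of any tool.

```python
import re
from collections import Counter

# Matches the "aborting command due to timeout" kernel message and pulls
# out the host number, channel, target id and lun that follow the pid.
TIMEOUT_RE = re.compile(
    r"aborting command due to timeout.*?"
    r"scsi(?P<host>\d+),\s*channel\s+(?P<channel>\d+),\s*"
    r"id\s+(?P<id>\d+),\s*lun\s+(?P<lun>\d+)"
)


def count_timeouts(lines):
    """Return a Counter keyed by (host, channel, id, lun)."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            key = tuple(int(m.group(k)) for k in ("host", "channel", "id", "lun"))
            counts[key] += 1
    return counts


# Sample lines taken from the quoted message, with the wrapping undone.
SAMPLE = [
    "Jun  1 12:45:12 r2d2 kernel: scsi : aborting command due to timeout : "
    "pid 213591 32, scsi2, channel 0, id 0, lun 0 Inquiry 00 00 00 ff 00",
    "Jun  4 08:40:30 r2d2 kernel: sym53c1010-33-0-<6,*>: target did not report SYNC.",
    "Jun  4 08:40:31 r2d2 kernel: scsi : aborting command due to timeout : "
    "pid 30973041, scsi5, channel 0, id 4, lun 0 Inquiry 00 00 00 ff 00",
    "Jun  4 08:40:31 r2d2 kernel: sym53c8xx_abort: pid=30973041 "
    "serial_number=30972996 serial_number_at_timeout=30972996",
]

for (host, chan, target, lun), n in sorted(count_timeouts(SAMPLE).items()):
    print(f"scsi{host} channel {chan} id {target} lun {lun}: {n} timeout(s)")
```

To run it against a real log, pass an open file object instead of SAMPLE,
e.g. count_timeouts(open("/var/log/messages")), assuming that is where
your syslog sends kernel messages.   If the counts pile up on one target,
suspect that drive or its jumpers; if they spread across every target on
a host, cabling and termination are the more likely suspects.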

On Wed, 4 Jun 2003, Frank Altpeter wrote:

> Hi again,
>
> It seems that I don't get any time to celebrate even a little success,
> since it doesn't take long for the next problem to appear :-(
>
> For some weeks now I have occasionally seen SCSI errors in the system
> log file, like these:
>
> Jun  1 12:45:12 r2d2 kernel: scsi : aborting command due to timeout : pid 213591 32, scsi2, channel 0, id 0, lun 0 Inquiry 00 00 00 ff 00
>
> In the beginning, only scsidev 0,0,0 (the first of five
> LTO Ultrium tape devices) got the error.
> Because I assumed the error was in the SCSI card, I disabled this drive
> until I could contact a StorageTek professional to check it.
>
> But now I see SCSI errors on all five tape devices. The drives are
> distributed across three Tekram DC-390U3W cards,
> so I don't think that the cards are causing the errors, but I don't
> have any clue where to look now.
>
> In addition to the above entry, I'm getting a lot of these now:
>
> Jun  4 08:40:30 r2d2 kernel: sym53c1010-33-0-<6,*>: target did not report SYNC.
> Jun  4 08:40:31 r2d2 kernel: scsi : aborting command due to timeout : pid 30973041, scsi5, channel 0, id 4, lun 0 Inquiry 00 00 00 ff 00
> Jun  4 08:40:31 r2d2 kernel: sym53c8xx_abort: pid=30973041 serial_number=30972996 serial_number_at_timeout=30972996
>
>
> I know this is slightly off-topic, but it's a big problem for me,
> and by the way, NetWorker of course refuses to work if a tape device keeps
> hanging in the 'verifying label' state due to SCSI timeouts...
>
> Well, to the hardware:
>
> 3 x Tekram DC-390 U3W Ultra 160 SCSI cards
> 5 x IBM LTO Ultrium drives, built into a StorageTek L700 tape library
> 1 x ASUS PR-DLSW Dual Intel Xeon-based motherboard
> 2 x Intel Xeon 2 GHz
> 1 x 1024 MB RAM
> 1 x Redhat Linux 7.3 (Valhalla)
>
> [root@r2d2 root]# cat /proc/scsi/scsi
> Attached devices:
> Host: scsi0 Channel: 00 Id: 05 Lun: 00
>   Vendor: easyRAID Model:  F8              Rev: 0001
>   Type:   Direct-Access                    ANSI SCSI revision: 03
> Host: scsi0 Channel: 00 Id: 06 Lun: 00
>   Vendor: STK      Model: L700             Rev: 0301
>   Type:   Medium Changer                   ANSI SCSI revision: 03
> Host: scsi2 Channel: 00 Id: 00 Lun: 00
>   Vendor: IBM      Model: ULTRIUM-TD1      Rev: 27Q1
>   Type:   Sequential-Access                ANSI SCSI revision: 03
> Host: scsi2 Channel: 00 Id: 01 Lun: 00
>   Vendor: IBM      Model: ULTRIUM-TD1      Rev: 27Q1
>   Type:   Sequential-Access                ANSI SCSI revision: 03
> Host: scsi2 Channel: 00 Id: 02 Lun: 00
>   Vendor: IBM      Model: ULTRIUM-TD1      Rev: 27Q1
>   Type:   Sequential-Access                ANSI SCSI revision: 03
> Host: scsi5 Channel: 00 Id: 03 Lun: 00
>   Vendor: IBM      Model: ULTRIUM-TD1      Rev: 27Q1
>   Type:   Sequential-Access                ANSI SCSI revision: 03
> Host: scsi5 Channel: 00 Id: 04 Lun: 00
>   Vendor: IBM      Model: ULTRIUM-TD1      Rev: 27Q1
>   Type:   Sequential-Access                ANSI SCSI revision: 03
>
> With kind regards,
>
>         Frank Altpeter
>
> --
> Note: To sign off this list, send a "signoff networker" command via email
> to listserv AT listmail.temple DOT edu or visit the list's Web site at
> http://listmail.temple.edu/archives/networker.html where you can
> also view and post messages to the list.
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
>
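
Since the /proc/scsi/scsi listing above shows which devices share a host
adapter, it can be handy to group it by bus before deciding which cable
or terminator to suspect.   The following is only a rough Python sketch
under the assumption that the file has the 2.4-style layout shown in the
quoted message (a "Host:" line followed by "Vendor:" and "Type:" lines).
The function names parse_proc_scsi and group_by_host are hypothetical,
and the sample text is a shortened copy of the quoted listing.

```python
import re
from collections import defaultdict

HOST_RE = re.compile(
    r"Host:\s*scsi(\d+)\s+Channel:\s*(\d+)\s+Id:\s*(\d+)\s+Lun:\s*(\d+)"
)
VENDOR_RE = re.compile(r"Vendor:\s*(.*?)\s+Model:\s*(.*?)\s+Rev:\s*(\S+)")
TYPE_RE = re.compile(r"Type:\s*(.*?)\s+ANSI")


def parse_proc_scsi(text):
    """Turn /proc/scsi/scsi-style text into a list of device dicts."""
    devices = []
    current = None
    for line in text.splitlines():
        m = HOST_RE.search(line)
        if m:
            # A "Host:" line starts a new device entry.
            current = {"host": int(m[1]), "channel": int(m[2]),
                       "id": int(m[3]), "lun": int(m[4])}
            devices.append(current)
            continue
        if current is None:
            continue  # skip the "Attached devices:" header
        m = VENDOR_RE.search(line)
        if m:
            current.update(vendor=m[1], model=m[2], rev=m[3])
            continue
        m = TYPE_RE.search(line)
        if m:
            current["type"] = m[1]
    return devices


def group_by_host(devices):
    """Map host adapter number -> list of devices on that adapter."""
    buses = defaultdict(list)
    for dev in devices:
        buses[dev["host"]].append(dev)
    return dict(buses)


# Shortened copy of the listing from the quoted message.
SAMPLE = """Attached devices:
Host: scsi0 Channel: 00 Id: 06 Lun: 00
  Vendor: STK      Model: L700             Rev: 0301
  Type:   Medium Changer                   ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM      Model: ULTRIUM-TD1      Rev: 27Q1
  Type:   Sequential-Access                ANSI SCSI revision: 03
Host: scsi5 Channel: 00 Id: 04 Lun: 00
  Vendor: IBM      Model: ULTRIUM-TD1      Rev: 27Q1
  Type:   Sequential-Access                ANSI SCSI revision: 03
"""

for host, devs in sorted(group_by_host(parse_proc_scsi(SAMPLE)).items()):
    for dev in devs:
        print(f"scsi{host} id {dev['id']}: {dev['vendor']} {dev['model']} ({dev['type']})")
```

On the real machine the text would come from reading /proc/scsi/scsi
instead of SAMPLE.   In the quoted listing the drives reporting timeouts
sit on different hosts (scsi2 and scsi5), which is at least consistent
with the "it's probably NOT your card" rule of thumb.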

--
=============================================================
Matthew Temple                Tel:    617/632-2597
Director, Research Computing  Fax:    617/582-7820
Dana-Farber Cancer Institute  mht AT research.dfci.harvard DOT edu
44 Binney Street,  ML105      http://research.dfci.harvard.edu
Boston, MA 02115              Choice is the Choice!