Subject: Re: [Networker] SCSI problems -- How many drives to a bus?
From: George Sinclair <George.Sinclair AT NOAA DOT GOV>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Fri, 9 Jan 2004 17:09:05 -0500

This is pretty much how we have it. If you're standing behind the
machine, looking from left to right, there are 3 dual-channel cards.
Card A (far left) has the ATL equipment, but the ATL picker is not on
its own channel. Instead, the two SDLT drives and the picker in the
ATL library are all daisy-chained and run off channel 1. As I said,
though, we've had no problems with the ATL library.

The next card, B, has L80 drives 1 and 2 on channel 1 and drives 3
and 4 on channel 2.

The last card, C, has the L80 picker on channel 1.
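
For what it's worth, here's a rough Python sketch of how to
double-check that layout from the host side. It groups the entries in
/proc/scsi/scsi by host adapter and channel (one channel = one bus),
so you can see exactly what is sharing each bus. It assumes the usual
2.4-kernel format for that file, so adjust the parsing if yours looks
different:

    # Sketch: group /proc/scsi/scsi entries by host adapter and channel
    # so it's obvious what shares each SCSI bus. The parsing assumes the
    # common "Host: scsiN Channel: NN Id: NN Lun: NN" layout.
    import re
    from collections import defaultdict

    buses = defaultdict(list)
    current = None
    for line in open("/proc/scsi/scsi"):
        m = re.match(r"Host: (\S+) Channel: (\d+) Id: (\d+) Lun: (\d+)", line)
        if m:
            current = (m.group(1), m.group(2))   # one bus = host + channel
            buses[current].append("Id %s Lun %s" % (m.group(3), m.group(4)))
        elif current is not None and "Vendor:" in line:
            buses[current][-1] += "  " + line.strip()

    for (host, channel), devs in sorted(buses.items()):
        print("%s channel %s:" % (host, channel))
        for dev in devs:
            print("  " + dev)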

And each Adaptec card is located on its own PCI bus. This host has
several independent PCI buses. The diagram on the top enclosure panel
shows that each of these PCI slots is in fact its own bus, so as near
as I can tell, these cards do NOT share a bus. We thought shared buses
might have been causing the problem before, when we were on the older
Linux storage node that only had one or two buses. But after moving to
this new Linux box, with all these separate buses, and even after
putting the ATL picker on its own separate card and bus, the problem
still has not gone away.

I'm wondering if there would be any merit in moving the ATL off this
host completely. I don't see how it could be interfering, but you
never know. I guess StorageTek will be most apt to work with us if we
can prove there are no other devices on the bus, but there are. I just
don't know what StorageTek will be able to do.

The last time I complained to them about this, they came out and
replaced the terminators, which they claimed were the wrong ones. They
accepted fault on that, saying they had received calls from customers
about problems with the terminators. The time before that, they came
out and said we did not have the proper cables. Once again, they
accepted fault and ordered us new cables. I guess they can run
diagnostics, but I have a feeling they will find two things: (1) the
drives will behave fine, and (2) they will see error logs on the
drives that were caused by these SCSI resets. So what can they really
do?
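
One thing we could at least hand them is a count of the resets as seen
from the host. Here's a rough Python sketch; it assumes the resets get
logged to /var/log/messages with "scsi" and "reset" somewhere in the
text, which may not match your driver's actual wording:

    # Sketch: tally SCSI reset messages per day from syslog so we can
    # show StorageTek how often the bus is being reset. The log path and
    # the "scsi"/"reset" substrings are assumptions -- adjust them to
    # match your driver's actual messages.
    from collections import Counter

    resets = Counter()
    for line in open("/var/log/messages"):
        low = line.lower()
        if "scsi" in low and "reset" in low:
            day = " ".join(line.split()[:2])   # syslog lines start "Mon DD"
            resets[day] += 1

    for day, count in sorted(resets.items()):
        print("%s: %d reset message(s)" % (day, count))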

The thing about this problem is that when it occurs, the backups
cannot run at all: they cannot request a tape load because the picker
has disappeared. This really sucks when fulls are running, and I have
to restart them from scratch. Ughh!
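
In the meantime, I'm thinking about a little watchdog so we at least
find out the moment the picker drops off instead of when a full dies.
A rough Python sketch; the "Changer" substring test is a guess at what
inquire prints for the picker, so match on whatever your copy actually
reports:

    # Sketch: poll the Legato inquire utility and complain as soon as
    # the picker stops showing up. The "Changer" substring is an
    # assumption about inquire's output format -- check yours first.
    import subprocess
    import time

    INQUIRE = "/etc/LGTOuscsi/inquire"

    while True:
        out = subprocess.run([INQUIRE], capture_output=True, text=True).stdout
        if "Changer" not in out:
            print(time.strftime("%c"), "- picker missing from inquire output!")
        time.sleep(300)   # poll every five minutes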

George

Rodney Rutherford wrote:
>
> George,
>
> I previously ran an L80 on Solaris where I definitely saw problems
> having the robot arm and a drive share the bus; in fact, StorageTek
> specifically recommends against that.  Once I added an additional
> controller to place it on its own, I never had another problem.
>
> In your case, you now have the robot arm separate, but four LTO drives
> on one bus may be a bit much.  I would recommend only 1-2 drives per bus
> (my config was 2 drives per bus).  You mention that you have the dual-channel
> Adaptec cards, so do you actually have 2 drives on each channel, or
> 4 drives all on one channel?  Each channel is its own independent SCSI bus.
> With 3 dual-channel cards, you actually have 6 separate SCSI buses.
>
> I would recommend:
>
> card A - channel 1:  ATL picker
> card A - channel 2:  ATL drives
>
> card B - channel 1:  L80 picker
> card B - channel 2:
>
> card C - channel 1:  L80 drives 1 and 2
> card C - channel 2:  L80 drives 3 and 4
>
> Also, just as important as spreading the load across the SCSI buses is
> spreading the PCI load across PCI buses.  If your Linux system has
> multiple PCI buses, make sure you place each Adaptec card on its own
> PCI bus.  At a minimum, put the ATL card on one PCI bus, and the 2 L80
> cards on their own PCI bus.
>
> Rodney
>
> George Sinclair wrote:
> > Hi,
> >
> > We have a Storagetek L80 tape library with 4 LTO drives. We've been
> > seeing a lot of SCSI problems on the host. Host is a storage node
> > running RedHat Linux. I end up rebooting this host about once a week
> > because the /etc/LGTOuscsi/inquire utility fails to see the picker
> > device. This is really annoying. We finally moved the storage node to
> > another, more powerful Linux box with more buses, etc. Same problem
> > there!!! The first clue is the "read open error, Device or resource
> > busy" message that appears next to the affected device in the devices
> > section of the nwadmin window. Often, a backup will be running when the
> > host loses communication to the picker.
> >
> > We have the robot on its own separate bus and all 4 drives share a bus.
> > Max sessions per device is set to 5. We're running NetWorker 6.1.1 with
> > a Solaris primary server. I should also note that we have an ATL SDLT
> > tape library running on there, too. Its picker and two drives all share
> > the same bus, but that bus is all its own and does not share anything
> > with the L80. So, we have three Adaptec cards: one for the ATL, one for
> > the L80 picker, and one for the L80 LTO drives (dual-channel Adaptec
> > cards).
> >
> > I'm wondering if we have too many LTO drives on the bus. Could this
> > cause these SCSI problems? Maybe it's better to have no more than two
> > drives per bus? Someone suggested that we get the picker on its own
> > bus, which we recently did, but that didn't fix it. I'm beginning to
> > think there's something wrong with the StorageTek library and maybe
> > it's time to have StorageTek come look at it. Maybe we should get a
> > temp license for another storage node and move the ATL over there so
> > we only have one library on this host? I guess it would be easier to
> > troubleshoot, but it seems silly to have to do that. There's no reason
> > we should not be able to run two libraries, and the thing is that the
> > ATL library never gives us any problems. I never see these "read open
> > error ..." messages on there. Hmm ....
> >
> > Any thoughts?
> >
> > Thanks.
> >
> > George
> >
>
> --
> Rodney P. Rutherford
> http://www.par3concepts.com/
>

--
Note: To sign off this list, send a "signoff networker" command via email
to listserv AT listmail.temple DOT edu or visit the list's Web site at
http://listmail.temple.edu/archives/networker.html where you can
also view and post messages to the list.
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=