Networker

Re: [Networker] SCSI Bus resets in SAN fabric environment

2004-10-08 05:08:06
Subject: Re: [Networker] SCSI Bus resets in SAN fabric environment
From: "Dokter, Mark" <mark.dokter AT CAPGEMINI DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Fri, 8 Oct 2004 11:09:55 +0200
Our config:

Sun V440 with 1 TB of direct-attached disk for adv_file type devices.
2 qlc HBAs to Cisco SAN switches.
2 L100 tape libraries, each with 4 LTO-2 drives (HP).
In each L100 there are two Atto bridges (with Sun's oemfile on them).
One library shares its 4 LTO drives with 4 physical Dell systems
running Windows 2003 and Exchange. They form a Windows cluster.

Each tape drive of the shared L100 is configured in a separate zone. Each
Windows system is enabled in one or more zones; that does not matter for
this reset issue.

There are now 2 bugs:

First, the SCSI resets:

There is a slow device, the robot arm. Also, an LTO-2 drive sometimes
requires up to 2 hours for a timeout.
The Legato lus driver does not handle the SCSI drives in a friendly way;
it does some rather nasty things.
This triggers a bug (design flaw) in the Sun qlc driver for the QLogic
HBAs.
The result is that the qlc driver loses track of the timing of the SCSI
events. It then generates a SCSI reset, but instead of resetting one
device it resets both SCSI buses behind the bridge.
This happens when the server is busy, the robot arm is not responsive
enough, or a tape unit needs more time than normal.

Sun has redesigned the qlc driver. That fix works very well; we haven't
had any resets since it was installed. It is a temporary fix named
IDR118291-01. (You have to run Solaris 9 and the most current update of
the SAN 4.4.2 patches.) Ask Sun for that patch.
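
One way to check whether the resets have really stopped after installing the fix is to count the "bus/device reset detected" messages that NetWorker writes to syslog, like the ones quoted further down in this thread. The following is only a rough sketch: the pattern is modelled on those quoted lines, the function name count_resets is made up for illustration, and other NetWorker versions may word the message differently.

```python
import re
import sys
from collections import Counter

# Pattern modelled on the NetWorker syslog lines quoted in this thread.
# Assumption: other versions or platforms may phrase the message differently.
RESET_RE = re.compile(r"bus/device reset detected\s+on\s+(\S+)")


def count_resets(lines):
    """Return a Counter mapping device path -> number of reset messages."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python count_resets.py /var/adm/messages*
    totals = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            totals.update(count_resets(handle))
    for device, n in sorted(totals.items()):
        print(f"{device}: {n}")
```

Running it over the message files from before and after the patch date should show whether the count per tape device has dropped to zero.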


The second bug:

A Windows cluster requires a Windows driver on the HBA cards in the
Windows cluster. When that HBA is in the same zone as a bridge of the
shared library and the library is running Sun's oemfile, the tape
library will not boot. The mc300 management card and the Windows
software both try to configure the jukebox.

The solution is to install an extra HBA in the Windows system with a
proper device driver, not a driver from Windows.

Mark

-----Original Message-----
From: Chad Smykay [mailto:csmykay AT rackspace DOT com]
Sent: Thursday, October 07, 2004 4:32 PM
To: 'Legato NetWorker discussion'; Dokter, Mark
Subject: RE: [Networker] SCSI Bus resets in SAN fabric environment

Mark,

Are all of your servers in the same datazone? (i.e. one of them is the
NW server and the others are SNs)

-----Original Message-----
From: Legato NetWorker discussion [mailto:NETWORKER AT LISTMAIL.TEMPLE DOT EDU]
On Behalf Of Mark Dokter
Sent: Thursday, October 07, 2004 6:53 AM
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Subject: Re: [Networker] SCSI Bus resets in SAN fabric environment

On Wed, 6 Oct 2004 13:17:12 -0400, Claudio Jaime <cjaime AT FALABELLA DOT CL>
wrote:

>Hi, I have the same problem
>
>My configuration:
>
>Server Legato: AIX 5.1
>Legato  7.1 Build 230
>
>SAN Components:
>
>Router Crossroads 10000
>Switch SilkWorm 3200
>Library L700 with seven SDLT3200
>
>The seven drives are shared by four nodes (3 AIX and 1 Tru64)
>
>
>Best regards
>
>Claudio E. Jaime
>Depto. Ing. de Sistemas
>Fono: 56-2-3802938
>mailto: cjaime AT falabella DOT cl
>
>
>-----Original Message-----
>From: Legato NetWorker discussion [mailto:NETWORKER AT LISTMAIL.TEMPLE DOT EDU]
>On Behalf Of Shaun Ellis
>Sent: Tuesday, August 31, 2004 16:16
>To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
>Subject: Re: [Networker] SCSI Bus resets in SAN fabric environment
>
>
>The NSR from HP can block SCSI bus resets.
>
>S Ellis
>OpenVMS Product Manager & Advisory Technical Marketing Engineer
>LEGATO Software
>3210, Porter Drive
>Palo Alto
>Ca 94304
>USA
>Phone: +1 (650) 842 9548
>Mobile: +1 (408) 431 6997
>MMS:   4084316997 AT mobile.att DOT net
>
>> -----Original Message-----
>> From: owner-networker AT LISTMAIL.TEMPLE DOT EDU
>> [mailto:owner-networker AT LISTMAIL.TEMPLE DOT EDU] On Behalf Of Alain Richard
>> Sent: Tuesday, August 31, 2004 1:12 PM
>> To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
>> Subject: SCSI Bus resets in SAN fabric environment
>>
>> Bus resets in SAN fabric environment
>>
>> Hi
>>
>> We have a problem with SCSI bus resets. In our case, our Compaq Tru64
>> cluster and our Windows cluster sometimes send a SCSI BUS RESET
>> (see extract below).
>>
>> /var/adm/messages.1:Jul  4 03:55:46 rosalie root: [ID 702911 daemon.notice] NetWorker media: (info) EEI bus/device reset detected on /dev/ntape/tape2_d1
>> /var/adm/messages.1:Jul  4 03:55:50 rosalie root: [ID 702911 daemon.notice] NetWorker media: (info) EEI bus/device reset detected on /dev/ntape/tape2_d1 at opening. Rewinding tape
>>
>> Question:
>> =========
>> - We use an FC-to-SCSI router to connect our library. Is there a router
>> on the market that can filter "SCSI BUS RESET" to prevent the tape drive
>> from rewinding by mistake?
>>
>> Our kit is:
>> ===========
>> Compaq SAN (3 switches, 1 SCSI router 2FC->4LVD)
>> SUN e250 as Legato server DDS (SAN attached), (2 Compaq "jni" HBAs)
>> Compaq Tru64 cluster as SAN storage node and DDS, (2 Compaq "emulex" HBAs)
>> Windows 2000 as SAN storage node and DDS, (2 Compaq "emulex" HBAs)
>> Windows 2000 cluster as SAN storage node and DDS, (2 Compaq "emulex" HBAs)
>> Windows 2000 cluster as SAN storage node and DDS, (2 Compaq "QLOGIC" HBAs)
>> 2 STK L80 libraries with 4 SDLT drives "LVD"
>>
>> Alain R.
>>
>> --
>> Note: To sign off this list, send a "signoff networker" command via email
>> to listserv AT listmail.temple DOT edu or visit the list's Web site at
>> http://listmail.temple.edu/archives/networker.html where you can
>> also view and post messages to the list.
>> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
>
>--
>Note: To sign off this list, send a "signoff networker" command via email
>to listserv AT listmail.temple DOT edu or visit the list's Web site at
>http://listmail.temple.edu/archives/networker.html where you can
>also view and post messages to the list. Questions regarding this list
>should be sent to stan AT temple DOT edu
>=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=

We have the same problem in the same configuration.
It's caused by the lus driver triggering a bug in the qlc driver when
there is a slow device (robot arm). The result is a total reset of the bridge.

There is an IDR patch which will soon be a regular patch. It works fine
now; no SCSI resets anymore.

The IDR is IDR118291-01

Ask SUN.




