Veritas-bu

[Veritas-bu] Re: SCSI reset errors and downed drives

2001-09-06 18:37:18
Subject: [Veritas-bu] Re: SCSI reset errors and downed drives
From: scott.kendall AT abbott DOT com (scott.kendall AT abbott DOT com)
Date: Thu, 6 Sep 2001 17:37:18 -0500
Great information... I have just one question.  What about NT/2000?

I too have had the "DOWN" drive problem when a server is rebooted and for
whatever reason doesn't see the drives in the correct order or is missing some
of the drives.

NT/2000 use "tape numbers" when configuring drives.  Even if all of the SCSI
target and LUNs for the drives are the same, if a drive is missing it will
shift the rest and still confuse things.

Example:
drive 0 is always target 1, LUN 0
drive 1 is always target 1, LUN 1
drive 2 is always target 1, LUN 2

drive 0 is Tape0 (or \\.\Tape0) within NT
drive 1 is Tape1 within NT
drive 2 is Tape2 within NT

If we reboot and for some reason drive 1 is not there, drive 2 will still be
target 1, LUN 2... but this won't help because it is now Tape1 instead of
Tape2.
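
To make the shift concrete, here is a minimal sketch in Python. It assumes NT simply numbers the tape devices it finds in discovery order, as described above; the function name and the data layout are hypothetical and only model the behaviour, they do not query a real system.

```python
def assign_tape_numbers(present_drives):
    """Number drives Tape0, Tape1, ... in the order they are found.

    present_drives is a list of (target, lun) tuples for the drives
    that actually answered at boot (an assumed, simplified model).
    """
    return {f"Tape{i}": addr for i, addr in enumerate(present_drives)}


# (target, LUN) for drive 0, drive 1, drive 2 from the example above
all_drives = [(1, 0), (1, 1), (1, 2)]

# Normal boot: all three drives are present
before = assign_tape_numbers(all_drives)

# Reboot where drive 1 (target 1, LUN 1) is not seen
after = assign_tape_numbers([d for d in all_drives if d != (1, 1)])

print(before)
print(after)
```

In this model drive 2 keeps its SCSI address (target 1, LUN 2) in both cases, but its NT name moves from Tape2 to Tape1, which is exactly the mismatch that leaves the configured drive DOWN.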

The only way I see around this is for NT/2000 to be able to persistently bind
their "tape numbers" to a specific SCSI target/LUN (and I don't know if it can
do this)

OR

to have NetBackup on NT configured for the actual SCSI bus/port/target/LUN of
the drive instead of the tape number

OR

to have NetBackup look at the serial number of the drive and do everything
dynamically (the reason this comes to mind is that there is a similar problem
with Oracle on NT and raw partitions, which are required for OPS.  Since the
partitions are raw you don't have drive letters and have to map things to a
disk number within NT.  What Oracle did was use a symbolic link that maps
itself to a specific disk partition.  When the disk number changes, the
symbolic link dynamically maps to the new disk number.  I believe they are
doing this by looking at the signature that NT writes on the disk.)  This
seems like the most flexible, but not available today... maybe a future
release of SSO.
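
A rough sketch of what that third option could look like, in Python. Everything here is an assumption for illustration: the serial numbers are made up, the function is hypothetical, and nothing below is an existing NetBackup or NT interface. The idea is only that the configuration stores a serial number per drive and the device name is looked up again after every boot.

```python
def resolve_by_serial(configured, discovered):
    """Map each configured drive to whatever device name it has right now.

    configured: {drive_name: serial_number} as stored in the configuration
    discovered: {tape_device: serial_number} as reported after this boot
    Returns {drive_name: tape_device}, with None for drives not found.
    """
    serial_to_device = {serial: dev for dev, serial in discovered.items()}
    return {name: serial_to_device.get(serial)
            for name, serial in configured.items()}


# Hypothetical serial numbers, for illustration only
configured = {"drive0": "SN-A", "drive1": "SN-B", "drive2": "SN-C"}

# After a reboot where drive 1 is missing, drive 2 shows up as Tape1
discovered = {"Tape0": "SN-A", "Tape1": "SN-C"}

mapping = resolve_by_serial(configured, discovered)
print(mapping)
```

With a lookup like this, drive 2 would still be found (now under Tape1) and only the drive that is really missing would need to be marked down.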


Thanks,
Scott



                                                                                
                                                   
From:     anthony.guzzi AT storability DOT com
Sent by:  veritas-bu-admin AT mailman DOT eng.auburn.edu
Date:     09/05/2001 12:30 PM
To:       dayalsd AT lycos DOT com, <veritas-bu AT mailman.eng.auburn DOT edu>
cc:
Subject:  [Veritas-bu] Re: SCSI reset errors and downed drives





I've got one phrase for you:     persistent binding

I'm wondering if your bridges are being "discovered"/recognized in a
different order than when NBU was installed.  If you are using a fabric,
then remember that for most OS's, unless you specify otherwise, the first
fibre device found by the OS will be assigned target 0 off the HBA, the
second one will get target 1, etc.  Keep in mind that there's no guarantee
the devices will be found in the same order each time.  And should a fibre
switch reboot, you run the risk (though very slim) that the targets may
change.  But with persistent binding, you'll be binding each fibre
device's world-wide name (WWN) to a specific SCSI target off the HBA.  That
way, no matter what order the system sees the devices in, they'll always get
the same SCSI target.
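
As a rough illustration of the difference, here is a small Python model. It assumes the default behaviour is "first device found gets target 0, the next gets target 1", as described above; the WWN strings and function names are hypothetical placeholders, and this is not how any particular HBA driver is configured.

```python
def targets_by_discovery(discovered_wwns):
    """Default behaviour: targets are handed out in discovery order."""
    return {wwn: target for target, wwn in enumerate(discovered_wwns)}


def targets_by_binding(bindings, discovered_wwns):
    """Persistent binding: each WWN always gets its configured target."""
    return {wwn: bindings[wwn] for wwn in discovered_wwns if wwn in bindings}


# Hypothetical WWNs and a fixed WWN -> SCSI target table
BINDINGS = {"wwn-drive-a": 0, "wwn-drive-b": 1, "wwn-drive-c": 2}

# The same three devices, found in a different order on two boots
boot1 = ["wwn-drive-a", "wwn-drive-b", "wwn-drive-c"]
boot2 = ["wwn-drive-c", "wwn-drive-a", "wwn-drive-b"]

print(targets_by_discovery(boot1), targets_by_discovery(boot2))
print(targets_by_binding(BINDINGS, boot1), targets_by_binding(BINDINGS, boot2))
```

Without binding, the targets differ between the two boots; with the binding table, every WWN lands on the same target regardless of the order in which it was found.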

I recently had to work on an L-700 with 12 fibre-native STK 9840 tape
drives.  Each drive was connected directly to a Brocade switch as was the
master server.  Every time the server rebooted, we would get downed
drives.  This was a result of some of the tape drives being
'discovered'/recognized by the system in a different order than when NBU
was set up.  As such, they were being given different SCSI target numbers.
This 're-arrangement' of the tape drives really messed up NBU.  The end
result was that the master server would instruct the robot to put a tape in
one drive and then access another drive [via the SCSI target]; as would be
expected, it failed to see the tape and so downed the drive.

Check your fibre HBA vendor's documentation for instructions on how to
enable persistent binding for the HBA's driver under HP-UX (the procedure
differs by vendor, driver, and OS).

-- Tony Guzzi
Sr. Solutions Engineer, AssuredRestore team
Storability, Inc.






To: veritas-bu AT mailman.eng.auburn DOT edu
Date: Wed, 05 Sep 2001 08:36:07 -0500
From: "dayal singh" <dayalsd AT lycos DOT com>
Reply-To: dayalsd AT lycos DOT com
Organization: Lycos Mail  (http://mail.lycos.com:80)
Subject: [Veritas-bu] SCSI reset errors and downed drives

NBU GURUs,
We are continuously experiencing SCSI reset errors, and some of the drives
are being downed.  Most of the time it happens on the same specific drives;
sometimes it affects other drives as well.  I am running NBU DataCenter 3.4
on HP-UX 11.0 on an N-class machine, and I have twenty Quantum DLT8000
drives, connected through fiber to a SureStore L700 tape library over HP
fiber-SCSI bridges.  The bridges have firmware 4040.

Has anyone seen these errors?  Are there any fixes, i.e. patches, etc.?

Your response is greatly appreciated.

TIA

Dayal




_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu




