Veritas-bu

[Veritas-bu] Interesting Things w/ SSO on a Fibre Tape SAN (Win2k)

2002-06-17 18:39:03
Subject: [Veritas-bu] Interesting Things w/ SSO on a Fibre Tape SAN (Win2k)
From: scott.kendall AT abbott DOT com (scott.kendall AT abbott DOT com)
Date: Mon, 17 Jun 2002 17:39:03 -0500
We ran into a similar problem with Compaq's GbE card, the NC6136.  It would die
(while still showing a link light) after a large number of client connections
were established.  It wasn't necessarily related to throughput, more to the
number of sessions.  This would KILL SSO.  We are working with Compaq on it and
are told a new driver is in the works.  In the meantime, it works great with
the Intel driver (it's actually an Intel card underneath all the logo'ing).

As for the SCSI reserve/release... fyi, it is available with 3.4.1 in patch
_3.  In either one, 3.4.1_3 or 4.5, I'm being told that it is off by default.


- Scott



From:     "Anderson, David" <anderson.david@scrippshealth.org>
Sent by:  veritas-bu-admin AT mailman DOT eng.auburn.edu
Date:     06/17/2002 10:48 AM
To:       "'veritas-bu AT mailman.eng.auburn DOT edu'"
          <veritas-bu AT mailman.eng.auburn DOT edu>
cc:       "Adams, Tim" <Adams.Tim AT scrippshealth DOT org>,
          "Morris, Brett" <Morris.Brett AT scrippshealth DOT org>
Subject:  [Veritas-bu] Interesting Things w/ SSO on a Fibre Tape SAN (Win2k)




First off, I'll apologize for being long winded, but this is too good NOT to
share.

Our environment:
All brand new hardware
- two Compaq ProLiant ML530s, dual CPU, 1.1 GB memory
- Windows 2000
- Master Server also acting as a Media Server
- StorageTek L180 (4 drives); also tried a Compaq MSL5026 (2 drives); all SDLT
- Compaq 8EL Fibre Switch, Compaq Modular Data Router (SCSI bridge)
- We had started with NBU 3.4.1 and in desperation, installed 4.5 (the one
with SCSI Reserve/Release).
(by the way, if you're a Windows shop, do the 4.5, and to heck with Motif!)

We have been having a lot of problems with drives randomly dropping
offline.  Each library would see a "SCSI Bus Rewind" request, drop
whatever it was doing, and wait for the actual rewind command (which
would never come).  There was literally nothing that we didn't try, with
marginal support from any of the vendors, by the way, except StorageTek.

There seemed to be no correlation with events in NetBackup or with issues
directly related to the hardware.  Over the past (many) weeks, we did notice
that we had occasional network problems.  Specifically, the Master and Media
servers would stop communicating on the Gigabit Ethernet LAN.  We
originally had Compaq Gig NICs, but they seemed to be having a lot of
packet over-run problems, among others.  We installed a 3Com card
instead.  This helped, but did not resolve the issues.

Finally, one of our guys started thinking about SSO being able to
communicate with itself, server to server, to coordinate the SCSI
Reserve/Release function.  With this in mind, he installed a second NIC (yes,
the original Compaq card) and set the hosts tables so that the Master and
Media servers would use only the dedicated channel to communicate.
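
For anyone trying a similar setup, here is a minimal Python sketch (not part
of our actual configuration) that checks whether a hosts table sends the peer
server's name over the dedicated link.  The hostnames, addresses, subnet, and
function names are all made up for illustration; substitute your own.

```python
import ipaddress


def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            # Keep the first entry seen for a given name.
            table.setdefault(name.lower(), ip)
    return table


def uses_dedicated_link(hosts_text, peer, subnet):
    """Return True if `peer` maps to an address inside `subnet`."""
    ip = parse_hosts(hosts_text).get(peer.lower())
    if ip is None:
        return False
    return ipaddress.ip_address(ip) in ipaddress.ip_network(subnet)


# Hypothetical hosts entries on the master server (assumed values).
SAMPLE_HOSTS = """
# dedicated NetBackup channel
192.168.50.1   nbu-master
192.168.50.2   nbu-media    # media server via the second NIC
"""

if __name__ == "__main__":
    print(uses_dedicated_link(SAMPLE_HOSTS, "nbu-media", "192.168.50.0/24"))
```

The check only reads the hosts text you give it; it does not confirm which
interface the traffic actually leaves on.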

We have now run for three days, including a full weekend running over 1500
separate small jobs.  Not a single problem.  I'll admit that it is still too
early to put this matter to bed, but this solution looks very
promising at this point.

Has anyone else seen problems of this sort?

David Anderson
ScrippsHealth Information Services

_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu




