[Veritas-bu] Interesting Things w/ SSO on a Fibre Tape SAN (Win2k)

2002-06-17 11:48:53
From: anderson.david AT scrippshealth DOT org (Anderson, David)
Date: Mon, 17 Jun 2002 08:48:53 -0700
First off, I'll apologize for being long-winded, but this is too good NOT to
share.

Our environment:
All brand new hardware
- two Compaq ProLiant ML530s, dual CPU, 1.1 GB memory
- Windows 2000
- Master Server also acting as a Media Server
- StorageTek L180 (4 drives); also tried a Compaq MSL5026 (2 drives); all SDLT
- Compaq 8EL Fibre Switch, Compaq Modular Data Router (SCSI bridge)
- We had started with NBU 3.4.1 and in desperation, installed 4.5 (the one
with SCSI Reserve/Release).
(by the way, if you're a Windows shop, do the 4.5, and to heck with Motif!)

We have been having a lot of problems with drives randomly dropping
offline.  The libraries all saw a "SCSI Bus Rewind" request, so they dropped
whatever they were doing and waited for the actual rewind command (which
would never come).  There was literally nothing that we didn't try, with
marginal support from the vendors, by the way, except StorageTek.

The drops did not seem to correlate with events in NetBackup or with any
issue directly related to the hardware.  Over the past (many) weeks, we did
notice that we had occasional network problems.  Specifically, the Master and
Media servers would decide not to communicate on the Gigabit Ethernet LAN.
We originally had Compaq Gigabit NICs, but they seemed to be having a lot of
packet overrun problems, among others.  We installed a 3Com card instead.
This helped, but did not resolve the issues.

Finally, one of our guys started thinking about SSO being able to
communicate with itself, server to server, to coordinate the SCSI
Reserve/Release function.  With this in mind, he installed a second NIC (yes,
the original Compaq card) in each server and set the hosts tables so that
the Master and Media servers would use only the dedicated channel to
communicate with each other.
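For anyone who wants to try the same thing, here is a rough sketch of the
hosts-table idea in Python.  The server names and the 192.168.50.0/24
subnet are made up for illustration (none of them come from our setup), and
the helper hosts_lines is hypothetical.  It only builds the lines you would
add to the hosts file on each server (on Windows 2000 that file lives in
%SystemRoot%\system32\drivers\etc\hosts) and refuses any address that is
not on the dedicated network.

```python
import ipaddress

# Hypothetical values: the dedicated backup subnet and the address of the
# second NIC in each server. Substitute your own names and addresses.
BACKUP_NET = ipaddress.ip_network("192.168.50.0/24")
SERVERS = {
    "nbu-master": "192.168.50.1",
    "nbu-media": "192.168.50.2",
}


def hosts_lines(servers, network):
    """Return hosts-file lines that pin each server name to its
    dedicated-NIC address, rejecting addresses outside the network."""
    lines = []
    for name, addr in sorted(servers.items()):
        if ipaddress.ip_address(addr) not in network:
            raise ValueError(f"{name} ({addr}) is not on {network}")
        # hosts-file format: address, whitespace, host name
        lines.append(f"{addr}\t{name}")
    return lines


if __name__ == "__main__":
    # Print the entries to append to the hosts file on both servers.
    print("\n".join(hosts_lines(SERVERS, BACKUP_NET)))
```

The same entries go on both the Master and the Media server, so that each
one resolves the other's name to the address on the dedicated link rather
than to the address on the shared LAN.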

We have now run for three days, including a full weekend running over 1500
separate small jobs.  Not a single problem.  I'll admit that it is still too
early to put this matter to bed just yet, but this solution looks very
promising at this point.

Has anyone else seen problems of this sort?

David Anderson
ScrippsHealth Information Services

