Veritas-bu

[Veritas-bu] Speaking of disappearing drives

2004-05-11 14:33:52
From: ccollier AT classmates DOT com (Chris Collier)
Date: Tue, 11 May 2004 11:33:52 -0700
I'll start with our setup first:

Sun 280R
  1G nic
  4G memory
  2  x 900MHz procs
  2  x StorEdge 2G HBA's
  Solaris 8 (108528-23)
  NetBackup DC 3.4.1 (with a manual mod to sgscan to detect the AIT3 drives:
  http://seer.support.veritas.com/docs/246050.htm)
  
ADIC Scalar1000
  12 x AIT3 drives
  3  x SNC5100's

Fiber is directly connected from the HBA's to the SNC's


About 2 months ago we upgraded from AIT2 to AIT3 drives, and from JNI HBA's to 
StorEdge 2G HBA's. From day 1 we've been having problems with drives randomly 
dropping off. It never seems to happen during a backup, but it does happen at 
the end of one and, I believe, sometimes when there is no activity at all. If 
one drive is down and we leave it down, everything works fine for a week. If we 
bring the downed drive back up (with stopltid/tpconfig -> up drive/ltid), then 
anywhere from 10 minutes to a few hours later another drive goes down. When a 
drive goes down it is still accessible through the robot (using the panel on 
the robot), and the SNC's don't detect any drive problems, although they do 
occasionally report random "loss of sync" from the HBA's. Any mt commands run 
against the down drive show the drive as being inaccessible. I've implemented a 
temporary band-aid to "fix" the problem by using 'vmoprcmd -up <drive #>', 
which has been working pretty well and for some reason has made things a little 
more stable - don't ask me why. I've verified the drive is being used after I 
up it with this command. 
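
For anyone who wants to script the same band-aid, here is a minimal sketch of 
a wrapper around the 'vmoprcmd -up <drive #>' command mentioned above. It is 
not what we actually run. The function name up_drives, the install path in 
VMOPRCMD, and the log format are all assumptions made for illustration. It does 
not detect down drives by itself; you pass in the drive indices you want 
brought up, and it logs a timestamp for each attempt so the times can be 
compared against the SNC "loss of sync" reports.

```python
import subprocess
from datetime import datetime

# Assumed install path for the Media Manager binaries; adjust for your server.
VMOPRCMD = "/usr/openv/volmgr/bin/vmoprcmd"


def up_drives(drive_indices, run=subprocess.call, log=print):
    """Run 'vmoprcmd -up <index>' for each drive index and log the outcome.

    'run' takes an argument list and returns an exit code; it is a parameter
    only so the function can be exercised without NetBackup installed.
    Returns the list of indices for which the command exited non-zero.
    """
    failed = []
    for index in drive_indices:
        cmd = [VMOPRCMD, "-up", str(index)]
        rc = run(cmd)
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        log("%s %s -> exit %d" % (stamp, " ".join(cmd), rc))
        if rc != 0:
            failed.append(index)
    return failed
```

Calling up_drives([4]) would attempt to bring up drive index 4 and return an 
empty list if the command exited with 0. Whether the drive is really usable 
afterwards still has to be verified separately, as described above.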

I also tweaked some of the system resource settings according to Veritas' 
recommended system requirements, and I have a case open with both Sun and ADIC 
(who really haven't been any help at all). I've also enabled verbose output 
from tl8d. I know there are still some system settings that need to be 
adjusted, because whether one backup or 10 backups are running, iowait averages 
about 50% and bounces around between 20% and 95% (with no backups it is 0%). 
This is the next area we are concentrating on.

I'm not sure, but I suspect that what we're seeing is a problem with the driver 
for the HBA's. I'm curious whether anyone else has had similar problems with 
their setup, or has any suggestions.

Thanks,

-- 
Chris

