Networker

Re: [Networker] Question about fibre channel on a Solaris box

2005-11-02 13:03:08
Subject: Re: [Networker] Question about fibre channel on a Solaris box
From: Itzik Meirson <imeirson AT MBI.CO DOT IL>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Wed, 2 Nov 2005 19:58:56 +0200
Once you know that the Solaris connection to the drive has been reestablished,
you can try to reset the drive from NetWorker's side with:
nsrjb -HHvf /dev/rmt/xcbn
This is also expected to eject the tape from the drive and clean up the
drive/slot status in the jukebox configuration.
You may have to re-inventory the slot the tape has been returned to with:
nsrjb -IvS xxx
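
The sequence discussed in this thread (reconnect on the Solaris side first, then reset from NetWorker) could be wrapped in a small script. This is only a minimal sketch, not a tested procedure: the run and recover_drive helpers and the DEVICE, SLOT, AP_ID and DRY_RUN variables are hypothetical names introduced here, and the xcbn / xxx placeholders are kept exactly as written above. By default it only prints the commands, so you can review them before executing anything.

```shell
#!/bin/sh
# Sketch only. All variable and function names are illustrative assumptions.
DEVICE="${DEVICE:-/dev/rmt/xcbn}"  # placeholder device path from the post
SLOT="${SLOT:-xxx}"                # placeholder slot number from the post
AP_ID="${AP_ID:-}"                 # optional: c[ctrl number]::[WWN of tape device]
DRY_RUN="${DRY_RUN:-1}"            # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover_drive() {
    # Solaris side: reconfigure the attachment point (if given) and
    # rebuild the device links.
    if [ -n "$AP_ID" ]; then
        run cfgadm -c configure "$AP_ID" || return 1
    fi
    run devfsadm || return 1

    # NetWorker side: reset the drive, then re-inventory the slot.
    run nsrjb -HHvf "$DEVICE" || return 1
    run nsrjb -IvS "$SLOT"
}

recover_drive
```

Run it once as-is to see the command list, then set DEVICE, SLOT and (optionally) AP_ID to real values and DRY_RUN=0 once the output looks right. The sketch does not include the "dd" or "tar" check suggested in the quoted message below; that is still worth doing by hand before the nsrjb reset.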
Itzik

> -----Original Message-----
> From: Legato NetWorker discussion 
> [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On Behalf Of Sebastian 
> Schönwetter
> Sent: Wednesday, November 02, 2005 18:01
> To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
> Subject: Re: [Networker] Question about fibre channel on a Solaris box
> 
> Have you already tried to use the tape device files with "dd" or "tar"
> after such an incident? What error do you get from Solaris? A simple
> "cfgadm -c configure c[ctrl number]::[WWN of tape device]" or "devfsadm"
> might help re-establish the connection to the tape device.
>
> Quoting Stan Horwitz <stan AT TEMPLE DOT EDU>:
> 
> > We have a Sony PetaSite that's connected to a Sunfire v480 via fibre.
> > The tape library has 13 S-AIT tape drives, each connected to a Qlogic
> > fibre channel switch (or bridge). The Qlogic switch is connected to
> > the v480 via a Qlogic fibre card in the v480 and the Qlogic drivers.
> > We use this as our NetWorker server with Solaris 9 and NetWorker
> > 7.2.1.
> >
> > For the most part, this system works great! We back up a few terabytes
> > each night for about 220 clients, with more clients being added all the
> > time. We have also successfully recovered several terabytes' worth of
> > data since we deployed this hardware.
> > Unfortunately, I have a knack for finding obscure bugs in software, and
> > it seems as if our NetWorker server is no exception. I found two
> > obscure bugs last year that both caused nsrd with 7.1.3 to crash about
> > once a month. Fortunately, that problem seems to have gone away after
> > we upgraded to 7.2.1.
> >
> > Unfortunately, due to a bad choice in the way I initially configured
> > our PetaSite for tape cleaning, a misunderstanding about how many
> > cleanings can be done with S-AIT cleaning tapes, and extremely heavy
> > use of our PetaSite, I have uncovered a bug in the tape drives'
> > firmware such that once in a while, a tape drive in our PetaSite will
> > go offline while it's trying to locate a mark on a tape. In working
> > with people at Sony, we are now aware of why this problem happens. An
> > updated tape drive firmware version is being tested now by Sony.
> >
> > The reason I am saying this is that each time an S-AIT drive goes
> > offline (i.e., loses communication with our PetaSite's controller unit),
> > it also loses communication with our NetWorker server. To resolve this
> > issue, I shut down all the NetWorker daemons, power cycle the failed
> > tape drive, eject the tape that is in the tape drive, reboot our
> > NetWorker server, then reset the library from within NetWorker.
> > This process typically requires an hour of my time, including a trip
> > across campus to where our tape library is located. It's a pain in the
> > neck. Fortunately, this situation rarely causes a significant delay in
> > our backup schedule, and I can sometimes wait a few days before I take
> > action to put the failed tape drive back online.
> >
> > Although I expect this problem to be resolved fairly soon, what I am
> > wondering is whether there is a better way for me to trigger
> > NetWorker and Solaris to reestablish access to a failed fibre channel
> > tape drive after I have power cycled it and it is again accessible to
> > the PetaSite's controller. The site engineer from Sony who works on
> > our PetaSite and the software support engineer I have been working
> > with by phone and email on this situation both say I should be able to
> > get the drive back online without rebooting our Solaris
> > box, but I have no idea how. So far, the only way I can figure out
> > to get our v480 to talk to the device again is to do a reboot, but
> > that's often not possible because we tend to keep our tape library
> > busy 24x7.
> >
> > Any suggestions on how to handle this better than what I do now will
> > be appreciated.
> >
> > To sign off this list, send email to listserv AT listserv.temple DOT edu and
> > type "signoff networker" in the body of the email. Please write to
> > networker-request AT listserv.temple DOT edu if you have any problems with
> > this list. You can access the archives at
> > http://listserv.temple.edu/archives/networker.html or via RSS at
> > http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
> >
> >
> >
> >
> 
> 
> 
> --
> Sebastian Schönwetter
> Open2 BVBA
> Steenweg op Brussel 149
> 1780 Wemmel
> Belgium
> 
> 
> Tel : 0485 / 844 368
> Fax : 02 / 460 39 86
> 
> BTW. nr : 866.285.026
> Ond. nr : 0866285026
> 
