Re: [Networker] Question about fibre channel on a Solaris box
2005-11-02 11:12:59
Have you already tried to access the tape device files with "dd" or "tar"
after such an incident? What error do you get from Solaris? A
simple "cfgadm -c configure c[ctrl number]::[WWN of tape device]" or
"devfsadm" might help re-establish the connection to the tape device.

Quoting Stan Horwitz <stan AT TEMPLE DOT EDU>:
We have a Sony PetaSite that's connected to a Sunfire v480 via fibre.
The tape library has 13 S-AIT tape drives, each connected to a
Qlogic fibre channel switch (or bridge). The Qlogic switch is
connected to the v480 via a Qlogic fibre card in the v480 and the
Qlogic drivers. We use this as our NetWorker server with Solaris 9
and NetWorker 7.2.1.
For the most part, this system works great! We back up a few
terabytes each night for about 220 clients with more clients being
added all the time. We also have successfully recovered several
terabytes worth of data since we deployed this hardware.
Unfortunately, I have a knack for finding obscure bugs in software
and it seems as if our NetWorker server is no exception. I found two
obscure bugs last year that both caused nsrd with 7.1.3 to crash
about once a month. Fortunately, that problem seems to have gone away
after we upgraded to 7.2.1.
Unfortunately, due to a bad choice in the way I initially configured
our PetaSite for tape cleaning and a misunderstanding about how many
cleanings can be done with S-AIT cleaning tapes and extremely heavy
use of our PetaSite, I have uncovered a bug in the tape drives'
firmware such that once in a while, a tape drive in our PetaSite will
go offline while it's trying to locate a mark on a tape. In working
with people at Sony, we are now aware of why this problem happens. An
updated tape drive firmware version is being tested now by Sony.
I mention this because each time an S-AIT drive goes offline
(i.e., loses communication with our PetaSite's controller unit),
it also loses communication with our NetWorker server. To resolve
this issue, I shut down all the NetWorker daemons, power cycle the
failed tape drive, eject the tape that is in the tape drive, reboot
our NetWorker server then reset the library from within NetWorker.
This process typically requires an hour of my time, including a trip
across campus to where our tape library is located. It's a pain in the
neck. Fortunately, this situation rarely causes a significant delay
in our backup schedule and I can sometimes wait a few days before I
take action to put the failed tape drive back on line.
Although I expect this problem to be resolved fairly soon, what I am
wondering about is if there is a better way for me to trigger
NetWorker and Solaris to reestablish access to a failed fibre channel
tape drive after I have power cycled it and it is again accessible
to the PetaSite's controller. The site engineer from Sony who works
on our PetaSite and the software support engineer who I have been
working with by phone and email on this situation both say I should
be able to get the drive back online without rebooting our Solaris
box, but I have no idea how. So far, the only way I can figure out
how to get our v480 to talk to the device again is to do a reboot,
but that's often not possible because we tend to keep our tape
library busy 24x7.
Any suggestions on how to handle this better than what I do now will
be appreciated.
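
The no-reboot approach suggested in the reply at the top of this message could be scripted roughly as follows. This is only a sketch under stated assumptions: the controller number (c2), the WWN placeholder and the device path (/dev/rmt/0cbn) are hypothetical and must be replaced with the values shown by "cfgadm -al" on the actual server. The "mt status" check is a lighter stand-in for the "dd" or "tar" test mentioned in the reply, and "devfsadm -c tape" limits the rebuild to tape device links. By default the script only prints what it would run.

```shell
#!/bin/sh
# Sketch only: CTRL, WWN and TAPE_DEV are hypothetical placeholders.
CTRL="${CTRL:-c2}"                     # FC controller as listed by "cfgadm -al"
WWN="${WWN:-WWN_OF_TAPE_DRIVE}"        # port WWN of the failed tape drive
TAPE_DEV="${TAPE_DEV:-/dev/rmt/0cbn}"  # tape device file used for the check
DRY_RUN="${DRY_RUN:-1}"                # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover_drive() {
    run cfgadm -al                              # list attachment points and their state
    run cfgadm -c configure "${CTRL}::${WWN}"   # reconfigure the FC tape device
    run devfsadm -c tape                        # rebuild /dev links for tape devices
    run mt -f "$TAPE_DEV" status                # check that the drive answers again
}

recover_drive
```

Run it once as-is to review the commands, then with DRY_RUN=0 as root after the drive has been power cycled. Bringing the drive back into service inside NetWorker (for example resetting the library) is a separate step that is not shown here.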
To sign off this list, send email to listserv AT listserv.temple DOT edu and
type "signoff networker" in the body of the email. Please write to
networker-request AT listserv.temple DOT edu if you have any problems
with this list. You can access the archives at
http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
--
Sebastian Schönwetter
Open2 BVBA
Steenweg op Brussel 149
1780 Wemmel
Belgium
Tel : 0485 / 844 368
Fax : 02 / 460 39 86
BTW. nr : 866.285.026
Ond. nr : 0866285026