2005-11-04 13:28:01
Subject: Re: [Networker] Question about fibre channel on a Solaris box
From: Stan Horwitz <stan AT TEMPLE DOT EDU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Fri, 4 Nov 2005 13:26:41 -0500
On Nov 2, 2005, at 11:00 AM, Sebastian Schönwetter wrote:

> Have you already tried to use the tape device files with "dd" or "tar" after such an incident? What error do you get from Solaris? A simple "cfgadm -c configure c[ctrl number]::[WWN of tape device]" or "devfsadm" might help re-establish the connection to the tape device.

There is nothing "simple" about devfsadm and cfgadm to those who are not familiar with those utilities.

Your help is appreciated. I also appreciate the responses that others sent.

This kind of event happened again early this morning: a tape drive went red and NSR choked on the tape in it. I know why this happened, but I used it as an opportunity to try something I thought of after reading some of the replies to my question.

I power cycled the tape drive in question. It came back online (in the sense that our Sony PetaSite could see it). Usually, when this happens, I manually eject the tape, reboot the server, use NSR to re-enable the tape drive, and then go through the usual nsrjb reset process.

This time, I simply power cycled the tape drive but left the tape in it. I returned to my office once I was satisfied that the drive was okay.

Instead of rebooting our NetWorker server (which I can't do this afternoon anyway), I ran "mt -f /dev/rmt/9cbn status", and the result showed a fatal condition:

SONY 1/2 INCH S-AIT tape drive:
   sense key(0x10)= fatal   residual= 0   retries= 0
   file no= 0   block no= 0

I then did ...

# mt -f /dev/rmt/9cbn rewind

which worked fine, so I did ...

# mt -f /dev/rmt/9cbn status
SONY 1/2 INCH S-AIT tape drive:
   sense key(0x0)= No Additional Sense   residual= 0   retries= 0
   file no= 0   block no= 0

So I figured I had nothing to lose by trying to unload S00409S1 (the tape that was stuck there):

# nsrjb -j PetaSite -u S00409S1

It worked. No errors. Usually, when I try this without rebooting, SCSI errors appear and nsrjb dies after it hits its retry limit.

So, I ran an inventory of slot 609 (which is where S00409S1 belongs):

# nsrjb -j PetaSite -Iv -S 609
setting verbosity level to `1'
nsrjb: Looking at device `/dev/rmt/0cbn' id 0.
nsrjb: Looking at device `/dev/rmt/1cbn' id 1.
nsrjb: Looking at device `/dev/rmt/2cbn' id 2.
nsrjb: Looking at device `/dev/rmt/3cbn' id 3.
nsrjb: Looking at device `/dev/rmt/4cbn' id 4.

and two things happened. First, NSR dutifully cleaned that tape drive; then it processed the command normally.

Splendid! I hope this does not come back to haunt me when our backup schedule begins in about three hours. I also just marked tape S00409S1 as read-only because I suspect it has a faulty tape mark on it.
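For anyone who wants to script the steps after the power cycle (which is still done by hand), here is a minimal, untested shell sketch. It uses only the `mt` and `nsrjb` options shown in this message. The function names `sense_key` and `recover_drive` are hypothetical, and the sketch assumes that Solaris `mt status` prints the sense key in the `sense key(0x..)` form shown above, with 0x0 meaning "No Additional Sense".

```shell
#!/bin/sh
# Sketch only: automates the manual mt/nsrjb sequence described above.
# sense_key and recover_drive are hypothetical helper names.

# Print the hex sense key (e.g. 0x10, 0x0) from `mt status` output on stdin.
# Assumes a line such as: "   sense key(0x10)= fatal   residual= 0   retries= 0"
sense_key() {
  sed -n 's/.*sense key(\(0x[0-9a-fA-F]*\)).*/\1/p' | head -n 1
}

# Usage: recover_drive DEVICE JUKEBOX VOLUME SLOT
recover_drive() {
  dev=$1 jukebox=$2 volume=$3 slot=$4

  key=$(mt -f "$dev" status 2>&1 | sense_key)
  if [ "$key" != "0x0" ]; then
    echo "sense key '$key' on $dev; rewinding" >&2
    mt -f "$dev" rewind || return 1
    # Re-check: continue only if the drive now reports no additional sense.
    key=$(mt -f "$dev" status 2>&1 | sense_key)
    if [ "$key" != "0x0" ]; then
      echo "drive still reports sense key '$key'; stopping" >&2
      return 1
    fi
  fi

  # Unload the stuck volume, then inventory the slot it belongs in.
  nsrjb -j "$jukebox" -u "$volume" || return 1
  nsrjb -j "$jukebox" -Iv -S "$slot"
}
```

For example, `recover_drive /dev/rmt/9cbn PetaSite S00409S1 609` would repeat this morning's sequence. The sketch stops if the rewind does not clear the fatal sense key instead of going on to nsrjb.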
