Networker

Re: [Networker] nsrmmd process does not stop

2005-02-09 12:45:26
Subject: Re: [Networker] nsrmmd process does not stop
From: thierry.faidherbe AT HP DOT COM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Wed, 9 Feb 2005 12:43:06 -0500
Most of the time, you can force a hung read() or write() syscall to end
by resetting the tape device. Powering the drive off is often used, but
some OSes have commands to reset the device at the software level:

First, locate the nsrmmd PID:
 # fuser /dev/rmt/.....
 and note the PID it reports.
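
As a rough sketch of the "note its PID" step: on Solaris, fuser prints the
PIDs on stdout with a usage letter appended (for example "1234o") and sends
everything else to stderr. Assuming that output shape, the helper below
(pids_only is a made-up name, not a NetWorker or Solaris tool) strips the
letters so the PIDs can be reused later.

```shell
#!/bin/sh
# pids_only: hypothetical helper, not part of any OS or NetWorker.
# Reads fuser-style stdout such as "   1234o   5678o" on stdin and
# prints one bare PID per line.
pids_only() {
    # Keep only digits, spaces and newlines, then put each PID on its own line.
    tr -cd '0-9 \n' | tr -s ' ' '\n' | sed '/^$/d'
}

# Assumed usage on the backup server (device path left as in the post):
#   fuser /dev/rmt/..... 2>/dev/null | pids_only
```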

On Solaris 8 and higher:

 # cfgadm -al
 retrieve the tape device
 eg : c4::rmt/4      tape     connected    configured   unknown
 # cfgadm -x reset_device c4::rmt/4
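
To pick the tape attachment point out of a long cfgadm listing, something
like the following may help. It is only a sketch: it assumes the usual
column layout (Ap_Id, Type, Receptacle, Occupant, Condition) shown in the
example line above, and tape_ap_ids is a name invented here.

```shell
#!/bin/sh
# tape_ap_ids: hypothetical helper. Reads `cfgadm -al` style output on
# stdin and prints the Ap_Id (column 1) of every row whose Type
# (column 2) is "tape".
tape_ap_ids() {
    awk '$2 == "tape" { print $1 }'
}

# Assumed usage on a Solaris host:
#   cfgadm -al | tape_ap_ids
#   cfgadm -x reset_device c4::rmt/4    # Ap_Id taken from the listing
```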

On Tru64 UNIX:
 # hwmgr -show scsi |grep tape4
  retrieve the bus, target and LUN info
 # scu
 scu> sbtl [bus] [target] [lun]
 scu> reset device

Then finish by killing the nsrmmd process.
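
A minimal sketch of that last step, to be run only after the device reset,
since a process still blocked in the syscall will not act on any signal.
stop_pid is a hypothetical helper, and the 10-second default wait is an
arbitrary choice, not a NetWorker recommendation.

```shell
#!/bin/sh
# stop_pid PID [SECONDS]: hypothetical helper.
# Sends SIGTERM, polls for up to SECONDS (default 10), then sends SIGKILL.
# Returns 0 if the process is gone, 1 if it is still in the process table.
stop_pid() {
    pid=$1
    wait_secs=${2:-10}
    kill -TERM "$pid" 2>/dev/null
    i=0
    while [ "$i" -lt "$wait_secs" ]; do
        # kill -0 only tests whether the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done
    kill -KILL "$pid" 2>/dev/null
    sleep 1
    kill -0 "$pid" 2>/dev/null && return 1
    return 0
}

# Assumed usage, with the PID noted from fuser earlier:
#   stop_pid 1234 30 || echo "nsrmmd 1234 is still stuck"
```

If this still reports the process as present, it is most likely still inside
the read() or write() call, and the reset has not released it.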

Good luck
HTH,

Th

>> Hello,
>>
>> Networker Server: Solaris 8
>> Networker Software: 7.1.2
>>
>> I have run into a strange problem where two nsrmmd processes will not
>> be killed on the Networker server. I stopped Networker and all of the
>> other Networker processes shut down normally. Prior to this happening I
>> had a backup session that was writing to two LTO1 fiber drives. The
>> backup that was writing to the drives was hung. I stopped the hung
>> backup, then proceeded to stop Networker. I noticed the two hung nsrmmd
>> processes by running ps -ef|grep nsr. I waited 30 minutes and the two
>> processes were still in the process table. Checking daemon.log and
>> /var/adm/messages, I don't see any tape drive errors or any reason why
>> the backup would have hung in the first place. Does anyone have any
>> suggestions as to why this would be happening?
>
> If a device has an issue, then commands and processes which access the
> device can hang.  If you truss the processes, are they in a read() or
> write() system call?
>
> If you run fuser on the tape devices, do those processes have them open?
>
> If so, they're probably in the system call.  They cannot be killed or
> have any signal operate on them until the call returns back to user
> space.
>
> Often it's difficult to force that to occur.  Sometimes power cycling
> the drive is enough to kick the driver back to some bit of sanity.
> Sometimes you just have to reboot.  It depends on the device and on the
> driver.
>
>
> --
> Darren Dunham                                       ddunham AT taos DOT com
> Senior Technical Consultant         TAOS            http://www.taos.com/
> Got some Dr Pepper?                           San Francisco, CA bay area
>          < This line left intentionally blank to confuse you. >
>
> --
> Note: To sign off this list, send a "signoff networker" command via email
> to listserv AT listserv.temple DOT edu or visit the list's Web site at
> http://listserv.temple.edu/archives/networker.html where you can
> also view and post messages to the list. Questions regarding this list
> should be sent to stan AT temple DOT edu
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
>