Networker

Re: [Networker] nsrmmd process does not stop

2005-02-10 02:41:15
Subject: Re: [Networker] nsrmmd process does not stop
From: thierry.faidherbe AT HP DOT COM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 10 Feb 2005 02:36:24 -0500
Hi,

# cfgadm -x reset_device c4::rmt/4
or
scu> sbtl [bus] [target] [lun]
scu> reset device

will just reset the target device, not the entire bus.
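
If you need to look up the attachment point first, here is a minimal sketch that filters cfgadm -al output for tape entries. The helper name tape_ap_ids is made up for illustration, and it assumes the column layout of the example line in my earlier message below (Ap_Id in the first column, Type in the second):

```shell
# Hypothetical helper (not part of cfgadm): read `cfgadm -al` style output on
# stdin and print the Ap_Id of every attachment point whose Type is "tape".
tape_ap_ids() {
    awk '$2 == "tape" { print $1 }'
}

# Example use on a live Solaris host (commented out so the sketch has no side effects):
#   cfgadm -al | tape_ap_ids
```

Each Ap_Id printed this way is what you would pass to cfgadm -x reset_device.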

To reset the entire bus instead, use, on Tru64 for example:
scu> sbtl [bus] [target] [lun]
scu> reset bus

In the worst case, if you have to reset a bus,
NetWorker will detect the bus reset, rewind
all devices connected to that bus, and then
move forward on the media to the latest positions, but
this should not cause any other problems.
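
For the Solaris case, the order of operations from my earlier message below (note the PID with fuser, reset the device, then kill nsrmmd) could be sketched as a dry-run helper. The function name unwedge_plan and its arguments are hypothetical. It only prints the commands so you can review them before running anything:

```shell
# Hypothetical dry-run helper: print (do not execute) the recovery steps for
# one wedged tape drive on Solaris.
#   $1       tape device node that nsrmmd has open
#   $2       cfgadm attachment point for that drive
#   $3...    PID(s) of the hung nsrmmd, as reported by fuser on the device node
unwedge_plan() {
    dev=$1
    ap_id=$2
    shift 2
    echo "# $dev held by: $*"
    # Reset only the target device, not the whole bus.
    echo "cfgadm -x reset_device $ap_id"
    # Once the blocked read()/write() returns, the process can be signalled.
    for pid in "$@"; do
        echo "kill $pid"
    done
}
```

For example, unwedge_plan /dev/rmt/4 c4::rmt/4 1234 prints the reset command followed by "kill 1234". The device node and PID here are placeholders, not values from the original report.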

Hope this helps,

Thierry

> Interesting...one can reset the device in Solaris with cfgadm.   I'm
> concerned though..will this reset the other drives on the same bus; ie.
> reset the entire scsi bus?   I don't want to interfere with other drives
> on the bus but would like to "unwedge" a drive if possible. This would be
> 1 or 2 other LTO1/2 drives on the same fiber port/bus.
>
> Robert Maiello
> Pioneer Data Systems
>
> On Wed, 9 Feb 2005 12:43:06 -0500, thierry.faidherbe AT HP DOT COM wrote:
>
>>Most of the time, you can force a hung read() or write() syscall to end
>>by resetting the tape device. Powering off is often used, but some OSes have
>>commands to reset the device at the software level:
>>
>>first locate the nsrmmd pid :
>> # fuser /dev/rmt/.....
>> note its PID
>>
>>on Solaris 8 and higher:
>>
>> # cfgadm -al
>> retrieve the tape device
>> eg : c4::rmt/4      tape     connected    configured   unknown
>> # cfgadm -x reset_device c4::rmt/4
>>
>>from Tru64 Unix :
>> # hwmgr -show scsi |grep tape4
>>  retrieve bus target and lun info
>> # scu
>> scu> sbtl [bus] [target] [lun]
>> scu> reset device
>>
>>Then finish by killing the nsrmmd process.
>>
>>Good luck
>>HTH,
>>
>>Th
>>
>>>> Hello,
>>>>
>>>> Networker Server: Solaris 8
>>>> Networker Software: 7.1.2
>>>>
>>>> Have run into a strange problem where I have two nsrmmd processes that
>>>> will not be killed on the networker server. I stopped networker and all
>>>> of the other networker processes shut down normally. Prior to this
>>>> happening I had a backup session that was writing to two lto1 fiber
>>>> drives. The backup that was writing to the drives was hung. I stopped the
>>>> backup that was hung, then proceeded to stop networker. I noticed the two
>>>> hung nsrmmd processes by running a ps -ef|grep nsr. I waited 30 minutes
>>>> and the two processes were still in the process table. Checking the
>>>> daemon.log and /var/adm/messages I don't see any tape drive errors or any
>>>> reason why the backup would have hung in the first place. Does anyone
>>>> have any suggestions as to why this would be happening?
>>>
>>> If a device has an issue, then commands and processes which access the
>>> device can hang.  If you truss the processes, are they in a read() or
>>> write() system call?
>>>
>>> If you run fuser on the tape devices, do those processes have them open?
>>>
>>> If so, they're probably in the system call.  They cannot be killed or
>>> have any signal operate on them until the call returns back to user
>>> space.
>>>
>>> Often it's difficult to force that to occur.  Sometimes power cycling
>>> the drive is enough to kick the driver back to some bit of sanity.
>>> Sometimes you just have to reboot.  It depends on the device and on the
>>> driver.
>>>
>>>
>>> --
>>> Darren Dunham
>>> ddunham AT taos DOT com
>>> Senior Technical Consultant         TAOS
>>> http://www.taos.com/
>>> Got some Dr Pepper?                      San Francisco, CA bay area
>>>          < This line left intentionally blank to confuse you. >
>>>
>>> --
>>> Note: To sign off this list, send a "signoff networker" command via
>>> email
>>> to listserv AT listserv.temple DOT edu or visit the list's Web site at
>>> http://listserv.temple.edu/archives/networker.html where you can
>>> also view and post messages to the list. Questions regarding this list
>>> should be sent to stan AT temple DOT edu
>>> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
>>>
>>
>
>
