Networker

Re: [Networker] We're getting closer

2005-04-28 07:32:30
From: Conrad Macina <conrad.macina AT PFIZER DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 28 Apr 2005 07:27:08 -0400
When an nsrmmd hangs and can't be killed even with a kill -9, try power
cycling the tape drive. It's not going to do any more harm than has already
been done, and occasionally (certainly not all the time) it inspires the
nsrmmd to give up the ghost, thereby saving a reboot.
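
Before reaching for the power switch, it may be worth confirming that the
nsrmmd really is stuck in uninterruptible I/O rather than just slow. On Linux,
the field after the parenthesised command name in /proc/<pid>/stat is the
process state, and "D" means uninterruptible sleep. What follows is only a
rough sketch under those assumptions: it presumes a Linux host with /proc
mounted and that the daemon's process name is exactly "nsrmmd". The function
names parse_state and find_uninterruptible are made up for illustration and
are not part of NetWorker.

```python
import os


def parse_state(stat_line):
    """Return (comm, state) from the text of a /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' instead of on whitespace.
    left, _, rest = stat_line.rpartition(")")
    comm = left.split("(", 1)[1]
    state = rest.split()[0]
    return comm, state


def find_uninterruptible(name="nsrmmd", proc_root="/proc"):
    """Return sorted PIDs of processes called `name` that are in state 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as fh:
                comm, state = parse_state(fh.read())
        except OSError:
            continue  # process exited while we were scanning
        if comm == name and state == "D":
            stuck.append(int(entry))
    return sorted(stuck)


if __name__ == "__main__":
    # Assumes Linux; on other systems /proc may be missing or laid out differently.
    for pid in find_uninterruptible():
        print(f"nsrmmd pid {pid} is in uninterruptible sleep (state D)")
```

A process that shows up here and stays there after a kill -9 is waiting on the
kernel, which fits the explanation quoted below. It will not go away until the
driver returns, whether because of a power cycle or a reboot.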

Conrad Macina
Pfizer, Inc.


On Wed, 27 Apr 2005 13:38:44 -0700, Darren Dunham <ddunham AT TAOS DOT COM> 
wrote:

>> Very often, networker starts a drive operation and it takes 3 to 8 hours to
>> complete. The operation can be just about anything: eject, move forward,
>> verify the tape, etc. It's as if networker sent the command to the drive but
>> the drive never got it.
>
>More to the point, networker sends the commands to the driver, but the
>driver doesn't deal with it properly.
>
>> When this happens, the nsrmmd for that drive gets
>> locked into an uninterruptible I/O state. If we stop networker, the nsrmmd
>> for that drive still hangs around and no amount of killing will get rid of
>> it. This problem happens at various times on all drives. The only resolution
>> so far seems to be a reboot. We have a call open with legato on this and are
>> in the process of opening a call with Red Hat.
>
>Right.  A userland process cannot be killed while it's blocked
>uninterruptibly inside a system call; the SIGKILL just stays pending until
>the call returns.  If the driver decides to wig out and not return (or
>properly time out), too bad.  There's nothing the userland process can
>do about it.  It also suggests that it's not really a networker issue.
>
>> Is there any sort of retry timer we can set for the tape operations?
>> Would an upgrade to 7.1.3 help?
>
>If you can't do a kill -9 on nsrmmd, then the problem is in the kernel.
>Either in a generic tape driver, a specific HBA driver, or elsewhere.
>It's possible that a different version of Networker would make the calls
>differently and avoid the bug, but that's not guaranteed.
>
>
>
>
>--
>Darren Dunham                                           ddunham AT taos DOT com
>Senior Technical Consultant         TAOS            http://www.taos.com/
>Got some Dr Pepper?                           San Francisco, CA bay area
>         < This line left intentionally blank to confuse you. >
>
>--
>Note: To sign off this list, send a "signoff networker" command via email
>to listserv AT listserv.temple DOT edu or visit the list's Web site at
>http://listserv.temple.edu/archives/networker.html where you can
>also view and post messages to the list. Questions regarding this list
>should be sent to stan AT temple DOT edu
>=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
