Networker

Re: [Networker] Drive state never goes from idle to done. nsrmmd keeping drive open.

2003-09-11 09:53:18
From: "VERHAEGHE Koen (BMB)" <Koen.VERHAEGHE AT PROXIMUS DOT NET>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Thu, 11 Sep 2003 15:53:08 +0200
Oscar, I've seen this here before too, usually when cloning.
Have a look at the client that was last writing to that drive; there
might be some savefs (or save) processes hanging. Killing them should
release the drive. My guess is that savefs talks to nsrexecd on the
client, which talks to nsrexecd on the server, which in turn controls
the nsrmmd process.
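
A minimal sketch of that check, to be run on the client. The helper name list_save_procs is made up for illustration; it only filters "PID COMMAND" lines, such as those printed by the POSIX form `ps -e -o pid= -o comm=`, down to processes whose command name is exactly save or savefs. Whether killing them actually releases the drive is the guess described above, not something the script verifies.

```shell
#!/bin/sh
# Hypothetical helper: read "PID COMMAND" lines on stdin and print the
# PID and base command name of every save or savefs process.
list_save_procs() {
    awk '{
        n = split($2, parts, "/")          # strip any leading path
        if (parts[n] == "save" || parts[n] == "savefs")
            print $1, parts[n]
    }'
}

# Usage on the client (review the list before killing anything):
#   ps -e -o pid= -o comm= | list_save_procs
```

The match is on the exact command name so that unrelated programs whose names merely start with "save" are not listed.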

Cheers
Koen


-----Original Message-----
From: Oscar Olsson [mailto:spam1 AT QBRANCH DOT SE] 
Sent: Thursday, September 11, 2003 3:47 PM
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Subject: Re: [Networker] Drive state never goes from idle to done.
nsrmmd keeping drive open.


On Thu, 11 Sep 2003, Stan Horwitz wrote:

SH> >It seems like nsrmmd keeps a drive unavailable, without any reason.
SH> >This probably happens after a backup job finishes. However, I can't
SH> >seem to figure out what causes this. No jobs are
SH> >running, and there are no savesets waiting to be backed up. There
SH> >are no pending media requests. The drive still sits in the state
SH> >"ready for writing, idle". An attempt to unmount the drive makes
SH> >the server claim that the drive is busy. I can't see any rogue
SH> >savegrp processes or similar.
SH> If you haven't done so yet, try looking in the /nsr/logs/daemon.log
SH> file to see if there are any error messages there.

Nope, nothing related there, as far as I can see.

I tried this:

[root@britt:/nsr/logs] ps -ef |grep 587
    root   587   245  0 18:30:35 ?        7:05 /usr/sbin/nsrmmd -n 4
    root 11628 11570  0 15:32:51 pts/0    0:00 grep 587
[root@britt:/nsr/logs] kill 587
[root@britt:/nsr/logs] ps -ef |grep 587
    root 11633 11570  0 15:32:55 pts/0    0:00 grep 587

Which gave the following output in the daemon.log:

2003-09-11 15.32.54 nsrd: media info: restarting nsrmmd #4 on britt.qbranch.se in 2 minute(s)
2003-09-11 15.32.59 nsrd: media info: restart of nsrmmd #4 on britt.qbranch.se cancelled
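
For reference, the PID lookup in the transcript above can be done by daemon number instead of by grepping for a known PID. This is only a sketch: nsrmmd_pid is a hypothetical helper name, and it assumes `ps -ef` output in the same layout as shown in this message, with the PID in the second column and the command line containing `nsrmmd -n <number>`.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print the PID
# of the nsrmmd process started with "-n <number>".
nsrmmd_pid() {
    awk -v num="$1" '{
        for (i = 3; i <= NF - 2; i++)
            if ($i ~ /(^|\/)nsrmmd$/ && $(i + 1) == "-n" && $(i + 2) == num) {
                print $2                   # PID is the second column
                next
            }
    }'
}

# Usage (look up nsrmmd #4, then decide whether to kill it):
#   ps -ef | nsrmmd_pid 4
```

Anchoring the match on the command name ending in nsrmmd keeps nsrmmdbd and the grep process itself out of the result.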

Then I ejected the tape.

After that, I tried starting a group, which created a pending mount
request. That was partly resolved by mounting a volume in this drive,
the one that previously had the stale nsrmmd process attached to it. It
appears that another nsrmmd process has taken over control of this drive:

[root@britt:/nsr/logs] fuser /dev/rmt/9cbn
/dev/rmt/9cbn:     1199o
[root@britt:/nsr/logs] ps -ef |grep nsrmmd
    root   583   245  0 18:30:30 ?       30:51 /usr/sbin/nsrmmd -n 1 -r britt.qbranch.se
    root 12849   245  0 15:41:12 ?        0:00 /usr/sbin/nsrmmd -n 13
    root   499   245  1 18:30:04 ?        2:29 /usr/sbin/nsrmmdbd
    root   586   245  0 18:30:33 ?       51:10 /usr/sbin/nsrmmd -n 3
    root   585   245  0 18:30:32 ?       41:16 /usr/sbin/nsrmmd -n 2
    root   588   245  0 18:30:37 ?       32:12 /usr/sbin/nsrmmd -n 5
    root   589   245  0 18:30:39 ?       11:03 /usr/sbin/nsrmmd -n 6
    root 12502   245  0 15:40:28 ?        0:00 /usr/sbin/nsrmmd -n 10
    root 13014 11570  0 15:42:03 pts/0    0:00 grep nsrmmd
    root 12098   245  0 15:39:52 ?        0:00 /usr/sbin/nsrmmd -n 8
    root 12810   245  0 15:41:03 ?        0:00 /usr/sbin/nsrmmd -n 12
    root 12511   245  0 15:40:30 ?        0:00 /usr/sbin/nsrmmd -n 11
    root  1199   245  2 02:10:32 ?        0:02 /usr/sbin/nsrmmd -n 7

I.e. number 7 has control over the drive now. And the group seems to run
as normal... Annoying.
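
The fuser-then-ps lookup above can be scripted along these lines. It is a sketch under assumptions: fuser_pids is a made-up helper name, and it expects fuser output in the form shown above (device name, colon, then PIDs with optional letter suffixes such as "o").

```shell
#!/bin/sh
# Hypothetical helper: read fuser output on stdin and print one bare PID
# per line, dropping the "device:" prefix and any letter suffixes.
fuser_pids() {
    awk '{
        sub(/^[^:]*:/, "")                 # remove "/dev/...:" if present
        for (i = 1; i <= NF; i++) {
            pid = $i
            sub(/[a-z]+$/, "", pid)        # "1199o" becomes "1199"
            if (pid ~ /^[0-9]+$/)
                print pid
        }
    }'
}

# Usage (show the command line of whatever holds the drive open):
#   fuser /dev/rmt/9cbn 2>&1 | fuser_pids |
#       while read -r pid; do ps -p "$pid" -o args=; done
```

Merging stderr into the pipe is deliberate, since fuser may write the device name and the suffix letters to stderr and only the PIDs to stdout; the helper copes with either form.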

So does anyone have any idea what could be causing this? My guess
would be a buggy nsrmmd, or some kind of bad tape, tape drive, or tape
configuration.

I'm grateful for (almost) all input I can get on this issue. :)

//Oscar

--
Note: To sign off this list, send a "signoff networker" command via
email to listserv AT listmail.temple DOT edu or visit the list's Web site at
http://listmail.temple.edu/archives/networker.html where you can also
view and post messages to the list.
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=

