Oscar, I've seen this before here too. Usually when cloning.
Have a look at the client that was last writing to that drive; there
might be some savefs (or save) processes hanging. Killing them should
release the drive. I guess savefs is talking to nsrexecd on the client,
which talks to nsrexecd on the server, which in turn has control over the
nsrmmd process.
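A rough sketch of that check, to be run on the client. The function name filter_save_procs is made up for illustration, and it assumes the local ps accepts the POSIX "-o pid= -o comm=" form; depending on the platform, comm may show a full path or a truncated name, so treat the result as a list of candidates only.

```shell
# Hypothetical helper: reads "PID COMMAND" lines on stdin and prints only
# those whose command basename is exactly "save" or "savefs".
filter_save_procs() {
    awk '{
        n = split($2, parts, "/")          # strip any leading path
        if (parts[n] == "save" || parts[n] == "savefs")
            print $1, parts[n]
    }'
}

# Possible usage on the client (inspect the list before killing anything):
#   ps -e -o pid= -o comm= | filter_save_procs
#   kill <pid>    # only for processes confirmed to be stale
```

It only lists processes; deciding which ones are really hung, and killing them, is left as a manual step.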
Cheers
Koen
-----Original Message-----
From: Oscar Olsson [mailto:spam1 AT QBRANCH DOT SE]
Sent: Thursday, September 11, 2003 3:47 PM
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Subject: Re: [Networker] Drive state never goes from idle to done.
nsrmmd keeping drive open.
On Thu, 11 Sep 2003, Stan Horwitz wrote:
SH> >It seems like nsrmmd keeps a drive unavailable, without any reason.
SH> >This probably happens after a backup job finishes. However, I can't
SH> >seem to be able to figure out what causes this. No jobs are
SH> >running, and there are no savesets waiting to be backed up. There
SH> >are no pending media requests. The drive still sits in the state
SH> >"ready for writing, idle". An attempt to unmount the drive makes
SH> >the server claim that the drive is busy. I can't see any rogue
SH> >savegrp processes or similar.
SH> If you haven't done so yet, try looking in the /nsr/logs/daemon.log
SH> file to see if there are any error messages there.
Nope, nothing related there, as far as I can see.
I tried this:
[root@britt:/nsr/logs] ps -ef |grep 587
root 587 245 0 18:30:35 ? 7:05 /usr/sbin/nsrmmd -n 4
root 11628 11570 0 15:32:51 pts/0 0:00 grep 587
[root@britt:/nsr/logs] kill 587
[root@britt:/nsr/logs] ps -ef |grep 587
root 11633 11570 0 15:32:55 pts/0 0:00 grep 587
Which gave the following output in the daemon.log:
2003-09-11 15.32.54 nsrd: media info: restarting nsrmmd #4 on britt.qbranch.se in 2 minute(s)
2003-09-11 15.32.59 nsrd: media info: restart of nsrmmd #4 on britt.qbranch.se cancelled
Then I ejected the tape.
After that, I tried starting a group, which created a pending mount
request. I partly resolved it by mounting a volume in this drive, the
one that previously had the stale nsrmmd process attached to it. It
appears that another nsrmmd process has taken over control of this drive:
[root@britt:/nsr/logs] fuser /dev/rmt/9cbn
/dev/rmt/9cbn: 1199o
[root@britt:/nsr/logs] ps -ef |grep nsrmmd
root 583 245 0 18:30:30 ? 30:51 /usr/sbin/nsrmmd -n 1 -r britt.qbranch.se
root 12849 245 0 15:41:12 ? 0:00 /usr/sbin/nsrmmd -n 13
root 499 245 1 18:30:04 ? 2:29 /usr/sbin/nsrmmdbd
root 586 245 0 18:30:33 ? 51:10 /usr/sbin/nsrmmd -n 3
root 585 245 0 18:30:32 ? 41:16 /usr/sbin/nsrmmd -n 2
root 588 245 0 18:30:37 ? 32:12 /usr/sbin/nsrmmd -n 5
root 589 245 0 18:30:39 ? 11:03 /usr/sbin/nsrmmd -n 6
root 12502 245 0 15:40:28 ? 0:00 /usr/sbin/nsrmmd -n 10
root 13014 11570 0 15:42:03 pts/0 0:00 grep nsrmmd
root 12098 245 0 15:39:52 ? 0:00 /usr/sbin/nsrmmd -n 8
root 12810 245 0 15:41:03 ? 0:00 /usr/sbin/nsrmmd -n 12
root 12511 245 0 15:40:30 ? 0:00 /usr/sbin/nsrmmd -n 11
root 1199 245 2 02:10:32 ? 0:02 /usr/sbin/nsrmmd -n 7
I.e. number 7 has control over the drive now. And the group seems to run
as normal... Annoying.
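For anyone repeating the fuser/ps lookup above, here is a small sketch that pulls the bare PID out of a fuser line as it appears on the terminal. The function name fuser_pids is made up for illustration; it assumes the "device: PIDflags" layout shown above.

```shell
# Hypothetical helper: reads fuser-style lines such as
# "/dev/rmt/9cbn: 1199o" on stdin and prints one bare PID per line.
fuser_pids() {
    sed -e 's/^[^:]*://' |                 # drop the "device:" prefix
        tr -s ' \t' '\n\n' |               # one token per line
        sed -n 's/^\([0-9][0-9]*\)[a-zA-Z]*$/\1/p'   # strip flag letters
}

# Possible usage:
#   fuser /dev/rmt/9cbn 2>&1 | fuser_pids
#   ps -f -p <pid>    # shows which "nsrmmd -n N" owns the drive
```

With the output above it should yield 1199, which matches nsrmmd -n 7 in the ps listing.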
So does anyone have any idea what could be causing this? My guess
would be a buggy nsrmmd, or some kind of bad tape, tape drive, or tape
configuration.
I'm grateful for (almost) all input I can get on this issue. :)
//Oscar
--
Note: To sign off this list, send a "signoff networker" command via
email to listserv AT listmail.temple DOT edu or visit the list's Web site at
http://listmail.temple.edu/archives/networker.html where you can also
view and post messages to the list.
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=