On Thu, 11 Sep 2003, Stan Horwitz wrote:
SH> >It seems like nsrmmd keeps a drive unavailable, without any reason. This
SH> >probably happens after a backup job finishes. However, I can't seem to be
SH> >able to figure out what causes this. No jobs are running, and there are no
SH> >savesets waiting to be backed up. There are no pending media requests. The
SH> >drive still sits in the state "ready for writing, idle". An attempt to
SH> >unmount the drive makes the server claim that the drive is busy. I can't
SH> >see any rogue savegrp processes or similar.
SH> If you haven't done so yet, try looking in the /nsr/logs/daemon.log
SH> file to see if there are any error messages there.
Nope, nothing related there, as far as I can see.
I tried this:
[root@britt:/nsr/logs] ps -ef |grep 587
root 587 245 0 18:30:35 ? 7:05 /usr/sbin/nsrmmd -n 4
root 11628 11570 0 15:32:51 pts/0 0:00 grep 587
[root@britt:/nsr/logs] kill 587
[root@britt:/nsr/logs] ps -ef |grep 587
root 11633 11570 0 15:32:55 pts/0 0:00 grep 587
Which gave the following output in the daemon.log:
2003-09-11 15.32.54 nsrd: media info: restarting nsrmmd #4 on britt.qbranch.se in 2 minute(s)
2003-09-11 15.32.59 nsrd: media info: restart of nsrmmd #4 on britt.qbranch.se cancelled
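To pull just these restart messages out of a busy daemon.log, something like the following should work. It is only a sketch: the function name nsrmmd_restarts is made up, and the pattern is based solely on the two messages above, so other NetWorker versions may word them differently.

```shell
#!/bin/sh
# Sketch: filter nsrd messages about nsrmmd restarts from a
# daemon.log-style stream on stdin.
# Assumption: the wording matches the two messages quoted above
# ("restarting nsrmmd #N" and "restart of nsrmmd #N").
nsrmmd_restarts() {
    grep -E 'nsrd: media info: restart(ing| of) nsrmmd #[0-9]+'
}

# Example (path taken from this thread):
#   nsrmmd_restarts < /nsr/logs/daemon.log
```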
Then I ejected the tape.
After that, I started a group, which created a pending mount request.
I partly satisfied it by mounting a volume in this drive, the one that
previously had the stale nsrmmd process attached to it. It appears that
another nsrmmd process has taken over control of this drive:
[root@britt:/nsr/logs] fuser /dev/rmt/9cbn
/dev/rmt/9cbn: 1199o
[root@britt:/nsr/logs] ps -ef |grep nsrmmd
root 583 245 0 18:30:30 ? 30:51 /usr/sbin/nsrmmd -n 1 -r britt.qbranch.se
root 12849 245 0 15:41:12 ? 0:00 /usr/sbin/nsrmmd -n 13
root 499 245 1 18:30:04 ? 2:29 /usr/sbin/nsrmmdbd
root 586 245 0 18:30:33 ? 51:10 /usr/sbin/nsrmmd -n 3
root 585 245 0 18:30:32 ? 41:16 /usr/sbin/nsrmmd -n 2
root 588 245 0 18:30:37 ? 32:12 /usr/sbin/nsrmmd -n 5
root 589 245 0 18:30:39 ? 11:03 /usr/sbin/nsrmmd -n 6
root 12502 245 0 15:40:28 ? 0:00 /usr/sbin/nsrmmd -n 10
root 13014 11570 0 15:42:03 pts/0 0:00 grep nsrmmd
root 12098 245 0 15:39:52 ? 0:00 /usr/sbin/nsrmmd -n 8
root 12810 245 0 15:41:03 ? 0:00 /usr/sbin/nsrmmd -n 12
root 12511 245 0 15:40:30 ? 0:00 /usr/sbin/nsrmmd -n 11
root 1199 245 2 02:10:32 ? 0:02 /usr/sbin/nsrmmd -n 7
So nsrmmd number 7 (PID 1199, the one fuser reports) has control of the
drive now, and the group seems to run as normal... Annoying.
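The lookup I did by eye (fuser PID, then the matching ps line) could be scripted roughly like this. Treat it as a sketch: the function names fuser_pids and nsrmmd_number are made up for illustration, and it assumes fuser and ps -ef output in the same shape as shown above.

```shell
#!/bin/sh
# Sketch: map the PID that fuser reports for a tape device to the
# nsrmmd number ("-n N") in a ps -ef listing.
# Assumptions: fuser prints "device: PID<letters>" as in the output
# above, and the ps -ef columns are UID PID PPID ... with the command
# line at the end.

# Read fuser output on stdin, print one bare PID per line.
# Trailing letters such as "o" are usage codes and are stripped.
fuser_pids() {
    sed -e 's/^[^:]*://' |
        tr -s ' \t' '\n\n' |
        sed -n 's/^\([0-9][0-9]*\)[a-zA-Z]*$/\1/p'
}

# Read ps -ef output on stdin, print the value following "-n"
# on the line whose PID column equals $1.
nsrmmd_number() {
    awk -v pid="$1" '$2 == pid {
        for (i = 1; i < NF; i++)
            if ($i == "-n") print $(i + 1)
    }'
}

# Example, using the device from this thread:
#   for p in $(fuser /dev/rmt/9cbn 2>&1 | fuser_pids); do
#       ps -ef | nsrmmd_number "$p"
#   done
```

Matching on the PID column avoids the false hits that a plain grep for the PID can produce.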
So does anyone have any idea what could be causing this? My guess
would be a buggy nsrmmd, or a bad tape, tape drive, or tape
configuration.
I'm grateful for (almost) all input I can get on this issue. :)
//Oscar