It seems my initial report was somewhat misleading. The tape that it
wants to load when it fails with the reported error is, according to
NetWorker, already loaded in device /dev/rmt/2cbn on storage node sn01.
However, there has been no activity on that drive for a while. Checking
the jukebox inventory confirms that this volume is actually loaded in the
specified drive, so it's not an inventory desync problem. The drive itself
seems to be working, though, as I can send mt commands to it from the
command shell on the storage node. However, when I check the device on the
storage node with fuser to find out which process is using it, I see
that no process currently has the device file open, which means that the
nsrmmd process that should be using the device isn't using it. My new
conclusion is that the nsrmmd process somehow got into a bad state and no
longer has the device file open, although it thinks it does, and thereby
prevents new nsrmmd processes from being spawned to access the drive.
At this stage I assume the best way to clear this error before it occurs
again is to kill the misbehaving nsrmmd process, so NetWorker can start a
new one that will hopefully open the device in question and resume normal
operations. It would also be good to find the problem that caused the
nsrmmd process to misbehave in this manner. Also, is it possible to find
out which nsrmmd process thinks it's using that device/drive, so I can
kill it? Or is there anything else I should do?
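
For reference, the checks I described above look roughly like the sketch
below. The device path is the one from this post; the helper names
check_drive and list_nsrmmd are just made up for illustration, and the
ps column layout (PID first, command second) is an assumption that holds
for "ps -e -o pid,comm" on the systems I know of. Note that this only
lists candidate nsrmmd processes. It does not tell me which one NetWorker
associates with the drive, which is exactly what I am asking about.

```shell
#!/bin/sh
# Device path taken from the post; adjust for your own storage node.
DEVICE=/dev/rmt/2cbn

# Check that the drive answers and show which processes have it open.
# (Hypothetical helper; it simply wraps the two commands mentioned above.)
check_drive() {
    mt -f "$1" status   # drive responds if this prints a status
    fuser "$1"          # prints the PIDs that hold the device file open, if any
}

# Read "PID COMMAND" lines on stdin and print the PID of every process
# whose command basename is exactly "nsrmmd".
list_nsrmmd() {
    awk '{
        n = split($2, parts, "/")          # strip any leading directory
        if (parts[n] == "nsrmmd") print $1
    }'
}

# Usage on the storage node (not run automatically here):
#   check_drive "$DEVICE"
#   ps -e -o pid,comm | list_nsrmmd
```

If fuser prints no PIDs for the device while list_nsrmmd still shows
nsrmmd processes, that matches the situation I am seeing.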
All input is welcome. :)
//Oscar