Re: [Networker] Networker hangs completely at times

2006-03-31 16:10:43
Subject: Re: [Networker] Networker hangs completely at times
From: Stan Horwitz <stan AT TEMPLE DOT EDU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Fri, 31 Mar 2006 16:07:09 -0500
On Mar 30, 2006, at 1:39 AM, Oscar Olsson wrote:

Has anyone else but us seen regular complete hangs of NetWorker in release 7.3? Backups hang and the system becomes unresponsive; NMC, nsrwatch and nwadmin all hang. After a while nothing more is written to the logs. Before it dies completely, messages such as these can be seen in the daemon.log:

03/29/06 18:01:13 nsrd: [Jukebox `Osato', operation 448]. Initiated operation `Clean device /dev/rmt/2cbn using cleaning slot 430'.
03/29/06 18:01:13 nsrd: [Jukebox `Osato', operation 449]. Initiated operation `Clean device /dev/rmt/1cbn using cleaning slot 430'.
03/29/06 18:03:56 nsrd: [Jukebox `Osato', operation # 445]. Automatically terminating operation `OP_CLEAN', instance 445, on jukebox `Osato'. Cannot allocate the 1 required device(s).
03/29/06 18:03:56 nsrd: [Jukebox `Osato', operation # 445]. Finished with status: failed
03/29/06 19:11:03 nsrlcpd #1: Jukebox `Osato' is exiting. The jukebox is no longer managed by nsrlcpd.

Is there a problem with cleaning devices while many savesets/drives are busy writing? Is this some kind of deadlock state?

As usual, our networker support has been less than helpful with this issue.

Check that the settings on your tape library do not conflict with the settings you have in NetWorker for handling cleaning tapes. I have not seen the behavior you describe, though that may be because I am not running NetWorker 7.3. With the Sony PetaSite tape library we use and NetWorker 7.2.1, if the tape library has any configuration entries at all for handling cleaning tapes, it will not allow NetWorker to mount a cleaning tape. NetWorker reacts by refusing to do anything else with tape mounts until it can satisfy the cleaning request, which has the unfortunate effect of hanging just about everything. Even so, I have never seen nsrwatch or nwadmin fail in that situation.
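
One way to check whether stalled cleaning requests line up with the hangs is to scan daemon.log for cleaning operations that were started and for OP_CLEAN operations that were terminated because no device could be allocated. The following is only a rough Python sketch: the regular expression is an assumption derived solely from the log lines quoted above, the function name scan_cleaning_events is made up for illustration, and other NetWorker releases may word these messages differently.

```python
import re

# Assumed line layout, taken only from the daemon.log excerpt in this thread.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) nsrd: "
    r"\[Jukebox `(?P<jukebox>[^']+)', operation (?:# )?(?P<op>\d+)\]\. "
    r"(?P<msg>.*)$"
)


def scan_cleaning_events(lines):
    """Return (started, failed) lists of (timestamp, jukebox, operation id)."""
    started, failed = [], []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an nsrd jukebox message in the assumed layout
        event = (m["ts"], m["jukebox"], int(m["op"]))
        msg = m["msg"]
        if msg.startswith("Initiated operation `Clean device"):
            started.append(event)
        elif "OP_CLEAN" in msg and "Cannot allocate" in msg:
            failed.append(event)
    return started, failed


if __name__ == "__main__":
    sample = [
        "03/29/06 18:01:13 nsrd: [Jukebox `Osato', operation 448]. "
        "Initiated operation `Clean device /dev/rmt/2cbn using cleaning slot 430'.",
        "03/29/06 18:03:56 nsrd: [Jukebox `Osato', operation # 445]. "
        "Automatically terminating operation `OP_CLEAN', instance 445, on jukebox "
        "`Osato'. Cannot allocate the 1 required device(s).",
    ]
    started, failed = scan_cleaning_events(sample)
    print("cleaning started:", started)
    print("cleaning failed:", failed)
```

To run it against a real log, pass an open file object instead of the sample list. Comparing the timestamps of failed cleaning operations with the times the server stopped responding should show whether the two are related.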
