Re: [Networker] Networker hangs completely at times

2006-03-30 13:51:25
Subject: Re: [Networker] Networker hangs completely at times
From: Robert Maiello <robert.maiello AT PFIZER DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 30 Mar 2006 13:49:56 -0500
Wow Oscar, 7.3 is certainly a different beast.

From your logs it looks like you're using software (Networker-based) cleaning:
2 drives need to be cleaned, and it is either confused about the cleaning slots
or can't deal with 2 drives needing to be cleaned at once. It looks
like this is handled by the nsrlcpd process, which has decided it will no
longer talk to the jukebox.
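
If you want to check how often this pattern shows up, here is a rough Python sketch that scans daemon.log for clean operations started at the same second, on the same jukebox, against the same cleaning slot. The regular expression is only modelled on the excerpts quoted below, with each entry assumed to sit on one line (the mail wrapped them); the function name and the "same timestamp" heuristic are my own assumptions, not anything Networker provides.

```python
import re
from collections import defaultdict

# Pattern modelled on the daemon.log excerpts quoted in this thread
# (assumption: each log entry is a single line in the real file).
CLEAN_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) nsrd: "
    r"\[Jukebox `(?P<jukebox>[^']+)', operation (?P<op>\d+)\]\. Initiated "
    r"operation `Clean device (?P<device>\S+) using cleaning slot (?P<slot>\d+)'"
)


def find_slot_conflicts(lines):
    """Return {(timestamp, jukebox, slot): [devices]} for every cleaning
    slot requested by more than one device at the same timestamp.

    Hypothetical helper for illustration only.
    """
    by_key = defaultdict(list)
    for line in lines:
        m = CLEAN_RE.match(line.strip())
        if m:
            by_key[(m["ts"], m["jukebox"], m["slot"])].append(m["device"])
    # Keep only the slots that were asked for by two or more devices.
    return {key: devs for key, devs in by_key.items() if len(devs) > 1}


# The two "Initiated operation" entries from the quoted log, unwrapped.
SAMPLE = [
    "03/29/06 18:01:13 nsrd: [Jukebox `Osato', operation 448]. Initiated "
    "operation `Clean device /dev/rmt/2cbn using cleaning slot 430'.",
    "03/29/06 18:01:13 nsrd: [Jukebox `Osato', operation 449]. Initiated "
    "operation `Clean device /dev/rmt/1cbn using cleaning slot 430'.",
]

if __name__ == "__main__":
    for (ts, jukebox, slot), devices in find_slot_conflicts(SAMPLE).items():
        print(f"{ts} {jukebox}: slot {slot} requested by {', '.join(devices)}")
```

To run it against a real log, pass the open file instead of SAMPLE, e.g. `find_slot_conflicts(open(path))`, where `path` is wherever your daemon.log lives.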

I agree that you should look for core dumps. You could also turn off cleaning in
Networker and see if that avoids this event. You could try allocating
several cleaning tapes, or set the last-cleaned date so the drives aren't
up for cleaning; anything to avoid it.

It sounds similar to our 7.1.1 cleaning bug (fixed in 7.??): if a drive
is up for cleaning, a tape is in it, and an inventory is run, the inventory
will hang; it cannot unload the drive to clean it. Of course, in that case the
software doesn't come to a halt, only tape movement does.


Robert Maiello
Pioneer Data Systems

On Thu, 30 Mar 2006 08:39:59 +0200, Oscar Olsson <spam1 AT QBRANCH DOT SE> 
wrote:

>Has anyone else but us seen regular complete hangs of networker in release
>7.3? Backups hang, and the system becomes unresponsive, NMC, nsrwatch and
>nwadmin all hang. After a while nothing happens in the logs.. Before it
>dies completely, messages such as these can be seen in the daemon.log:
>
>03/29/06 18:01:13 nsrd: [Jukebox `Osato', operation 448]. Initiated
>operation `Clean device /dev/rmt/2cbn using cleaning slot 430'.
>03/29/06 18:01:13 nsrd: [Jukebox `Osato', operation 449]. Initiated
>operation `Clean device /dev/rmt/1cbn using cleaning slot 430'.
>03/29/06 18:03:56 nsrd: [Jukebox `Osato', operation # 445]. Automatically
>terminating operation `OP_CLEAN', instance 445, on jukebox `Osato'. Cannot
>allocate the 1 required device(s).
>03/29/06 18:03:56 nsrd: [Jukebox `Osato', operation # 445]. Finished with
>status: failed
>03/29/06 19:11:03 nsrlcpd #1: Jukebox `Osato' is exiting. The jukebox is
>no longer managed by nsrlcpd.
>
>Is there a problem with cleaning devices while many savesets/drives are
>busy writing? Is this some kind of deadlock state?
>
>As usual, our networker support has been less than helpful with this
>issue.
>
>//Oscar
>
>To sign off this list, send email to listserv AT listserv.temple DOT edu and
>type "signoff networker" in the body of the email. Please write to
>networker-request AT listserv.temple DOT edu if you have any problems
>with this list. You can access the archives at
>http://listserv.temple.edu/archives/networker.html or
>via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
>=========================================================================
