Re: [Networker] nsrjobd fubar in 7.3.4 and 7.4 SP2?

2008-05-27 06:30:51
Subject: Re: [Networker] nsrjobd fubar in 7.3.4 and 7.4 SP2?
From: Peter Viertel <Peter.Viertel AT MACQUARIE DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Tue, 27 May 2008 20:21:22 +1000
I'm running 7.3.4 on a Solaris 10 host. nsrjobd seems OK.

I broke it back when I was on 7.3.3 by fiddling with the setting for maximum
jobsdb size. I'd added a zero to the end, thinking that allowing it to be
bigger would mean fewer issues with its GC routines, but EMC told us to put it
back to the default, and it has seemed to work since then.

Have you tried moving the whole jobsdb directory aside and restarting
NetWorker?
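
For anyone who wants to script the move-aside step, here is a minimal Python
sketch. Assumptions not stated above: the jobs database lives in
/nsr/res/jobsdb (the usual default, but check your own install), and the
NetWorker daemons are stopped before the rename and started again afterwards
using whatever mechanism your platform provides. The function name move_aside
is purely illustrative.

```python
import os
import time


def move_aside(directory, suffix=None):
    """Rename `directory` to a timestamped sibling and return the new path.

    Nothing is deleted; the old contents stay on disk under the new name
    so they can be inspected or restored later.
    """
    directory = os.path.normpath(directory)
    if not os.path.isdir(directory):
        raise FileNotFoundError(directory)
    if suffix is None:
        # Timestamp keeps repeated move-asides from colliding.
        suffix = time.strftime("%Y%m%d-%H%M%S")
    target = f"{directory}.aside-{suffix}"
    if os.path.exists(target):
        raise FileExistsError(target)
    os.rename(directory, target)
    return target


# Example call (assumed default path; run only while NetWorker is stopped):
# move_aside("/nsr/res/jobsdb")
```

The rename keeps the old directory next to the original location, so it can
be put back if the restart does not help.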

I share your pain with EMC support.


----- Original Message -----
From: EMC NetWorker discussion <NETWORKER AT LISTSERV.TEMPLE DOT EDU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU <NETWORKER AT LISTSERV.TEMPLE DOT EDU>
Sent: Tue May 27 17:26:16 2008
Subject: [Networker] nsrjobd fubar in 7.3.4 and 7.4 SP2?

We've had an ongoing issue with 7.3.4 and 7.4 SP2 (both with the security
fix applied) where nsrjobd runs into trouble. The process takes up 100% CPU
on one thread (a looping thread is my guess), and it also grows
continuously, up to about 500MB of allocated memory. After a while, other
processes seem to have problems communicating with nsrjobd and end up
waiting, which means that groups don't run and recover sessions take a long
time before they can browse, until eventually the backup system is rendered
useless. A restart doesn't really help either, since nsrjobd is quite
unresponsive even after the restart and starts allocating memory and using
lots of CPU right away. I also no longer see any messages saying that it has
managed to purge old records. Of course, this is a serious problem.

And why am I telling the list this? Well, as usual, EMC support is quite
unresponsive and takes a long time to collect the information needed to
troubleshoot this. For instance, we had a P1 case open from Monday to
Friday, and they still needed more info after an escalation was opened on
Friday, even though the cause of the trouble was quite clearly located
within nsrjobd. I'm guessing this is a more or less general problem with
nsrjobd, so I would like to hear from other users in the community who are
seeing similar issues. Contact me on- or off-list. I don't mind sharing
experiences about this on the list as well.

//Oscar
