Re: [Networker] nsrjobd fubar in 7.3.4 and 7.4 SP2?
2008-07-06 11:25:19
I just started to see hung sessions in the group's 'show details'
window after the upgrade to 7.3.4. I was getting these messages in
daemon.log:
nsrjobd: jobsdb size at 22938674 exceeded high size watermark.
And then, many of these:
nsrjobd: Jobs error: Unable to find record for job 236744 during an
attempt to send message to it
After I discussed this with my support engineer, he found esg93651 (to
which I have no access via Powerlink), which suggests raising the jobsdb
size until there are no more watermark messages (the default is 20 MB). I
suspect (the esg hints at it) that the process that trims the jobsdb
does a poor job, and that it is therefore better to let records expire by
time. Come to think of it, it seems like a poor design decision to have
this limit to begin with, as most people can allocate even 1 GB for the
jobsdb and avoid the hung save problem.
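
To see whether a new jobsdb size is large enough, one can count the two
nsrjobd messages quoted above in daemon.log. Below is a minimal Python
sketch, not anything EMC ships. The regular expressions are taken only
from the message text in this post, and the function name `summarize` and
the result keys are my own assumptions.

```python
import re

# Patterns derived from the two daemon.log messages quoted in this post.
WATERMARK_RE = re.compile(
    r"nsrjobd: jobsdb size at (\d+) exceeded high size watermark"
)
MISSING_JOB_RE = re.compile(
    r"nsrjobd: Jobs error: Unable to find record for job (\d+)"
)


def summarize(lines):
    """Count nsrjobd watermark and missing-record messages in log lines."""
    watermark = 0
    missing_record = 0
    max_size = 0          # largest jobsdb size reported, in bytes
    missing_jobs = set()  # distinct job ids nsrjobd could not find
    for line in lines:
        m = WATERMARK_RE.search(line)
        if m:
            watermark += 1
            max_size = max(max_size, int(m.group(1)))
            continue
        m = MISSING_JOB_RE.search(line)
        if m:
            missing_record += 1
            missing_jobs.add(int(m.group(1)))
    return {
        "watermark": watermark,
        "missing_record": missing_record,
        "max_jobsdb_size": max_size,
        "missing_jobs": sorted(missing_jobs),
    }
```

Pass it an open file, for example `summarize(open(path, errors="replace"))`,
where `path` is wherever your daemon.log lives. If `watermark` is still
nonzero after the change, `max_jobsdb_size` gives a lower bound for the
next size to try.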
Oscar Olsson wrote:
On 2008-05-27 12:21, Peter Viertel revealed:
PV> I messed it up back when I had 733 by fiddling with the setting for
PV> maximum jobsdb size. I'd added a zero to the end thinking that
PV> allowing it to be bigger would mean fewer issues with its GC routines,
PV> but EMC told us to put it back to the default and it has seemed to work
PV> since then.
The settings we have changed, per EMC support's recommendation, are to lower
the data retention in the jobsdb to three days and to increase the size to
100MB. This has had no effect, at least not a positive one. :)
Another thing we have changed, also per their recommendation, is to
increase the number of TCP connections that can be opened or be half-open
per second. I also believe that has had no effect, especially considering that
we still see the same problems. :P
PV> Have you tried moving the whole jobsdb directory aside and restarting
PV> networker?
Several times, it works OK for a day or two, but then the messages start
appearing in the logs indicating that stuff can't talk to it, some
savesets get aborted due to inactivity, nsrjobd takes lots of CPU and
memory etc etc, until nothing works. That process takes about a week tops.
PV> I share your pain with emc support.
Yes. Nothing has really changed over the last few years when it comes to
their ability to identify and solve software bugs. Although, I am getting
the feeling that the industry as a whole is closing in on the EMC
NetWorker level of support (sadly enough).
//Oscar
To sign off this list, send email to listserv AT listserv.temple DOT edu and type
"signoff networker" in the body of the email. Please write to networker-request
AT listserv.temple DOT edu if you have any problems with this list. You can access the
archives at http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
--
-- Yaron.
- Re: [Networker] nsrjobd fubar in 7.3.4 and 7.4 SP2?,
Yaron Zabary <=