[Networker] nsrjobd fubar in 7.3.4 and 7.4 SP2?

2008-05-27 03:27:39
From: Oscar Olsson <spam1 AT QBRANCH DOT SE>
Date: Tue, 27 May 2008 09:26:16 +0200
We've had an ongoing issue with 7.3.4 and 7.4 SP2 (both with the security fix applied) where nsrjobd seems to be having trouble. The process uses 100% CPU on one thread (a looping thread, is my guess), and it also grows continuously, up to about 500 MB of allocated memory. After a while, other processes start having problems communicating with nsrjobd and are left waiting. Groups don't run, and recover sessions take a long time before they can browse, until eventually the backup system is rendered useless. A restart doesn't really help either: nsrjobd is still quite unresponsive afterwards, and it quickly allocates memory and starts using lots of CPU right away. I also no longer see any messages saying that it has managed to purge old records. Of course, this is a serious problem.

And why am I telling the list this? Well, as usual, EMC support is quite unresponsive and takes a long time to collect the information needed to troubleshoot this. For instance, we had a P1 case open from Monday to Friday, and they still needed more info after an escalation was opened on Friday, even though the cause of the trouble was quite clearly located within nsrjobd. I'm guessing this is a more or less general problem with nsrjobd, so I would like to hear from other users in the community who might be experiencing similar issues. Contact me on- or off-list; I don't mind sharing experiences about this on the list as well.
