Re: [Networker] nsrjobd Jobs error:

2008-06-11 11:15:01
Subject: Re: [Networker] nsrjobd Jobs error:
From: Stan Horwitz <stan AT TEMPLE DOT EDU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Wed, 11 Jun 2008 11:10:23 -0400
On Jun 11, 2008, at 10:32 AM, dd1980 wrote:

Hi Guys

I am running NetWorker 7.4.2 on a Windows 2003 SP1 server, backing up to a STK L180 library with 3 tape drives.

The backups seem to hang every night on the following message:

nsrjobd Jobs error: Unable to find record for job 32327 during an attempt to send message to it.

In daemon.log I can see the following messages just before the one above:

4154 10/06/2008 20:36:52  nsrmmd#1 Lost connection to Media Database
39078 10/06/2008 20:36:57 savegrp RPC error: Connection lost with server
7087 10/06/2008 20:36:57 savegrp Lost channel with the server (nsrjobd)
32490 10/06/2008 20:36:57  savegrp group Live-Misc-Daily  aborted.
39078 10/06/2008 20:36:59 savegrp RPC error: Connection lost with server
7087 10/06/2008 20:36:59 savegrp Lost channel with the server (nsrjobd)
32490 10/06/2008 20:36:59  savegrp group Live-Exchange-Daily aborted.

I have engaged EMC on the matter, but so far they haven't been able to pinpoint the cause of the problem.

So far I have done the following:

Stopped the NetWorker services and renamed the /nsr/tmp and /nsr/res/jobsdb directories.
Looked through DNS and /etc/hosts for erroneous entries.
Disabled strong authentication.
Reduced savegroup parallelism (in case the server was overworked).
Reduced server parallelism (in case the server was overworked).

It is also worth knowing that the antivirus software does not run at that time.
All NIC and switch ports are set to 1 Gbps full duplex.
The NICs and the teaming software are all running the latest HP drivers.

I feel like I have exhausted all avenues.

Does anyone have any ideas, or is anyone else seeing the same issue?

Do you still see the same problem with nsrjobd after making the changes you mentioned? If so, you might have to do some TCP/IP tuning. I am not that skilled with Windows, but I had a similar problem with Solaris 10 and NetWorker 7.4, and I saw a marked improvement after adjusting some of my server's TCP/IP parameters. You might try asking EMC for recommendations, or perhaps someone else on this list can suggest TCP/IP tuning settings for a Windows environment.

Also, what kind of server are you using for NetWorker (processor speed, amount of memory, disk space)? Does this problem happen frequently? What is the load like when this happens?
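
To help answer the frequency question, it may be useful to pull the connection-loss entries out of daemon.log and line them up by time. Below is a minimal sketch in Python, assuming the entry layout seen in the excerpt above (numeric message ID, DD/MM/YYYY date, HH:MM:SS time, daemon name, message). The function names parse_entries and connection_losses are hypothetical helpers, not part of NetWorker, and the real log may contain lines this pattern does not cover.

```python
import re
from datetime import datetime

# Assumed entry layout, inferred from the excerpt in this thread:
#   <message id> <DD/MM/YYYY> <HH:MM:SS> <daemon> <message text>
ENTRY_RE = re.compile(
    r"\b(?P<msgid>\d+)\s+(?P<date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<daemon>\S+)\s+"
)


def parse_entries(text):
    """Return one dict per log entry, even if two entries share a line."""
    matches = list(ENTRY_RE.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # The message runs up to the start of the next entry header.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        when = datetime.strptime(
            m["date"] + " " + m["time"], "%d/%m/%Y %H:%M:%S"
        )
        entries.append({
            "id": int(m["msgid"]),
            "when": when,
            "daemon": m["daemon"],
            "message": text[m.end():end].strip(),
        })
    return entries


def connection_losses(entries):
    """Keep only entries whose message mentions a lost connection or channel."""
    return [e for e in entries if "lost" in e["message"].lower()]


if __name__ == "__main__":
    with open("daemon.log", encoding="utf-8", errors="replace") as fh:
        for e in connection_losses(parse_entries(fh.read())):
            print(e["when"], e["daemon"], e["message"])
```

Comparing the printed timestamps across several nights against the server's load at those times should show whether the drops cluster around a particular group or time window.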

