Re: [Networker] The nsrd process stopped responding

2005-01-17 11:33:31
Subject: Re: [Networker] The nsrd process stopped responding
From: "Maarten Boot (CWEU-USERS/CWNL)" <Maarten.Boot AT NL.COMPUWARE DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Mon, 17 Jan 2005 17:32:41 +0100
Today at 12:10 nsrd died here with a core dump, in strlen (I have a trace):

t@1 (l@1) program terminated by signal SEGV (no mapping at the fault address)
0xffffffff7e03d1ec: strlen+0x007c:      ld      [%o1], %o2
(dbx) where
current thread: t@1
=>[1] strlen(0x0, 0x0, 0x0, 0x7efefeff, 0x81010100, 0x1007815be), at 0xffffffff7e03d1ec
  [2] _doprnt(0x0, 0xffffffff7ffe1fa0, 0x0, 0x0, 0x73, 0x0), at 0xffffffff7e08fd34
  [3] vsprintf(0x1007815a0, 0x10014dee0, 0xffffffff7ffe2178, 0x1, 0x30, 0x1000d20b8), at 0xffffffff7e091ed0
  [4] err_setstr(0x0, 0x1389, 0x10014dee0, 0x1003e6e28, 0x0, 0x1003f6c70), at 0x10012789c
  [5] build_clone_rlist(0x304, 0xffffffff7ffe2388, 0xffffffff7ffe2430, 0xffffffff7ffe2448, 0x0, 0x1006d7c70), at 0x100040f5c
  [6] 0x1000320a8(0x4, 0x100b9aa80, 0xffffffff7ffe2750, 0x0, 0x1, 0x2880), at 0x1000320a7
  [7] 0x1000377e4(0xffffffff7ffe2530, 0x100b9aa80, 0xffffffff7ffe2710, 0x2, 0x64, 0x0), at 0x1000377e3
  [8] svcrm_broker_2(0x4, 0xffffffff7ffe2750, 0x10058de20, 0xffffffff7ffe2710, 0x12c8, 0x12b8), at 0x100033844
  [9] svcnsr_start_pools_2(0xffffffff7ffe5640, 0x1009e96e0, 0xffffffff7ffed750, 0x100299020, 0x2c00, 0xffffffff7ffe2750), at 0x10007a278
  [10] nsrprog_2(0xffffffff7ffed750, 0x100c75420, 0x0, 0x5f3d7, 0x800000000, 0xffffffff7ffed850), at 0x1000a2060
  [11] 0x1001186b8(0x100c75420, 0xffffffff7fffd820, 0x0, 0x0, 0xffffffff7ffff9d0, 0x0), at 0x1001186b7
  [12] svc_getreqset_varped(0xffffffff7fffd9d0, 0x100591d50, 0x8291400, 0x2008, 0x0, 0xb), at 0x100118858
  [13] 0x100086604(0x78b0, 0x1, 0xffffffff7ffff9e0, 0xffffffffffffdfc0, 0xffffffff7fffd9c0, 0x2000), at 0x100086603
  [14] 0x1000853e8(0x0, 0x1, 0xa, 0x0, 0x7850, 0x1002b7200), at 0x1000853e7
  [15] main(0x1, 0xffffffff7ffffcf8, 0xffffffff7ffffd08, 0x1002991a0, 0x1003e82c0, 0x4), at 0x100084950
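
Frames [1] to [4] look to me like a NULL pointer reaching a %s conversion:
strlen is entered with 0x0 from _doprnt/vsprintf underneath err_setstr.
Passing NULL for %s is undefined behaviour in C, so a crash there is
plausible. This is only my reading of the trace, not a confirmed diagnosis.
As a minimal sketch of the kind of guard that avoids it (safe_str and
format_error are hypothetical names; I do not know the real err_setstr
signature or what NetWorker does internally):

```c
#include <stdio.h>

/* Hypothetical helper: map a NULL string to a printable placeholder
 * so that a %s conversion never receives a NULL pointer. */
const char *safe_str(const char *s)
{
    return s != NULL ? s : "(null)";
}

/* Hypothetical stand-in for an error-formatting routine. It is NOT the
 * real err_setstr; it only illustrates guarding the %s argument and
 * bounding the write with snprintf instead of an unbounded vsprintf. */
int format_error(char *buf, size_t buflen, int code, const char *detail)
{
    return snprintf(buf, buflen, "error %d: %s", code, safe_str(detail));
}
```

With this guard a NULL detail string ends up as the text "(null)" in the
message instead of being dereferenced inside the C library.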

I expect to upgrade to 7.1.3 soon, so I have not opened a case (I expect
LEGATO to ask for this upgrade anyway when I open a support call; that has
always been their standard reaction).

Maarten


On Monday 17 January 2005 17:09, Stan Horwitz wrote:
> I just opened up case 3130419 with LEGATO tech support about this issue,
> but I am wondering if anyone on this list has run into a situation where your
> server's nsrd process just dies? I had this happen around 12:20 with an RPC
> error. A bunch of processes were killed off according to the daemon.log
> file, then logging stopped about ten minutes later. Other nsr processes
> such as nsrexecd, nsrmmdb, etc. were still in the process list. The nsrd
> process also generated a core file in /nsr/cores/nsrd at the time it died.
> I shut down the nsr processes via nsr_shutdown and the shutdown appeared
> to happen normally. Then I restarted, and the restart appeared to be
> normal. Backups that were in progress resumed, although a few did crash
> when this problem began so those were not re-started.
>
> Our NetWorker complex consists of a Solaris 9 server running NetWorker Power
> Edition 7.1.2. We also have one storage node, also running Solaris 9 and
> NetWorker 7.1.2. A variety of clients are backed up each
> night, including Tru64 Unix, Windows, Solaris, Novell, Linux, and NDMP. A
> few of our Windows clients use the MS SQL module and one uses the MS
> Exchange module. Our NetWorker server has a Sony PetaSite library with
> five Sony S-AIT drives connected to it via fiber. Our storage node has a
> Qualstar library with twelve Sony AIT-2 drives connected via LVD SCSI. We
> do not do any library or drive sharing.
>
> The errors in daemon.log look like:
>
> 01/17/05 00:29:05 savegrp: RPC error: Unable to send
> 01/17/05 00:29:05 savegrp: Cannot query the pool resources.  Unable to verify the save sets on the media.
> 01/17/05 00:29:05 nsrexecd: Recvd signal to kill process group - pid=-13435, sig=2
>
> --
> Note: To sign off this list, send a "signoff networker" command via email
> to listserv AT listserv.temple DOT edu or visit the list's Web site at
> http://listserv.temple.edu/archives/networker.html where you can
> also view and post messages to the list. Questions regarding this list
> should be sent to stan AT temple DOT edu
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=

-- 
Maarten Boot, 
Compuware Europe B.V.
Hoogoorddreef 5
1101 BA Amsterdam

