[Networker] rlim_fd_cur on Solaris and nsrexecd

2003-10-13 18:28:31
Subject: [Networker] rlim_fd_cur on Solaris and nsrexecd
From: Craig Ruefenacht <Craig.Ruefenacht AT US.USANA DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Mon, 13 Oct 2003 16:24:54 -0600
Hi,

A couple of days ago I installed a Networker 6.1.3 client on a Solaris 8
machine that had previously had Networker installed, but under a different
hostname and when the machine was performing a slightly different function.

To make a long story short, the nsrexecd daemon would core dump when it
attempted to start.  I spent a bit of time poking around and comparing
against other similar Solaris 8 machines, and found the culprit.  After
finding and implementing a work-around, I searched the Networker archives
and didn't find any mention of our particular issue, hence this email to add
it to the archives for future reference.

In a truss output, the following was the last thing that got recorded (and
shows nsrexecd crashing):

7612:   getuid()                                        = 0 [0]
7612:   sysconfig(_CONFIG_OPEN_FILES)                   = 32768
7612:       Incurred fault #6, FLTBOUNDS  %pc = 0x000B4998
7612:         siginfo: SIGSEGV SEGV_MAPERR addr=0xFFBF09DC
7612:       Received signal #11, SIGSEGV [default]
7612:         siginfo: SIGSEGV SEGV_MAPERR addr=0xFFBF09DC
7612:           *** process killed ***

The rlim_fd_cur kernel parameter on this system is set to 32768 (via
/etc/system), which is where the sysconfig(_CONFIG_OPEN_FILES) call (in the
truss output) is obtaining the 32768 number.  The application being used on
this particular machine (an Oracle portal type product) recommended this
value for rlim_fd_cur because of the number of files it opens.
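
For anyone wanting to check the same thing on their own machine, here is a
minimal sketch of how the limits might be inspected from a shell.  The helper
name show_fd_limits is made up for this illustration, and the grep assumes
the tunables appear in /etc/system under the rlim_fd prefix.

```shell
# show_fd_limits is a hypothetical helper name used only for this illustration.
show_fd_limits() {
    # Soft limit: the value a newly started process normally sees.
    echo "soft: `ulimit -S -n`"
    # Hard limit: the ceiling the soft limit can be raised to.
    echo "hard: `ulimit -H -n`"
    # Solaris keeps boot-time tunables in /etc/system; the file is absent elsewhere.
    if [ -r /etc/system ]; then
        grep 'rlim_fd' /etc/system
    fi
    return 0
}

show_fd_limits
```

On the failing machine described here, the soft value would be expected to
match the 32768 seen in the truss output.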

On our other Solaris 8 boxes, where the Networker 6.1.3 client was working
fine, the rlim_fd_cur kernel parameter was set to 1024.  A truss on a
working system showed sysconfig(_CONFIG_OPEN_FILES) returning 1024, and the
next thing that occurred was that nsrexecd did a close on all 1024 file
descriptors.

Adding a "ulimit -n 1024" to the Networker startup script on the Solaris 8
machine where nsrexecd was core dumping fixed the problem.  I'm guessing
that the code that does a close on all the file descriptors is not written
to handle a number as large as 32768.
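
A slightly more defensive form of that work-around could be sketched as
below.  This is an illustration only: cap_fd_limit is a made-up helper name,
and the startup script location and the nsrexecd invocation are not given in
this post, so the daemon start is left as a placeholder comment.

```shell
# cap_fd_limit is a hypothetical helper, not part of Networker.
# It lowers the descriptor limit only when the current one exceeds the cap.
cap_fd_limit() {
    cap=$1
    cur=`ulimit -n`
    # "unlimited" or any value above the cap gets lowered; smaller values stay.
    if [ "$cur" = "unlimited" ] || [ "$cur" -gt "$cap" ]; then
        ulimit -n "$cap"
    fi
}

cap_fd_limit 1024
echo "descriptor limit is now `ulimit -n`"

# The startup script would launch nsrexecd after this point
# (the exact command and path are assumptions and are omitted here).
```

Because the limit is changed only in the shell running the startup script,
the system-wide rlim_fd_cur setting that the other application needs is left
alone.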

I don't know if this is necessarily a bug in Networker code.  I don't know
of many applications that would ever require a rlim_fd_cur that high, and
I'm not sure at what value the nsrexecd daemon will start core dumping, but
obviously it is somewhere between 1024 and 32768.

--
Craig Ruefenacht
UNIX Administrator
USANA Health Sciences
http://www.usanahealthsciences.com




