Networker

Re: [Networker] Too many open files error

2003-01-31 03:09:31
Subject: Re: [Networker] Too many open files error
From: FAIDHERBE Thierry <TFHAIDHE AT MAIL.MOBISTAR DOT BE>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Fri, 31 Jan 2003 08:50:42 +0100
As promised:

"In certain high-capacity environments, there may be instances where null
DNS queries, unresolvable hostnames, or incorrect name resolution will use up
the maximum number of file descriptors. During the backup process, the OS
will assign a file descriptor for each process. Since NetWorker uses RPC for
intercommunications, when the ulimit has been surpassed, NetWorker will
become unresponsive or will start to randomly close file descriptors for new
requests. This behavior will loop until the system is no longer functional,
and in some cases it will cause nsrd to dump core.
Another explanation of the problem is that a Windows nwadmin is up and
running somewhere on the same network as the NetWorker server. The Windows
nwadmin has a left-hand pane that searches for all NetWorker servers on the
same network; however, if this Windows machine does not have a valid DNS
entry, nsrd is unable to respond to it and keeps the file descriptor open,
with an effect similar to a SYN flood attack (a form of Denial of Service
(DoS) attack). nsrd will keep opening new file descriptors, making the
system very slow until the NetWorker services are restarted."
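
To see how close a daemon is to the limit described above, one rough check
is to count the entries under /proc/PID/fd and compare the result with the
soft limit. This is only a sketch: it assumes a /proc filesystem that
exposes a per-process fd directory, and count_fds is a hypothetical helper
name, not a NetWorker command.

```sh
#!/bin/sh
# count_fds PID: print how many descriptors the process currently has open.
# Assumes /proc/PID/fd exists; prints 0 if the directory cannot be read.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Example: inspect this shell itself and show its soft descriptor limit.
echo "open descriptors: `count_fds $$`, soft limit: `ulimit -n`"
```

For nsrd you would pass the daemon's PID instead of $$. A count that keeps
climbing toward the limit matches the behaviour described in the quote.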

After checking and correcting the DNS or hosts files (on the client, server
and storage node (SN) side), you need to edit /etc/rc2.d/S95networker and add
"ulimit -n 2048" before the NetWorker processes are started.
It is also a good idea to remove the "&" (background sign)
after the nsr_shutdown commands, to avoid CFI and MediaDB
corruption when running Unix command sequences such as
"/etc/rc2.d/S95networker stop ; /etc/rc2.d/S95networker start"

Once modified, the file looks like this:

#!/bin/sh
# installed by postinstall on Tue Nov 26 15:06:55 MET 2002
#
# Default locale
#
LANG=C
export LANG
#TFD ADDED 9-DEC-2002
ulimit -n 2048
# Override to a different locale if /usr/lib/nsr/LANG exist
[ -r /usr/lib/nsr/LANG ] && . /usr/lib/nsr/LANG

case $1 in
'start')
(echo 'starting NetWorker daemons:') > /dev/console
if [ -f /usr/sbin/nsrexecd ]; then
        (/usr/sbin/nsrexecd; /bin/sleep 15) > /dev/console 2>&1
        (echo ' nsrexecd') > /dev/console
fi
if [ -f /usr/sbin/lgtolmd ]; then
        (/usr/sbin/lgtolmd -p /nsr/lic -n 1) > /dev/console 2>&1
        (echo ' lgtolmd') > /dev/console
fi
if [ -f /usr/sbin/nsrd -a ! -f /usr/sbin/NetWorker.clustersvr ]; then
        (/usr/sbin/nsrd) > /dev/console 2>&1
        (echo ' nsrd') > /dev/console
fi
        ;;
'stop')
(echo 'stopping NetWorker daemons:') > /dev/console
if [ -f /usr/sbin/nsr_shutdown ]; then
        if [ -f /usr/sbin/NetWorker.clustersvr ]; then
                (/usr/sbin/nsr_shutdown -c -a -q) > /dev/console 2>&1
                (echo ' nsr_shutdown -c -a -q') > /dev/console
        else
                (/usr/sbin/nsr_shutdown -a -q) > /dev/console 2>&1
                (echo ' nsr_shutdown -a -q') > /dev/console
        fi
fi
        ;;
*)
echo "usage: `basename $0` {start|stop}"
        ;;
esac
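
A slightly more defensive variant of the "ulimit -n 2048" line in the script
above is sketched here. It raises only the soft limit, and only when the
current value is lower than the target, so it never lowers a limit that is
already higher. raise_nofile is a hypothetical helper name, and touching only
the soft limit (-S) is an assumption; the script above uses plain
"ulimit -n 2048".

```sh
#!/bin/sh
# raise_nofile N: make sure the soft descriptor limit is at least N.
# Returns non-zero if the limit could not be raised (for example because
# the hard limit is lower than N).
raise_nofile() {
    wanted=$1
    current=`ulimit -S -n`
    # "unlimited" already satisfies any target.
    [ "$current" = "unlimited" ] && return 0
    if [ "$current" -lt "$wanted" ]; then
        ulimit -S -n "$wanted" 2>/dev/null || return 1
    fi
    return 0
}

# 2048 is the value used in the init script above.
raise_nofile 2048 || echo "could not raise descriptor limit to 2048" >&2
echo "soft descriptor limit is now: `ulimit -S -n`"
```

The limit is inherited by child processes, so the call has to come before
the daemons are started, as in the modified script.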


Kind regards - Bien cordialement - Vriendelijke groeten,

Thierry FAIDHERBE

HPCI - Storage & Server Integration Practice 
Tru64 Unix and Legato EBS Consultant
                                   
 *********       *********   HEWLETT - PACKARD
 *******    h      *******   1 Rue de l'aeronef/Luchtschipstraat
 ******    h        ******   1140 Bruxelles/Brussel/Brussels
 *****    hhhh  pppp *****   
 *****   h  h  p  p  *****   100/102 Blv de la Woluwe/Woluwedal
 *****  h  h  pppp   *****   1200 Bruxelles/Brussel/Brussels
 ******      p      ******   BELGIUM
 *******    p      *******                              
 *********       *********   Phone :    +32 (0)2  / 729.85.42   
                             Mobile :   +32 (0)498/  94.60.85 
                             Fax :      +32 (0)2  / 729.88.30   
     I  N  V  E  N  T        Email :    thierry.faidherbe AT hp DOT com
                             Internet : http://www.hp.com/
________________________________________________________________________

MOBISTAR SA/NV 

SYSTEM Team Charleroi, Mermoz 2 Phone : +32 (0)2  / 745.75.81  
Avenue Jean Mermoz, 32          Fax :   +32 (0)2  / 745.89.56  
6041 GOSSELIES                  Email : tfhaidhe AT mail.mobistar DOT be
BELGIUM                         Web :   http://www.mobistar.be/
________________________________________________________________________

  


-----Original Message-----
From: Faidherbe, Thierry [mailto:Thierry.Faidherbe AT hp DOT com] 
Sent: Thursday, January 30, 2003 7:52 PM
To: Legato NetWorker discussion; VERHAEGHE Koen (BMB);
NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Subject: RE : [Networker] Too many open files error


Dear Koen,
 
I have run into the same problem. Most of the time it occurs during
unsolicited nsrck -F runs on the CFIs.
 
I hit the problem on Solaris 8, and the solution was to increase the number
of open files allowed, using an environment variable set before nsrd starts.
 
The problem is of course an unresolvable IP address or hostname that loops
and consumes sockets, which exhausts the file descriptors and causes trouble.
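
A quick way to look for such names is to run every client, server and
storage node name through the system resolver and list the ones that fail.
The sketch below assumes getent is available (it is on Solaris and Linux);
resolve_host and unresolved_hosts are hypothetical helper names, and the
check covers forward lookups only.

```sh
#!/bin/sh
# resolve_host NAME: succeed if NAME resolves through the system's
# configured sources (hosts file, DNS, ...).
resolve_host() {
    getent hosts "$1" >/dev/null 2>&1
}

# unresolved_hosts: read one host name per line on stdin and print the
# names that do not resolve. Blank lines and '#' comments are skipped.
unresolved_hosts() {
    while read name rest; do
        case $name in
            ''|'#'*) continue ;;
        esac
        resolve_host "$name" || echo "$name"
    done
}
```

Typical use would be "unresolved_hosts < clients.txt", where clients.txt is a
hypothetical file with one host name per line. Any name it prints is a
candidate for the DNS or hosts file cleanup.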
 
Give me a call tomorrow and I will give you the exact variable name. For
everyone else, I will post it tomorrow.
 
HTH and warm regards,
 
Thierry
 

        -------- Original message --------
        From: VERHAEGHE Koen (BMB) [mailto:Koen.VERHAEGHE AT PROXIMUS DOT NET]
        Date: Thu 30/01/2003 10:59
        To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
        Cc:
        Subject: [Networker] Too many open files error
        
        

        Dear all,
        
        We have 2 NetWorker servers (prod & non-prod) running 6.1.2.
        
        In the last week both have failed twice with the following error:
        
        01/21/03 03:07:27 nsrd: media notice: check storage node: edcb560 (nsrmon timed out)
        svc_tcp: svcfd_create: Too many open files
        svc_tcp: svcfd_create: Too many open files
        svc_tcp: svcfd_create: Too many open files
        svc_tcp: svcfd_create: Too many open files
        
        nsrd keeps running, but all backups fail, with thousands of the
        above messages in daemon.log. There are also a lot of nsrmon
        processes running. I opened a case with Legato, and they asked me to
        check for possible DNS inconsistencies (unresolvable hosts, etc.).
        
        While doing that, I wanted to check whether any of you had seen this
        before.
        
        Cheers
        Koen
        
        
        --
        Note: To sign off this list, send a "signoff networker" command via
        email to listserv AT listmail.temple DOT edu or visit the list's Web
        site at
        http://listmail.temple.edu/archives/networker.html where you can
        also view and post messages to the list.
        =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
        



