Veritas-bu

[Veritas-bu] bpbkar processes hung on CLOSE_WAIT on Linux

2005-07-07 16:27:44
Subject: [Veritas-bu] bpbkar processes hung on CLOSE_WAIT on Linux
From: jlightner AT water DOT com (Jeff Lightner)
Date: Thu, 7 Jul 2005 16:27:44 -0400

All,

Can anyone tell me how to get rid of NetBackup processes hung with
sockets in CLOSE_WAIT status (other than by rebooting)?

Alternatively, can anyone point to definitive information showing that
this is a known issue with the Linux / NetBackup combination we're
running, one that requires a reboot?  If so, is it fixed in NB 5.1?

DETAILS:

After looking at Linux forums, Veritas' web site, this forum and Google,
I'm not finding a clear answer.

We have a Dell PowerEdge 2850 running NetBackup 4.5 FP6 client software
under Red Hat Enterprise Linux AS 3 (2.4 kernel).

The master server is HP-UX 11.11 running the same version of NetBackup.

The backups for this client have been failing recently (they worked
previously) with a status 41 (network connection timed out) error.

On researching this I found multiple hung bpbkar processes.  They cannot
be killed with ANY signal (-9, -1, -15, etc.; and yes, I know the names
SIGHUP, SIGTERM, etc.).
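
A process that ignores even -9 is usually either in uninterruptible
sleep (state D) or a zombie (state Z).  Whether that applies to these
bpbkar processes is an assumption on my part, not something I have
confirmed.  A minimal sketch of the check, reading the text of
/proc/<pid>/status (the function names are hypothetical, for
illustration only):

```python
def process_state(status_text):
    """Return the one-letter state from the text of /proc/<pid>/status,
    or None if no State: line is present."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            parts = line.split()
            # Line looks like "State:  D (disk sleep)"
            return parts[1] if len(parts) > 1 else None
    return None


def ignores_signals(state):
    """True for states in which a process does not act on any signal:
    D (uninterruptible sleep) and Z (zombie)."""
    return state in ("D", "Z")
```

Usage would be along the lines of
process_state(open("/proc/4242/status").read()), where 4242 stands in
for the PID of one of the hung processes.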

lsof shows that all of their sockets are in CLOSE_WAIT.  They all show
the master server as the other side, but on the master the corresponding
sockets no longer exist.
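
For anyone who wants to do the same check without lsof, here is a
minimal sketch that picks the CLOSE_WAIT rows out of the text of
/proc/net/tcp.  It assumes IPv4 and a little-endian host (true for an
x86 box like this one), and relies on the kernel's state code 08 for
CLOSE_WAIT.  The function names are hypothetical.

```python
import socket
import struct

CLOSE_WAIT = "08"  # TCP state code for CLOSE_WAIT in /proc/net/tcp


def decode_addr(hex_addr):
    """Turn '0100007F:0016' into ('127.0.0.1', 22).
    Assumes a little-endian host."""
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def close_wait_sockets(proc_net_tcp_text):
    """Return a list of (local, remote, inode) for each CLOSE_WAIT row."""
    rows = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # fields: sl, local, remote, state, queues, timer, retrnsmt,
        #         uid, timeout, inode, ...
        if len(fields) < 10 or fields[3] != CLOSE_WAIT:
            continue
        rows.append((decode_addr(fields[1]),
                     decode_addr(fields[2]),
                     int(fields[9])))
    return rows
```

Calling close_wait_sockets(open("/proc/net/tcp").read()) on the client
lists the stuck connections; the inode can then be matched against the
socket links under /proc/<pid>/fd to find the owning process.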

CLOSE_WAIT means the other side has closed the connection.  One would
expect these to go away eventually, but I have some that are more than a
day old.
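
To illustrate the state, the sketch below sets up a loopback
connection, has the peer close first, and shows the local side seeing
end-of-file while still holding its descriptor.  In standard TCP the
socket sits in CLOSE_WAIT during that window, until the local process
itself calls close().  The names are hypothetical and none of this is
NetBackup code.

```python
import socket


def peer_close_demo():
    """Peer closes first; return what the local side observes before
    it closes its own end."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))    # port 0: kernel picks a free port
    listener.listen(1)
    peer = socket.create_connection(listener.getsockname())
    local, _ = listener.accept()

    peer.close()                       # peer sends FIN; local enters CLOSE_WAIT
    data = local.recv(1024)            # returns b"" (EOF) once the FIN arrives
    still_open = local.fileno() != -1  # descriptor is still held locally

    local.close()                      # only now does the socket leave CLOSE_WAIT
    listener.close()
    return data, still_open
```

A process that is stuck and never reaches its own close() would leave
the socket in that window indefinitely, which would be consistent with
entries that are more than a day old.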

There was some discussion of a CLOSE_WAIT bug in a version of xinetd
older than the one we're running.  Since that version is older than ours
and I don't see any sign of this occurring in other applications, it
doesn't seem likely that this is the issue.

I also found a discussion from 2002 of a netfilter sysctl parameter,
tcp_ct_close_wait_timeout, but nothing newer than that, so I'm not sure
it is still relevant.  There is no such parameter on my system, and I'm
not keen on trying netfilter just to get it unless someone has done so
more recently.  (I do have iptables installed.)

Jeffrey C. Lightner

Unix Systems Administrator

DS Waters of North America

678-486-3516

