Amanda-Users

Zombie selfcheck?

2003-04-01 13:26:25
Subject: Zombie selfcheck?
From: "Brashers, Bart -- MFG, Inc." <Bart.Brashers AT mfgenv DOT com>
To: "Amanda Users (E-mail)" <amanda-users AT amanda DOT org>
Date: Tue, 1 Apr 2003 09:57:02 -0700
Hi everyone,

I just happened to notice that a selfcheck and an amandad have been running
for a long time (since 14 Mar, today is 1 Apr):

[root ~]% ps aux | grep amanda
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
amanda    8858  0.0  0.0  2068  496 ?        S    Mar14   0:00 amandad
amanda    8859  0.0  0.0  1816  520 ?        S    Mar14   0:00 /usr/libexec/self
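
A minimal sketch of picking out long-running processes for one user without reading the START column by eye. The helper name old_procs and the one-day threshold are illustrative assumptions, not part of Amanda. The usage line assumes a ps that supports the etimes (elapsed seconds) output field, as procps-ng does; older ps versions may not. Note that STAT "S" in the listing above means sleeping; a true zombie would show "Z".

```shell
# Hypothetical helper (not part of Amanda): print processes owned by a
# given user that have been running for at least a given number of seconds.
# Expects lines of "PID USER ELAPSED_SECONDS STAT COMMAND" on stdin.
old_procs() {
    user="$1"
    min_secs="$2"
    awk -v u="$user" -v m="$min_secs" \
        '$2 == u && $3 + 0 >= m + 0 { print $1, $4, $5 }'
}

# Possible usage, assuming a ps with the "etimes" field (e.g. procps-ng):
#   ps -eo pid=,user=,etimes=,stat=,comm= | old_procs amanda 86400
```

Anything this prints with state "S" is merely waiting on something, which fits an amandad that never received its final ack.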

(1) Can I safely kill these two processes?

(2) Could they be responsible for this intermittent problem:

During my daily amcheck, one of my hosts (not the server, the machine above)
is regularly reported as down or offline.  Looking at the logs, I see that
selfcheck.*.debug looks ok, but amandad.*.debug has (at the end):

amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, giving up!
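
To compare good and bad runs, the retry pattern above can be tallied per debug file. This is only a sketch: the helper name ack_timeouts is made up for illustration, and the debug directory in the usage line is an assumption that depends on how Amanda was built.

```shell
# Hypothetical helper (not part of Amanda): count ack timeouts in an
# amandad debug log read on stdin.
ack_timeouts() {
    awk '
        /waiting for ack: timeout, retrying/  { retries++ }
        /waiting for ack: timeout, giving up/ { gave_up++ }
        END { printf "retries=%d gave_up=%d\n", retries, gave_up }
    '
}

# Possible usage; the directory is an assumption, adjust to your install:
#   for f in /tmp/amanda/amandad.*.debug; do
#       printf '%s: ' "$f"; ack_timeouts < "$f"
#   done
```

Fed the excerpt above, it would report four retries and one give-up.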

Sometimes, but not always, if I run `/etc/init.d/xinetd restart` and then
re-run amcheck, the problem is solved.  If not, then a reboot solves it.
But it's silly to have to reboot a machine once or twice a week, even
though it isn't doing much at that time of the day.  I've already increased
ctimeout to 90.

Any suggestions?  

Bart
--
Bart Brashers, Ph.D.
Air Quality Meteorologist
MFG Inc.
19203 36th Ave W Suite 101
Lynnwood WA 98036-5707

bart.brashers AT mfgenv DOT com
Phone: 425.921.4000
Fax:   425.921.4040
