Hi everyone,
I just happened to notice that a selfcheck and an amandad have been running
for a long time (since 14 Mar, today is 1 Apr):
[root ~]% ps aux | grep amanda
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
amanda 8858 0.0 0.0 2068 496 ? S Mar14 0:00 amandad
amanda 8859 0.0 0.0 1816 520 ? S Mar14 0:00 /usr/libexec/self
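
In case anyone wants to check their own clients for the same thing, here is a rough sketch that lists a user's processes older than a given number of days. The helper names `etime_days` and `stale_procs` are made up for illustration, and it assumes a ps that accepts the POSIX `-o` fields `pid`, `etime` and `args` (where `etime` is printed as [[dd-]hh:]mm:ss):

```shell
#!/bin/sh
# etime_days: hypothetical helper. Prints the whole-day part of a ps
# "etime" value, whose POSIX format is [[dd-]hh:]mm:ss.
etime_days() {
    case "$1" in
        *-*) echo "${1%%-*}" ;;  # e.g. 17-03:12:45 -> 17
        *)   echo 0 ;;           # no day field: less than one day old
    esac
}

# stale_procs: hypothetical helper. Lists processes owned by a user
# (default: amanda) that have been running for at least min_days days
# (default: 1). Output columns: pid, elapsed time, command line.
stale_procs() {
    user="${1:-amanda}"
    min_days="${2:-1}"
    ps -u "$user" -o pid= -o etime= -o args= |
    while read -r pid etime args; do
        if [ "$(etime_days "$etime")" -ge "$min_days" ]; then
            echo "$pid $etime $args"
        fi
    done
}
```

Running `stale_procs amanda 7` on the client should then print any amanda-owned process that has been around for a week or more, which is how the two processes above would show up.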
(1) Can I safely kill these two processes?
(2) Could they be responsible for this intermittent problem:
During my daily amcheck, one of my hosts (not the server, the machine above)
is regularly reported as down or offline. Looking at the logs, I see that
selfcheck.*.debug looks ok, but amandad.*.debug has (at the end):
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, giving up!
Sometimes, but not always, if I run `/etc/init.d/xinetd restart` and then
re-run amcheck, the problem is solved. If not, then a reboot solves it.
But it's silly to have to reboot a machine once or twice a week, even
though it isn't doing much at that time of the day. I've already increased
ctimeout to 90.
Any suggestions?
Bart
--
Bart Brashers, Ph.D.
Air Quality Meteorologist
MFG Inc.
19203 36th Ave W Suite 101
Lynnwood WA 98036-5707
bart.brashers AT mfgenv DOT com
Phone: 425.921.4000
Fax: 425.921.4040