Amanda-Users

selfcheck timeout after a week of troubleshooting, on just one client out of many

2003-07-28 18:01:16
Subject: selfcheck timeout after a week of troubleshooting, on just one client out of many
From: "Martin, Jeremy" <jmartin AT gsi-kc DOT com>
To: <amanda-users AT amanda DOT org>
Date: Mon, 28 Jul 2003 16:58:29 -0500
Hi,

My amanda server is having problems connecting to a client. It's working great 
with a number of other clients, but this one client keeps trying to frustrate 
me.

I have developed a little step-by-step procedure for installing amanda clients 
on my servers so I can quickly copy/paste most things and avoid mistakes. Let me 
preface this by saying I used the same self-made guide for setting up a bunch 
of other clients as well as this problem client, and the other ones are 
working great (one RedHat 7.1 server, multiple RedHat 9 servers, and a few 
RedHat 7.3 servers as well).

There are 2 firewalls in between the client and the server. The client (a RedHat 
7.3 box) was running iptables, but since it's behind a hardware firewall I've 
even tried "iptables --flush", and the problem remains with iptables wide open 
(the default policy is set to ACCEPT).

In the server's firewall, I've turned on full logging and don't see anything 
being logged at all. If I try a client-side restore (even though the client has 
never been backed up before), it works OK: it connects to the server, and I see 
the appropriate entries in the server's firewall logs.

However every time I try doing amcheck, the client's self check times out. 

Here's what I see in the client's firewall logs when I attempt to amcheck it 
(the public IPs have been changed):

Date/Time: 2003-07-28 15:26:16 
Source Address/Port: 1.2.3.4:694 
Translated Address/Port: 1.2.3.4:694 
Destination Address/Port: 192.168.250.120:10080
Service: UDP PORT 10080 
Duration: 3 sec.
Bytes Sent: 163 
Bytes Received: 191 

Every time it's the same: 163 bytes sent and 191 bytes received, but amcheck 
still says "host down?"
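To watch the same kind of UDP round trip outside of amcheck, here is a minimal Python sketch that sends one datagram and waits a few seconds for any reply. The function name `udp_probe` and the placeholder payload are hypothetical. Because the payload is not a well-formed Amanda request (and is sent from an unprivileged source port), amandad itself will most likely ignore it, so a timeout against port 10080 proves nothing on its own; the sketch is mainly useful for checking the network path against a simple test listener you control.

```python
import socket
from typing import Optional


def udp_probe(host: str, port: int, payload: bytes,
              timeout: float = 3.0) -> Optional[bytes]:
    """Send one UDP datagram to host:port and return the first reply.

    Returns None if nothing arrives within `timeout` seconds. The payload is
    a placeholder (hypothetical), not a valid Amanda request.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(payload, (host, port))
        try:
            data, _sender = sock.recvfrom(65535)
        except socket.timeout:
            return None  # no reply in time, like amcheck's "selfcheck timeout"
        return data
```

The 3-second default simply mirrors the duration shown in the firewall log; it is not an Amanda setting.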

Here is the ./configure line I'm using on the clients:

./configure --with-user=amanda --with-group=amanda --with-amandahosts \
    --prefix=/usr/local --with-config=Daily --without-server \
    --with-tcpportrange=44200,44209 --with-udpportrange=790,799
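When comparing firewall logs against the build, it helps to pull the port ranges out of the configure line. A minimal Python sketch (the helper names `port_ranges` and `in_range` are hypothetical) that collects the `--with-*portrange` flags and checks a port against them. As an example it checks the source port 694 from the firewall log above; note that these are the client's build flags, and which side's range governs that source port depends on how the server was built, which this sketch does not know.

```python
from typing import Dict, Tuple


def port_ranges(configure_args: str) -> Dict[str, Tuple[int, int]]:
    """Collect --with-<proto>portrange=LOW,HIGH flags into {proto: (low, high)}."""
    ranges: Dict[str, Tuple[int, int]] = {}
    for arg in configure_args.split():
        if not (arg.startswith("--with-") and "portrange=" in arg):
            continue                       # unrelated flag or "\" continuation
        name, value = arg[len("--with-"):].split("=", 1)
        proto = name[: -len("portrange")]  # "tcpportrange" -> "tcp"
        low, high = (int(p) for p in value.split(","))
        ranges[proto] = (low, high)
    return ranges


def in_range(port: int, bounds: Tuple[int, int]) -> bool:
    low, high = bounds
    return low <= port <= high


CLIENT_ARGS = ("--with-user=amanda --with-group=amanda --with-amandahosts "
               "--prefix=/usr/local --with-config=Daily --without-server "
               "--with-tcpportrange=44200,44209 --with-udpportrange=790,799")

if __name__ == "__main__":
    ranges = port_ranges(CLIENT_ARGS)
    print(ranges)                        # {'tcp': (44200, 44209), 'udp': (790, 799)}
    print(in_range(694, ranges["udp"]))  # False
```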

On the client, /etc/hosts has an entry for the server, "theisland"; 
/etc/hosts.allow is set up properly (amanda : theisland : ALLOW); and 
/etc/services contains all of my custom UDP and TCP ports as well as the 
defaults of 10080 (tcp/udp), 10082 (tcp) and 10083 (tcp).
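A quick way to double-check those /etc/services entries is to parse the file rather than eyeball it. A minimal Python sketch (the parser and the `SAMPLE` text are illustrative; the names amanda, amandaidx and amidxtape are the usual Amanda service names, and the custom entries are omitted). On the live client, `socket.getservbyname("amanda", "udp")` should likewise return 10080 if the system's services database has the entry.

```python
from typing import Dict, Tuple

SAMPLE = """\
# Amanda ports (standard names; custom entries omitted)
amanda          10080/udp
amanda          10080/tcp
amandaidx       10082/tcp
amidxtape       10083/tcp
"""


def parse_services(text: str) -> Dict[Tuple[str, str], int]:
    """Map (name or alias, protocol) -> port for /etc/services-style text."""
    table: Dict[Tuple[str, str], int] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()     # drop comments
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue                            # blank or malformed line
        port, proto = fields[1].split("/", 1)
        for name in [fields[0], *fields[2:]]:   # primary name plus aliases
            table[(name, proto)] = int(port)
    return table


if __name__ == "__main__":
    table = parse_services(SAMPLE)
    for key in [("amanda", "udp"), ("amandaidx", "tcp"), ("amidxtape", "tcp")]:
        print(key, table.get(key, "MISSING"))
```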

This is my xinetd.conf entry for amanda:
service amanda 
{
        socket_type = dgram
        protocol = udp
        user = amanda
        server = /usr/local/libexec/amandad
        wait = yes
}

(I've heard some people suggest putting "disable = no" in there, but when I do, 
the system complains that it's not a valid entry, so I've removed it.)
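To compare the xinetd entry against what a UDP amandad service is expected to look like, a minimal Python sketch that reads the `name = value` pairs from a service block and reports anything missing or different. The `EXPECTED` values are an assumption taken from the entry above, not a complete list of xinetd requirements, and the parser only handles plain `=` (lines using `+=` or `-=` are skipped).

```python
import re
from typing import Dict, List

ENTRY = """\
service amanda
{
        socket_type = dgram
        protocol = udp
        user = amanda
        server = /usr/local/libexec/amandad
        wait = yes
}
"""

# Expected values for a UDP amandad service (assumption, see lead-in).
EXPECTED = {"socket_type": "dgram", "protocol": "udp",
            "wait": "yes", "user": "amanda"}


def parse_service_block(text: str) -> Dict[str, str]:
    """Return the `name = value` pairs between the first '{' and '}'."""
    body = text[text.index("{") + 1 : text.index("}")]
    attrs: Dict[str, str] = {}
    for line in body.splitlines():
        match = re.match(r"\s*(\w+)\s*=\s*(.*?)\s*$", line)  # plain '=' only
        if match:
            attrs[match.group(1)] = match.group(2)
    return attrs


def mismatches(attrs: Dict[str, str], expected: Dict[str, str]) -> List[str]:
    """List attribute names that are missing or differ from `expected`."""
    return [key for key, value in expected.items() if attrs.get(key) != value]


if __name__ == "__main__":
    print(mismatches(parse_service_block(ENTRY), EXPECTED))  # []
```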

lsof shows that amanda is listening. I don't recognize the numbers, but that 
shouldn't really matter, since the firewalls are currently set up to allow all 
traffic between the amanda server and this client:

   xinetd     876 root    5u  IPv4       1393              UDP *:amanda
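The numbers in that lsof line are just its standard columns. A minimal Python sketch that labels each field, assuming lsof's default column order (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME) and that SIZE/OFF is blank for this socket, as the line above has one field fewer than the header. Read that way, 876 is xinetd's process ID, 5u is file descriptor 5 open for read and write, 1393 is the kernel's device identifier for the socket, and `*:amanda` means listening on all addresses on the port /etc/services calls "amanda". Running `lsof -P` prints the port number (10080) instead of the service name.

```python
from typing import Dict

FULL = ["COMMAND", "PID", "USER", "FD", "TYPE", "DEVICE",
        "SIZE/OFF", "NODE", "NAME"]


def label_lsof_line(line: str) -> Dict[str, str]:
    """Pair each whitespace-separated lsof field with its column name.

    Assumes no field contains spaces and that a missing field is SIZE/OFF.
    """
    fields = line.split()
    if len(fields) == len(FULL):
        columns = FULL
    elif len(fields) == len(FULL) - 1:
        columns = [c for c in FULL if c != "SIZE/OFF"]  # blank SIZE/OFF column
    else:
        raise ValueError(f"unexpected field count: {len(fields)}")
    return dict(zip(columns, fields))


LINE = "   xinetd     876 root    5u  IPv4       1393              UDP *:amanda"

if __name__ == "__main__":
    for column, value in label_lsof_line(LINE).items():
        print(f"{column:8} {value}")
```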

I can run amandad as the amanda user and it doesn't give me any errors. When I 
run it by hand, it does create a /tmp/amanda/amandad.*.debug file. However, when 
I run amcheck on the server, no debug files are ever created in /tmp/amanda/. 
amanda:amanda owns /tmp/amanda and has permission to create files there.
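Since a new amandad debug file is the clearest sign that an amcheck request actually reached amandad, it can help to list those files newest first before and after running amcheck. A minimal Python sketch; the function name is hypothetical and the default directory is the /tmp/amanda location mentioned above.

```python
import glob
import os
from typing import List


def amandad_debug_files(directory: str = "/tmp/amanda") -> List[str]:
    """Return amandad.*.debug files in `directory`, newest first."""
    pattern = os.path.join(directory, "amandad.*.debug")
    return sorted(glob.glob(pattern), key=os.path.getmtime, reverse=True)


if __name__ == "__main__":
    # Run once before and once after amcheck; a new first entry means
    # xinetd started amandad for the request.
    for path in amandad_debug_files():
        print(path)
```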

I've run "service xinetd reload" and "service xinetd restart", and I've even 
rebooted the entire machine.

No error messages in /var/log/messages or /var/log/secure that I can see. 

/home/amanda/.amandahosts is set up with "theisland amanda".
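An .amandahosts line pairs a host name with a user name, and the host has to match the name the client resolves for the server. A minimal Python sketch of that matching, assuming case-insensitive host comparison and exact user comparison (both assumptions). The fully qualified name `theisland.example.com` is hypothetical and only illustrates that, under exact matching, a short-name entry does not cover a fully qualified one.

```python
from typing import List, Tuple


def parse_amandahosts(text: str) -> List[Tuple[str, str]]:
    """Return (host, user) pairs from .amandahosts-style text, one per line."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:           # skip blank or incomplete lines
            entries.append((fields[0], fields[1]))
    return entries


def permits(entries: List[Tuple[str, str]], host: str, user: str) -> bool:
    """Case-insensitive host match, exact user match (assumption)."""
    return any(h.lower() == host.lower() and u == user for h, u in entries)


if __name__ == "__main__":
    entries = parse_amandahosts("theisland amanda\n")
    print(permits(entries, "theisland", "amanda"))              # True
    print(permits(entries, "theisland.example.com", "amanda"))  # False
```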

/home/amanda's permissions are correct, as are the permissions in 
/usr/local/var/amanda.

On the server I do see a lot of these entries in /var/log/messages:

Jul 28 05:47:51 theisland kernel: (see the NOTES section of 'man 2 wait'). Workaround activated.
Jul 28 05:55:16 theisland kernel: application bug: dumper(20085) has SIGCHLD set to SIG_IGN but calls wait().
Jul 28 05:55:16 theisland kernel: (see the NOTES section of 'man 2 wait'). Workaround activated.

However, some Google searches suggest these are harmless, and the other 
servers are being backed up just fine, so I'm not too worried about them.

I've read through the Amanda FAQ many times, including the multiple entries 
about the "selfcheck timeout, host down?" message, but still haven't had any 
luck.

Just to make sure it wasn't a hardware firewall issue, I completely opened up 
traffic between the server and this problematic client on both firewalls (with 
full logging turned on), and I haven't noticed anything more in the logs than 
when I had it restricted to specific ports (tcp 44200-44300, 10080, 10082 and 
10083; udp 10080 and 790-799).
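A quick sanity check for the restricted rule set is whether every port range compiled into the client falls inside an allowed range. A minimal Python sketch, assuming the rules match destination ports and written from the port list above; the names `ALLOWED` and `covered` are hypothetical, and adjacent ranges are deliberately not merged, so a span straddling two rules reports False.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]

# The restricted rule set described above, as inclusive port ranges.
ALLOWED: Dict[str, List[Range]] = {
    "tcp": [(44200, 44300), (10080, 10080), (10082, 10082), (10083, 10083)],
    "udp": [(10080, 10080), (790, 799)],
}


def covered(proto: str, low: int, high: int,
            allowed: Dict[str, List[Range]] = ALLOWED) -> bool:
    """True if [low, high] lies entirely inside a single allowed range."""
    return any(a <= low and high <= b for a, b in allowed.get(proto, []))


if __name__ == "__main__":
    # Client build ranges plus the standard service ports.
    checks = [("tcp", 44200, 44209), ("udp", 790, 799), ("udp", 10080, 10080),
              ("tcp", 10082, 10082), ("tcp", 10083, 10083)]
    for proto, low, high in checks:
        print(proto, low, high, covered(proto, low, high))
```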

Any ideas for other things to try? Thanks!
Jeremy Martin

