Amanda-Users

RE: client was working, now suddenly is getting self check "host down?" errors

2003-05-30 15:08:12
Subject: RE: client was working, now suddenly is getting self check "host down?" errors
From: "Ron Bauman" <RBauman AT HatterasNetworks DOT com>
To: "Martin, Jeremy" <jmartin AT gsi-kc DOT com>, <amanda-users AT amanda DOT org>
Date: Fri, 30 May 2003 15:04:48 -0400

I have a random problem like this as well, running RH Linux.  The client 
occasionally fails amcheck in the afternoon. (Backups run at night.)  When I 
look at portland, the client, I find the selfcheck task "stuck" and I am unable 
to kill it, even with kill -9.  See if you have the same problem.  On the 
client, try

ps -ef | grep amand

or grep for whatever your amanda user account is.

If you see selfcheck running, you'll be unable to get amcheck on the server to 
finish until it's gone.  Just something to check.
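
As a rough sketch of how that check could be narrowed down: a process that 
ignores kill -9 is usually in uninterruptible sleep, which ps reports as a 
state beginning with D (often a hung disk or NFS operation).  The helper name 
find_stuck_selfcheck is made up for illustration, and the usage line assumes 
a ps that accepts -e and -o, as the procps one on Linux does.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amanda): reads "PID STAT USER COMMAND..."
# lines on stdin and prints those whose state starts with D (uninterruptible
# sleep) and whose line mentions selfcheck.
find_stuck_selfcheck() {
    awk '$2 ~ /^D/ && /selfcheck/'
}

# Usage on the client ("=" after each field name suppresses the header):
#   ps -e -o pid= -o stat= -o user= -o args= | find_stuck_selfcheck
```

If this prints anything, signals will not clear the process; whatever it is 
blocked on has to complete first, or the client has to be rebooted.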

Ron Bauman
Hatteras Networks, Inc.

-----Original Message-----
From: Martin, Jeremy [mailto:jmartin AT gsi-kc DOT com]
Sent: Friday, May 30, 2003 2:14 PM
To: amanda-users AT amanda DOT org
Subject: client was working, now suddenly is getting self check "host
down?" errors


Hi,

This is confusing me a bit, I hope someone here has an idea of what might be 
happening.

I have been running an amanda server, backing itself up + one other amanda 
client (jayhawker), for about a week now. It works great every night when I 
have the amdump run. Yesterday I added a third amanda client, "bcc1". bcc1 and 
jayhawker are both fresh RedHat 9 installs.

I configured bcc1 exactly the same way as jayhawker, with the same entries in 
hosts.allow / hosts.deny / xinetd.conf / /home/amanda/.amandahosts etc. Both 
mybox (by name and by ip just in case) and localhost (by localhost / 
localhost.localdomain / 127.0.0.1) are allowed in hosts.allow for the user 
amanda... I know a lot of that is redundant but I wanted to be 100% sure I 
allowed the right things, since at least the .amandahosts file has been a bit 
picky. Also of course my amanda server "mybox" is set up ok in /etc/hosts. 

At first "mybox" could back up bcc1 just fine. I ran amcheck and there were 0 
problems in 3 clients found. The first amdump worked yesterday afternoon. Then 
overnight amdump ran from cron and was unable to connect to bcc1. Actually 95% 
of the DLEs were backed up ok but /usr on bcc1 failed:

  192.168.2. /usr lev 0 FAILED 20030530[could not connect to 192.168.2.200]

This morning, after reading that in the amanda report, I ran amcheck and it said 
selfcheck host down when trying bcc1. Just to see if I could get to it, I 
tried "ping bcc1" which started pinging the right IP immediately, no problems 
at all. I ran amcheck again without changing anything else and it found 0 
problems. Then I ran amdump and somehow by the time it had finished, the 
problem came back, because *all* of the DLEs had FAILED messages saying could 
not connect. I had to leave the building for a bit, and when I came back, 
amcheck repeatedly says host down, even after I ping bcc1 (which still works 
great). I checked /var/log/secure and /var/log/messages but I don't see 
anything strange at all, as far as I can tell. The amanda service is still 
running on the client and nothing has changed in the firewalls etc. I double 
checked all the things mentioned in the FAQ but everything seems to still be 
set up just fine. 
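
Note that ping only exercises ICMP, so a successful ping says nothing about 
whether the amanda service itself is answering.  As a rough sketch, assuming 
the client uses Amanda's default UDP port 10080 under xinetd and has the 
net-tools netstat installed, something like the following could confirm that 
a listener is actually bound on bcc1.  The function name amanda_udp_listener 
is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amanda): reads `netstat -uln` style lines
# on stdin and prints any UDP socket bound to port 10080, which is assumed
# here to be the amanda port; adjust if your setup differs.
amanda_udp_listener() {
    awk '$1 ~ /^udp/ && $4 ~ /:10080$/'
}

# Usage on the client:
#   netstat -uln | amanda_udp_listener
```

No output would suggest that nothing is listening for amanda requests on the 
client at that moment, even though the host answers ping.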

My disklist file on the server uses "bcc1" for the client name, but just for 
kicks I tried changing it to the client's IP, and now it's saying that ip won't 
do a selfcheck either. Also, my timeouts are set to at least 30 seconds in 
amanda.conf, and amcheck waited a good long while before giving up. The boxes 
are all on a LAN, so when amcheck works it usually takes less than a second 
to finish. 

Any ideas why a client would work for a while and then randomly not be able to 
do a selfcheck? The other amanda client is still working great...

Thanks!
_______________________
Jeremy Martin
Network Technician
http://www.gsi-kc.com
mailto:jmartin AT gsi-kc DOT com




