I have a random problem like this as well running RH Linux. The client
occasionally fails amcheck in the afternoon. (Backups run at night.) When I
look at portland, the client, I find the selfcheck task "stuck" and I am unable
to kill it, even with kill -9. See if you have the same problem. On the
client, try
ps -ef | grep amand
or grep with whatever your amanda user account is.
If you see selfcheck running, you'll be unable to get amcheck on the server to
finish until it's gone. Just something to check.
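If it helps, here is a rough sketch of that check which also prints the process state. A process shown in state D (uninterruptible sleep, usually waiting on I/O) does not act on signals until the kernel call returns, which is one possible reason kill -9 appears to do nothing; I have not confirmed that this is what happens on my client. The account name "amanda" and the function names flag_selfcheck and check_selfcheck are my own assumptions for illustration, and the ps options are the procps ones found on Linux.

```shell
#!/bin/sh
# Assumption: the Amanda account is called "amanda"; override via AMANDA_USER.
AMANDA_USER="${AMANDA_USER:-amanda}"

# Reads "PID STAT ELAPSED COMMAND" lines on stdin and prints one line per
# selfcheck process, marking those whose state starts with D
# (uninterruptible sleep). Other states are marked with "-".
flag_selfcheck() {
    awk '$4 ~ /selfcheck/ {
        flag = ($2 ~ /^D/) ? "UNINTERRUPTIBLE" : "-"
        print $1, $2, $3, flag
    }'
}

# Lists the Amanda user's processes (pid, state, elapsed time, command name)
# and filters them through flag_selfcheck.
check_selfcheck() {
    ps -u "$AMANDA_USER" -o pid=,stat=,etime=,comm= | flag_selfcheck
}
```

Source the file on the client and run check_selfcheck. No output means no selfcheck process is present for that user. A line ending in UNINTERRUPTIBLE with a long elapsed time would match the "stuck" behavior I described.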
Ron Bauman
Hatteras Networks, Inc.
-----Original Message-----
From: Martin, Jeremy [mailto:jmartin AT gsi-kc DOT com]
Sent: Friday, May 30, 2003 2:14 PM
To: amanda-users AT amanda DOT org
Subject: client was working, now suddenly is getting self check "host
down?" errors
Hi,
This is confusing me a bit, I hope someone here has an idea of what might be
happening.
I have been running an amanda server, backing itself up + one other amanda
client (jayhawker), for about a week now. It works great every night when I
have the amdump run. Yesterday I added a third amanda client, "bcc1". bcc1 and
jayhawker are both fresh RedHat 9 installs.
I configured bcc1 exactly the same way as jayhawker, with the same entries in
hosts.allow / hosts.deny / xinetd.conf / /home/amanda/.amandahosts etc. Both
mybox (by name and by ip just in case) and localhost (by localhost /
localhost.localdomain / 127.0.0.1) are allowed in hosts.allow for the user
amanda... I know a lot of that is redundant but I wanted to be 100% sure I
allowed the right things, since at least the .amandahosts file has been a bit
picky. Also of course my amanda server "mybox" is set up ok in /etc/hosts.
At first "mybox" could back up bcc1 just fine. I ran amcheck and there were 0
problems in 3 clients found. The first amdump worked yesterday afternoon. Then
overnight amdump ran from cron and was unable to connect to bcc1. Actually 95%
of the DLEs were backed up ok but /usr on bcc1 failed:
192.168.2. /usr lev 0 FAILED 20030530[could not connect to 192.168.2.200]
This morning after reading that in the amanda report, I ran amcheck and it said
selfcheck host down when trying bcc1. Just to see if I could get to it, I
tried "ping bcc1" which started pinging the right IP immediately, no problems
at all. I ran amcheck again without changing anything else and it found 0
problems. Then I ran amdump and somehow by the time it had finished, the
problem came back, because *all* of the DLEs had FAILED messages saying could
not connect. I had to leave the building for a bit, and when I came back,
amcheck repeatedly says host down, even after I ping bcc1 (which still works
great). I checked the /var/log/secure and /var/log/messages but I don't see
anything strange at all, as far as I can tell. The amanda service is still
running on the client and nothing has changed in the firewalls etc. I double
checked all the things mentioned in the FAQ but everything seems to still be
set up just fine.
My disklist file on the server uses "bcc1" for the client name, but just for
kicks I tried changing it to the client's IP, and now it's saying that ip won't
do a selfcheck either. Also, my timeouts were set to at least 30 seconds in
amanda.conf, and amcheck was waiting a good long while before giving up. The
boxes are all on a LAN, so when amcheck works it usually takes less than a
second to finish.
Any ideas of why a client would work for a while then randomly not be able to
do a selfcheck? The other amanda client is still working great...
Thanks!
_______________________
Jeremy Martin
Network Technician
http://www.gsi-kc.com
mailto:jmartin AT gsi-kc DOT com