Amanda-Users

Re: amdump/amcheck timeouts

2003-08-04 04:34:26
Subject: Re: amdump/amcheck timeouts
From: "Martin" <ammail AT sebastian DOT nl>
To: <amanda-users AT amanda DOT org>
Date: Mon, 4 Aug 2003 10:30:11 +0200


>
> > I only started using Amanda a couple of days ago to back up some
> > clients in our local network. Everything seemed to be working fine.
> > However, occasionally (about 50% of the time) amcheck times out
> > ("selfcheck request timed out. Host down?"). When I check
> > /var/log/messages on the client, there are a lot of
> > "amandad[31356]: error receiving message: timeout" messages.
> > Amanda debug files show the client timing out while waiting for an
> > ACK from the server. It seems that amcheck is more likely to time
> > out than amdump; amdump only times out about 25% of the time.
> > Something else that I think is a bit strange: when I run amcheck
> > again immediately after it finishes, it usually fails ("host down?").
> > When I wait about a minute or so and run it again, it succeeds.
>
> Typical for a routing problem or a wrong subnet mask on one of the
> systems involved, and a router that issues an "ICMP redirect" to solve
> the issue temporarily.
>
> Try ping from both sides and see if you notice something strange.
>
>

Ping works fine from both sides, and all other services on both computers
are up and running just fine.
Both computers are on the same network, configured with the same netmask
and broadcast address.
When I run amcheck and it fails, the server debug files show no error
messages.
The client debug files show the following error messages:
.....
amandad: time 29.991: dgram_recv: timeout after 30 seconds
amandad: error receiving message: timeout
amandad: time 29.991: error receiving message: timeout
...........
amandad: time 34.060: weird, it is not a proper ack
  addr: peer xxx.yyy.zzz.aaa dup xxx.yyy.zzz.aaa, port: peer 888 dup 919
amandad: time 44.053: dgram_recv: timeout after 10 seconds
amandad: time 44.053: waiting for ack: timeout, retrying
amandad: time 54.053: dgram_recv: timeout after 10 seconds
amandad: time 54.053: waiting for ack: timeout, retrying
amandad: time 64.053: dgram_recv: timeout after 10 seconds
amandad: time 64.053: waiting for ack: timeout, giving up!
.....
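
For anyone comparing several of these debug files, here is a minimal Python sketch that pulls the receive timeouts and any peer/dup port mismatch out of text in the format shown above. It assumes the debug lines look exactly like this excerpt; the function name summarize and the regular expressions are illustrative, not part of Amanda, and reading the "peer 888 dup 919" pair as two differing port numbers is itself an assumption based only on how the line is printed.

```python
import re

# Sample lines copied from the client debug excerpt above.
SAMPLE = """\
amandad: time 29.991: dgram_recv: timeout after 30 seconds
amandad: error receiving message: timeout
amandad: time 29.991: error receiving message: timeout
amandad: time 34.060: weird, it is not a proper ack
  addr: peer xxx.yyy.zzz.aaa dup xxx.yyy.zzz.aaa, port: peer 888 dup 919
amandad: time 44.053: dgram_recv: timeout after 10 seconds
amandad: time 44.053: waiting for ack: timeout, retrying
amandad: time 54.053: dgram_recv: timeout after 10 seconds
amandad: time 54.053: waiting for ack: timeout, retrying
amandad: time 64.053: dgram_recv: timeout after 10 seconds
amandad: time 64.053: waiting for ack: timeout, giving up!
"""

# Hypothetical patterns, written only against the lines shown above.
TIMEOUT_RE = re.compile(r"time (\d+\.\d+): dgram_recv: timeout after (\d+) seconds")
PORT_RE = re.compile(r"port: peer (\d+) dup (\d+)")


def summarize(text):
    """Collect receive timeouts and peer/dup port pairs that differ."""
    timeouts = [(float(t), int(s)) for t, s in TIMEOUT_RE.findall(text)]
    mismatches = [(int(p), int(d)) for p, d in PORT_RE.findall(text) if p != d]
    return {
        "timeouts": timeouts,             # (elapsed time, timeout length in seconds)
        "port_mismatches": mismatches,    # (peer port, dup port) where they differ
        "gave_up": "giving up!" in text,  # True if the client stopped retrying
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(key, value)
```

Running it on a real file would be a matter of passing the file contents to summarize instead of SAMPLE; comparing the results from a failing and a successful amcheck run may show whether the port mismatch only appears in the failing case.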

Anyone got a clue?
Thanks,

Martin


