Amanda-Users

Re: 2.4.2p2 client, 2.4.4p3 server: timeout from amandad...

2006-05-03 08:51:59
Subject: Re: 2.4.2p2 client, 2.4.4p3 server: timeout from amandad...
From: Paul Bijnens <paul.bijnens AT xplanation DOT com>
To: Francis Galiegue <fg AT one2team DOT com>
Date: Wed, 03 May 2006 14:48:11 +0200
On 2006-05-03 13:26, Francis Galiegue wrote:
The list of filesystems represents 24 GB in total (compressed with gzip). The problem is this: it works fine when I try to back up every directory but one of the two largest (which are 8.4 GB and 10 GB respectively, uncompressed on disk), and fails when I try to include either of these, because _amandad_, not amdump, times out. I get this in the amandad logfile:

--------------------
amandad: debug 1 pid 6636 ruid 33 euid 33 start time Wed May  3 12:15:04 2006
amandad: version 2.4.2p2
[...]
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, giving up!
amandad: pid 6636 finish time Wed May  3 12:20:06 2006
--------------------

Reproducible at will: amandad always times out after 5 minutes. Meanwhile, amdump stays there waiting for... Well, I don't know, frankly, but I have to C-c it and amcleanup afterwards.

What I've already done is increase the etimeout parameter on the server side: I set it to 1200 instead of the default value, 300. But that didn't help. Out of desperation I even changed this value in the old server config files, in case amandad tried to read them :p But no.


You could run tcpdump or ethereal on the server and the client and verify
that the packet is indeed arriving there, and with the correct IP address
(the aliases on eth0 could have messed that up).

Is there a firewall in between, or on one or both of the machines?
You may need to increase the UDP reply timeout on the firewall (or
disable the firewall).  I believe many firewalls time out UDP packets
after 180 seconds.

There are other possible causes and solutions. See:

http://wiki.zmanda.com/index.php/Amdump:_results_missing

In Amanda 2.4.2p2, "calcsize" was not yet implemented (it did
exist, but was experimental, I believe).
If the server is 2.4.4xx, then you can use "estimate server", even
if the client is old.
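
As a rough sketch of what that could look like in the server's amanda.conf:
the dumptype name "big-dirs" below is made up for illustration, and "global"
is assumed to be an existing parent dumptype in your config. Whether the
"estimate" keyword is accepted depends on the server version, so check the
amanda.conf man page that ships with your server before relying on it.

--------------------
# Hypothetical dumptype for the two large directories.
define dumptype big-dirs {
    global              # assumed parent dumptype; use whatever yours is called
    comment "large directories, size estimated on the server"
    estimate server     # let the server guess the size from earlier runs
}
--------------------

You would then use that dumptype for the two large directories in the
disklist. With a server-side estimate the client no longer has to do a
full estimate pass first, which is the long silent period in which a
firewall would drop the UDP state.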



It should also be noted that the client machine is such a mess that my predecessor of a sysadmin created 6 aliases for interface eth0... I had to bind amandad specifically to the address I wanted so that dumps could work in the first place. But I don't see this having an influence here, since smaller backups work perfectly...

I'd appreciate any hint on this one!





--
Paul Bijnens, xplanation Technology Services        Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  Paul.Bijnens AT xplanation DOT com
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************