Amanda-Users

Re: request failed: timeout waiting for ACK

2006-03-10 06:19:22
Subject: Re: request failed: timeout waiting for ACK
From: Paul Bijnens <paul.bijnens AT xplanation DOT com>
To: Stefan Herrmann <magic99de AT web DOT de>
Date: Fri, 10 Mar 2006 12:16:29 +0100
On 2006-03-10 11:19, Stefan Herrmann wrote:
> Hello list,
>
> I haven't found a solution to this problem yet, and I need one urgently.
>
> Here is a summary of what happened:
>
> System:
> FreeBSD pille.hq.imos.net 5.4-RELEASE-p3 FreeBSD 5.4-RELEASE-p3 #0: Sat Jul 2 16:02:43 CEST 2005 root AT pille.hq.imos DOT net:/usr/obj/usr/src/sys/IMOS i386
>
> Amanda versions:
> server: 2.5.0b2
> client: 2.4.5p1

Only one system, but two Amanda versions...
I presume the client and server run the same OS version but are
different machines.


> "amstatus daily" shows the following:

> pille.hq.imos.net:/     0  252m finished (2:08:23)
> pille.hq.imos.net:/opt  1       driver: (aborted:[request failed: timeout waiting for ACK])(too many dumper retry)
> pille.hq.imos.net:/usr  0 3505m finished (1:53:08)
> pille.hq.imos.net:/var  0       driver: (aborted:[request failed: timeout waiting for ACK])(too many dumper retry)

> As you can see, parts of the backup complete while others get aborted.
> The reason is that the client often does not answer the request from
> the Amanda server. That is what I see in a tcpdump trace and in
> amandad.*.debug on the client:

> [...]
> amandad: time 30.004: dgram_recv: timeout after 30 seconds
> amandad: error receiving message: timeout
> amandad: time 30.004: error receiving message: timeout
> amandad: time 30.004: pid 64288 finish time Fri Mar 10 02:05:55 2006

You omitted the useful information just above that, but from what I can
see, the client's amandad apparently is being started by (x)inetd, but
when it tries to read the packet, there is nothing there.
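The "dgram_recv: timeout after 30 seconds" line reads like the result of a UDP read with a deadline that expired before any datagram arrived. A minimal Python sketch of that behavior follows; it assumes nothing about Amanda's actual C implementation, the helper name `recv_with_timeout` is hypothetical, and the 0.5-second timeout stands in for amandad's 30 seconds.

```python
import socket
from typing import Optional, Tuple


def recv_with_timeout(sock: socket.socket, timeout: float) -> Optional[Tuple[bytes, tuple]]:
    """Wait up to `timeout` seconds for one datagram on a bound UDP socket.

    Returns (payload, sender_address), or None if nothing arrived in time;
    the None case corresponds to a "timeout" message in a debug log.
    """
    sock.settimeout(timeout)
    try:
        return sock.recvfrom(65535)
    except socket.timeout:
        return None


if __name__ == "__main__":
    # Hypothetical local demo on an ephemeral localhost port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind(("127.0.0.1", 0))

    # Nobody sends anything, so the read gives up (0.5 s here instead of 30 s).
    print(recv_with_timeout(listener, 0.5))  # prints None

    # When a datagram does arrive, it is returned together with the sender.
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(b"REQ", listener.getsockname())
    result = recv_with_timeout(listener, 0.5)
    print(result[0] if result else "no datagram")

    sender.close()
    listener.close()
```

If the request never reaches the client, or is already gone by the time the freshly started amandad reads its socket, the read ends the same way as the first call in the demo.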

Wild guess...
Is your inetd service for amanda configured to "wait" or "nowait"?
It should be "wait".  (xinetd uses the syntax "wait = yes".)
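To check that setting without eyeballing the files, here is a small hypothetical Python helper. It assumes the usual inetd.conf field order (service name, socket type, protocol, wait/nowait, user, program, arguments), that the service is named "amanda", and that xinetd uses a block of the form "service amanda { ... }". The function names are made up for this example.

```python
import re
from typing import Optional


def inetd_amanda_wait(conf_text: str) -> Optional[bool]:
    """Check the "amanda" entry of an inetd.conf text.

    Returns True for "wait", False for "nowait", None if there is no entry.
    """
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Assumed field order: service socket-type protocol wait/nowait user program args...
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "amanda":
            # FreeBSD accepts suffixes such as "nowait/10"; compare the keyword only.
            return fields[3].split("/")[0] == "wait"
    return None


def xinetd_amanda_wait(conf_text: str) -> Optional[bool]:
    """Check the "wait" attribute of an xinetd "service amanda { ... }" block.

    Returns True for "wait = yes", False for any other value, None if the
    block or the attribute is missing.
    """
    block = re.search(r"service\s+amanda\s*\{([^}]*)\}", conf_text)
    if block is None:
        return None
    setting = re.search(r"^\s*wait\s*=\s*(\S+)", block.group(1), re.MULTILINE)
    if setting is None:
        return None
    return setting.group(1) == "yes"
```

For example, `inetd_amanda_wait("amanda dgram udp wait operator /usr/local/libexec/amanda/amandad amandad")` returns True (the user name and path are only placeholders), and the same line with `nowait` returns False.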

You said you also had a tcpdump trace.
Run tcpdump on both the server and the client, and verify whether the
client actually receives what the server sends.  Is there any router or
firewall between them?



> I already installed the FreeBSD fix for large UDP packets, so that
> should not be the problem.

Large UDP packets occur only during the estimate phase, but you are
already in the dumping phase.  So that should be completely unrelated.


> I don't know what else to do. Can anyone help?

The "too many dumper retry" error is suspicious too.  Are these dumpers
special (e.g. bypassing the holdingdisk; search for "PORT-WRITE" in the
amdump.X file)?  Is there any other useful information about this problem
in the amdump.X file on the server (out of swap space on the server, etc.)?

Are the same DLEs failing every time?
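One way to answer that is to save the amstatus output from several runs and count how often each DLE fails. A rough Python sketch follows. It assumes each DLE appears on its own line in the form `host:disk level [size] status...` and treats any status containing "failed" or "aborted" as a failure; both the parsing rule and the function names are assumptions for illustration, not part of amstatus.

```python
import re
from typing import Dict, List, Set

# Assumed amstatus line shape: "host:disk level [size] status..."
DLE_LINE = re.compile(r"^(\S+:\S+)\s+(\d+)\s+(.*)$")


def failed_dles(amstatus_text: str) -> Set[str]:
    """Return the DLEs (host:disk) whose status mentions "failed" or "aborted".

    The keyword test is a heuristic, not amstatus's own classification.
    """
    failed: Set[str] = set()
    for line in amstatus_text.splitlines():
        match = DLE_LINE.match(line.strip())
        if match and re.search(r"failed|aborted", match.group(3)):
            failed.add(match.group(1))
    return failed


def failure_pattern(runs: List[str]) -> Dict[str, int]:
    """Count in how many of the given amstatus outputs each DLE failed."""
    counts: Dict[str, int] = {}
    for text in runs:
        for dle in failed_dles(text):
            counts[dle] = counts.get(dle, 0) + 1
    return counts
```

Given the four entries from the "amstatus daily" output quoted earlier, `failure_pattern` reports a count of 1 for `pille.hq.imos.net:/opt` and `pille.hq.imos.net:/var`. DLEs that keep the highest count across many runs are the ones worth investigating first.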

--
Paul Bijnens, xplanation Technology Services        Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  Paul.Bijnens AT xplanation DOT com