Amanda-Users

Re: Estimate timeout error

2004-12-03 12:28:43
Subject: Re: Estimate timeout error
From: Paul Bijnens <paul.bijnens AT xplanation DOT com>
To: Nick Danger <nick AT hackermonkey DOT com>
Date: Fri, 03 Dec 2004 18:20:46 +0100
Nick Danger wrote:
There is a PIX between the two, but I'm backing up a bunch (10?) of Linux and Solaris servers in the same areas of the network to this same Amanda server without any issues, so I don't believe it to be a firewall issue. There are no iptables rules running on either host (both Linux in this case).

In the amandad.XXX.debug log I have the following lines, which I'm assuming are the error report of the problem. Now, the question is how to fix it :-)

-Nick


amandad: time 0.010: amandahosts security check passed
amandad: time 0.010: running service "/usr/lib/amanda/sendsize"
amandad: time 182.436: sending REP packet:

The above shows that sendsize needed about 3 minutes,
and that it finished without errors, because the reply contains
all the info below.  It could still be that 179 seconds works and
181 seconds is too late...
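
The elapsed time can be read straight off the debug log. The following is a minimal Python sketch, not part of Amanda; the function name `service_duration` is hypothetical, and it assumes the "amandad: time <seconds>: <message>" line layout seen in the excerpt above.

```python
import re

# Matches the "amandad: time <seconds>: <message>" layout seen in the excerpt.
LINE_RE = re.compile(r"^amandad: time (\d+\.\d+): (.*)$")


def service_duration(log_text):
    """Seconds between 'running service' and 'sending REP packet', or None."""
    started = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp, message = float(match.group(1)), match.group(2)
        if message.startswith("running service") and started is None:
            started = stamp
        elif message.startswith("sending REP packet") and started is not None:
            return stamp - started
    return None  # one of the two markers was not found


sample = """\
amandad: time 0.010: amandahosts security check passed
amandad: time 0.010: running service "/usr/lib/amanda/sendsize"
amandad: time 182.436: sending REP packet:
"""

duration = service_duration(sample)
print(f"sendsize ran for about {duration:.1f} seconds")
```

Running this against the debug logs of a few different nights would show whether the failures only occur when the estimate takes longer than some threshold.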


----
Amanda 2.4 REP HANDLE 005-40813308 SEQ 1102082216
OPTIONS features=fffffeff9ffe0f;
/ 0 SIZE 301197
/ 1 SIZE 100
/u00 0 SIZE 143930
/u00 1 SIZE 41411
/usr 0 SIZE 880958
/usr 1 SIZE 79
/usr/local 0 SIZE 174
/usr/local 1 SIZE 47
/var 0 SIZE 299300
/var 1 SIZE 2857
----

The above lines are the reply packet, less than 300 bytes,
so I guess it's not a UDP packet overflow.
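
As a rough check of that size estimate, here is a small Python sketch that counts the bytes of the reply text quoted above. It assumes plain ASCII with one newline per line; the exact on-wire payload may differ by a few bytes, and the UDP/IP headers are not included.

```python
# The reply text as it appears in the debug log, one "\n" per line (assumed).
reply_packet = (
    "Amanda 2.4 REP HANDLE 005-40813308 SEQ 1102082216\n"
    "OPTIONS features=fffffeff9ffe0f;\n"
    "/ 0 SIZE 301197\n"
    "/ 1 SIZE 100\n"
    "/u00 0 SIZE 143930\n"
    "/u00 1 SIZE 41411\n"
    "/usr 0 SIZE 880958\n"
    "/usr 1 SIZE 79\n"
    "/usr/local 0 SIZE 174\n"
    "/usr/local 1 SIZE 47\n"
    "/var 0 SIZE 299300\n"
    "/var 1 SIZE 2857\n"
)

payload_size = len(reply_packet.encode("ascii"))
print(f"reply payload: {payload_size} bytes")
```

That is far below the point where a single UDP datagram would have to be fragmented on an ordinary Ethernet path.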



amandad: time 192.437: dgram_recv: timeout after 10 seconds
amandad: time 192.437: waiting for ack: timeout, retrying
amandad: time 202.439: dgram_recv: timeout after 10 seconds
amandad: time 202.439: waiting for ack: timeout, retrying
amandad: time 212.441: dgram_recv: timeout after 10 seconds
amandad: time 212.442: waiting for ack: timeout, retrying
amandad: time 222.444: dgram_recv: timeout after 10 seconds
amandad: time 222.444: waiting for ack: timeout, retrying
amandad: time 232.446: dgram_recv: timeout after 10 seconds
amandad: time 232.446: waiting for ack: timeout, giving up!
amandad: time 232.446: pid 21896 finish time Fri Dec  3 09:01:32 2004


But the reply packet never got acknowledged by the server.
Somehow it got lost or corrupted.
Default route for the reverse path not correct?  Wrong subnet mask?
Try to get a network trace at the client and at the server, and in
between (I don't know how to accomplish that on a PIX firewall):

Solaris:
    snoop -x42 host x.x.x.x proto udp port 10080
Using open source tools (Linux and others):
    tcpdump -X host x.x.x.x and udp and port 10080
Or other programs that have the same capabilities (Ethereal etc).

Before guessing how to fix it, we must know where the problem is.
Is the packet lost?  Or is it broken?
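
Once both traces exist, the lost-or-broken question comes down to comparing the payload the client sent with what the server side captured. A minimal Python sketch of that comparison follows; `classify_capture` is a hypothetical helper, and it assumes you have already extracted the UDP payload bytes from each trace by hand (use `None` when the server trace shows no such packet).

```python
def classify_capture(sent, received):
    """Compare the payload sent by the client with the server-side capture.

    sent     -- bytes of the reply as captured at the client
    received -- bytes captured at the server, or None if nothing arrived
    """
    if received is None:
        return "lost"
    if received == sent:
        return "intact"
    # Report the first offset where the two payloads disagree.
    for offset, (a, b) in enumerate(zip(sent, received)):
        if a != b:
            return f"broken at byte {offset}"
    # No differing byte in the common prefix, so one side is truncated.
    return f"broken: length {len(sent)} vs {len(received)}"


print(classify_capture(b"Amanda 2.4 REP", None))
print(classify_capture(b"Amanda 2.4 REP", b"Amanda 2.4 REP"))
print(classify_capture(b"Amanda 2.4 REP", b"Amanda 2.4 REQ"))
```

An "intact" result would mean the packet reached the server unchanged, so the problem would then be on the server side and not in the network path.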

--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  Paul.Bijnens AT xplanation DOT com
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...    *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************