Amanda-Users

Re: Estimate timeout error

2004-12-07 17:15:34
Subject: Re: Estimate timeout error
From: Paul Bijnens <paul.bijnens AT xplanation DOT com>
To: Nick Danger <nick AT hackermonkey DOT com>
Date: Tue, 07 Dec 2004 23:11:49 +0100
Nick Danger wrote:

> Paul Bijnens wrote:
>
>> But the reply packet never got acknowledged by the server.
>> Somehow it got lost or corrupted.
>> Is the default route for the reverse path incorrect?  A wrong subnet mask?
>> Try to get a network trace at the client, at the server, and in between
>> (I don't know how to accomplish that on a PIX firewall):
>>
>> Solaris:
>>     snoop -x42 host x.x.x.x proto udp port 10080
>> Using open-source tools (Linux and others):
>>     tcpdump -X host x.x.x.x and udp and port 10080
>> Or other programs with the same capabilities (Ethereal, etc.).
>>
>> Before guessing how to fix it, we must know where the problem is.
>> Is the packet lost, or is it broken?
>
> Quick recap: Server grolsch tries to back up client dominion. It works for the /, /usr and /var partitions. As soon as I tell grolsch to back up dominion's /u00 partition (a 45G partition, but presently only 177M full with approx. 2000 files), it fails. I have since removed /u00 from the backups to at least keep things working in the meantime, but I would like that data backed up :-)
>
> I have moved the amanda server to public IP space. It is still behind a PIX firewall; I just got rid of the private-IP-to-public-IP mappings. This didn't fix it :-) Not that I thought it would, I just got annoyed at some of the routing.
>
> I ran tcpdump on the client and on the server; the dumps are on the following page, lined up as best I could to show the flow. It seems that when backing up the partition that makes it fail, a bunch of packets do not get from the client to the server. Since I am no expert in tcpdump or in interpreting its results, I hope this helps figure out the problem.
>
> tcpdump results on http://www.hackermonkey.com/amanda-error.html

Very good that you already captured that.
You forgot "-s 1500", so all packets are cut off at 256 bytes...

But I believe I have enough information to conclude that the PIX
firewall times out too soon for the UDP reply.

Usually a dialog goes like this:

     server sends some REQuest to client
     client answers with ACKnowledge to confirm receipt of request
     client sends REPly to the server
     server answers with ACKnowledge to confirm receipt of reply
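To make the four steps concrete, here is a minimal Python sketch (not part of Amanda) that checks a list of captured packets against that sequence and reports the first step that never showed up. It assumes the tcpdump output has already been reduced by hand to hypothetical (sender, type) pairs; the names EXPECTED_DIALOG and first_missing_step are made up for illustration.

```python
from typing import List, Optional, Tuple

Packet = Tuple[str, str]  # (sender, packet type), e.g. ("client", "REP")

# One request/reply exchange, in the order described above.
EXPECTED_DIALOG: List[Packet] = [
    ("server", "REQ"),  # server sends the request
    ("client", "ACK"),  # client confirms receipt of the request
    ("client", "REP"),  # client sends the reply
    ("server", "ACK"),  # server confirms receipt of the reply
]


def first_missing_step(seen: List[Packet]) -> Optional[Packet]:
    """Return the first expected step not found (in order) in `seen`,
    or None if the exchange completed.  Packets that do not match the
    next expected step, such as retransmissions, are skipped."""
    step = 0
    for packet in seen:
        if step < len(EXPECTED_DIALOG) and packet == EXPECTED_DIALOG[step]:
            step += 1
    if step < len(EXPECTED_DIALOG):
        return EXPECTED_DIALOG[step]
    return None


# Hypothetical client-side capture: REQ, ACK, then only repeated REPs.
client_trace = [("server", "REQ"), ("client", "ACK"),
                ("client", "REP"), ("client", "REP"), ("client", "REP")]
print(first_missing_step(client_trace))  # ('server', 'ACK')
```

A capture taken on the server side of the firewall would instead stop at ("client", "REP"), which is how the two traces together show where the packet disappears.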

The details of the REQ and REP packets are cut off because the -s
option to tcpdump was omitted, but you can still see the strings
REQ/ACK/REP in each packet.

The first exchange is a NOOP request, to which the client answers
with its list of capabilities.  This takes only a few milliseconds.

The second exchange is the request to estimate the list of
DLEs (disklist entries).  The client sends the REPly when all DLEs
are estimated.  This takes more time: from 09:51:45.208525 until
09:54:25.687229, or about 2 minutes 40 seconds.
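The elapsed time can be checked with Python's standard datetime module. The two timestamps are the ones quoted above; since they carry no date, strptime puts both on the same default date, which is harmless for a gap of a few minutes.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"  # time of day with microseconds, as tcpdump prints it

# The two timestamps quoted above (no date, so both land on 1900-01-01).
start = datetime.strptime("09:51:45.208525", FMT)
reply = datetime.strptime("09:54:25.687229", FMT)

elapsed = reply - start
print(elapsed)                  # 0:02:40.478704
print(elapsed.total_seconds())  # 160.478704
```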

However, the packet is never received at the server.  The client keeps
resending it every 10 seconds, but never receives the ACK, and
eventually gives up.
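The client's behaviour can be sketched as a simple retry schedule on simulated time (no real sockets). In this Python sketch only the 10-second interval comes from the trace; the function name rep_send_times and the retry limit max_tries are hypothetical, since the actual number of attempts is not stated here.

```python
from typing import List, Optional


def rep_send_times(start: float, interval: float, max_tries: int,
                   ack_at: Optional[float]) -> List[float]:
    """Simulated times (in seconds) at which the client sends its REP.

    The client sends the REP at `start`, then resends it every `interval`
    seconds until an ACK has arrived (at time `ack_at`, or never if None)
    or it has sent `max_tries` copies, after which it gives up.
    """
    times: List[float] = []
    t = start
    for _ in range(max_tries):
        if ack_at is not None and ack_at <= t:
            break  # ACK already received: stop resending
        times.append(t)
        t += interval
    return times


# 160.5 s: roughly when the REP was first sent; 3 tries is a made-up limit.
print(rep_send_times(160.5, 10.0, 3, None))   # [160.5, 170.5, 180.5]
print(rep_send_times(160.5, 10.0, 3, 161.0))  # [160.5]
```

With no ACK ever arriving, every copy is sent and then the client stops, which matches the repeated REP packets in the client-side trace.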

For TCP, a firewall has a notion of a connection, and keeps a TCP
connection open until one of the sides closes it.  UDP is
connectionless (stateless), so a firewall has no indication that
the third step (REPly) is related to the REQuest/ACK some minutes
before.  Instead, a firewall usually uses an idle timer to decide
when to stop passing packets for that flow.

It seems that the UDP timer in the PIX firewall is shorter than
2 minutes 40 seconds.  I have no experience with PIX firewalls.
Is there any way to increase the UDP timeout?
Another possibility is to allow UDP packets on port 10080 from the
client to the server without any timeout.  (That's what stateless
firewalls have to do anyway.)
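The idle-timer behaviour described above can be illustrated with a toy model of a stateful firewall's UDP handling. This is a minimal Python sketch, not how the PIX actually implements it. It assumes the server sits inside the firewall and the client outside. The class name, the addresses (from the documentation ranges 192.0.2.0/24 and 198.51.100.0/24), the server's source port 850 and the 120-second timeout are all hypothetical; the timeout was chosen only to be shorter than the observed 2 minutes 40 seconds.

```python
from typing import Dict, Tuple

Flow = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


class UdpStateTable:
    """Toy stateful-firewall model (hypothetical, not the PIX implementation).

    An outbound UDP packet creates or refreshes a state entry. An inbound
    packet is passed only if it matches an entry that has been idle for no
    longer than idle_timeout seconds; passing it refreshes the entry.
    """

    def __init__(self, idle_timeout: float) -> None:
        self.idle_timeout = idle_timeout
        self.last_seen: Dict[Flow, float] = {}

    def outbound(self, flow: Flow, now: float) -> None:
        # Packet from inside to outside: open or refresh the entry.
        self.last_seen[flow] = now

    def inbound_allowed(self, flow: Flow, now: float) -> bool:
        # A reply matches the outbound entry with source and destination swapped.
        src_ip, src_port, dst_ip, dst_port = flow
        key = (dst_ip, dst_port, src_ip, src_port)
        last = self.last_seen.get(key)
        if last is None or now - last > self.idle_timeout:
            self.last_seen.pop(key, None)  # no entry, or it has expired
            return False
        self.last_seen[key] = now  # matching traffic keeps the entry alive
        return True


def simulate(idle_timeout: float) -> Tuple[bool, bool]:
    """Replay the estimate exchange; return (ACK passed, REP passed)."""
    server = ("192.0.2.10", 850)       # hypothetical inside host and port
    client = ("198.51.100.20", 10080)  # hypothetical outside host, port 10080
    fw = UdpStateTable(idle_timeout)
    fw.outbound((*server, *client), now=0.0)                     # REQ
    ack_ok = fw.inbound_allowed((*client, *server), now=0.005)   # ACK
    rep_ok = fw.inbound_allowed((*client, *server), now=160.5)   # REP
    return ack_ok, rep_ok


print(simulate(120.0))  # (True, False): REP arrives after the entry expired
print(simulate(300.0))  # (True, True): a longer timeout lets the REP through
```

The quick ACK passes because the entry is only milliseconds old, while the REP arrives after the entry has been idle longer than the timeout. That is why either raising the UDP timeout or a static rule for port 10080 should let it through.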



--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  Paul.Bijnens AT xplanation DOT com