Amanda-Users

Re: planner timeouts

2005-09-09 21:05:32
Subject: Re: planner timeouts
From: Charles Sprickman <spork AT bway DOT net>
To: Paul Bijnens <paul.bijnens AT xplanation DOT com>
Date: Fri, 9 Sep 2005 20:48:23 -0400 (EDT)
On Thu, 8 Sep 2005, Paul Bijnens wrote:

http://bway.net/~spork/amanda-tcpdump.txt

You'll note that the "frags" thing is still set, but if you compare what the server sent to what the client received, it is identical.

I believe you have a UDP fragmentation network problem somewhere
in between the two hosts, or perhaps on the devel2 host itself.

I believe you're on the right track here, and that this is not an amanda issue. Your input has been really helpful in giving me some direction.

Compare these two lines (first: view on devel2, second: view on h13):

    00:07:40.444231 devel2.945 > h13.amanda: udp 1465 (ttl 64, id 45642, len 1493)
    00:07:40.484232 devel2.945 > h13.amanda: udp 1465 (frag 45642:1472@0+) (ttl 51, len 1492)

The host devel2 sends a packet claiming a length of 1493 bytes
(1465 bytes of real payload + 28 bytes of IP and UDP headers = 1493 bytes).

That's it right there. The server is back at an office on DSL, supposedly without PPPoE. But after finally getting into the router, I see that even though there's a routed static subnet and all, it is still doing PPPoE, and PPPoE means an MTU of 1492. Sent packet of 1493, received 1492. I think I see where this is going.
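The received line can be reproduced with ordinary IPv4 fragmentation arithmetic. The following is a minimal sketch, assuming a 20-byte IPv4 header with no options and a router that fragments the oversize datagram rather than dropping it; the function name `fragment` is hypothetical and not part of Amanda or any library. With a 1465-byte UDP payload and a 1492-byte MTU it yields a first fragment of 1472 data bytes at offset 0 with more fragments to follow (the `1472@0+` and `len 1492` in the trace), plus a second fragment carrying the single remaining byte.

```python
from typing import List, Tuple

IP_HEADER = 20   # IPv4 header without options (assumption)
UDP_HEADER = 8   # fixed-size UDP header


def fragment(udp_payload_len: int, mtu: int) -> List[Tuple[int, int, bool]]:
    """Split one UDP datagram into IPv4 fragments for a link with this MTU.

    Returns (offset, data_len, more_fragments) per fragment. Offsets and
    lengths count bytes of the IP payload, i.e. UDP header plus UDP data.
    """
    ip_payload = UDP_HEADER + udp_payload_len
    # All fragments except the last must carry a multiple of 8 data bytes,
    # because the IPv4 fragment offset field counts 8-byte units.
    max_chunk = (mtu - IP_HEADER) // 8 * 8
    frags = []
    offset = 0
    while offset < ip_payload:
        chunk = min(max_chunk, ip_payload - offset)
        frags.append((offset, chunk, offset + chunk < ip_payload))
        offset += chunk
    return frags


if __name__ == "__main__":
    # Print each fragment in tcpdump's "size@offset+" style, plus its IP total length.
    for off, size, more in fragment(1465, 1492):
        print(f"{size}@{off}{'+' if more else ''}  len {IP_HEADER + size}")
    # 1472@0+  len 1492
    # 1@1472  len 21
```

If that 1-byte second fragment is lost or filtered somewhere along the path, the receiver cannot reassemble the datagram, which would look just like a request that never arrives.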

The host h13 is apparently 13 hops away from the client (ttl 64-51).
That's pretty strange, since you said they are connected to the same
switch.  Even stranger is that the reply packets travel 15 hops
(ttl 64-49).  But I'm not the specialist here.

I wasn't clear: devel2, the Amanda server, is on a DSL line. The hosts it backs up, however, are all on the same LAN, in the same subnet, on the same switch in a datacenter.

The server packet, however, now says the total length is 1492, but the
payload length is still 1465.  Strange, because the 28 bytes of IP and
UDP headers seem to have been shortened by 1 byte.  That could happen if
the UDP checksum, which is optional, is removed from the header: some
device in between removed the checksum byte.  It must be a real firewall
or layer 3 switch, modifying the packets it forwards.

I'm eyeing the Netopia/Cayman DSL router very suspiciously at the moment. :)

As a workaround, you could try to shrink the UDP datagram, e.g. by
leaving out some DLEs or omitting "compress-fast" from the options:
anything that shortens the request string so that the complete datagram
fits in one packet.
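The arithmetic behind this workaround, as a minimal sketch: assuming a 20-byte IPv4 header without options and the 1492-byte PPPoE MTU, the UDP payload can be at most 1492 - 28 = 1464 bytes, so the 1465-byte request in the trace is exactly one byte too long. The helper name `headroom` is hypothetical and not part of Amanda.

```python
IP_UDP_OVERHEAD = 28  # 20-byte IPv4 header (no options, assumed) + 8-byte UDP header


def headroom(payload_len: int, mtu: int = 1492) -> int:
    """Spare bytes before a UDP payload of this size needs fragmenting.

    A negative result is how many bytes must be trimmed from the request
    (e.g. by dropping DLEs or options) for the datagram to fit in one packet.
    """
    return mtu - IP_UDP_OVERHEAD - payload_len


if __name__ == "__main__":
    print(headroom(1465))        # PPPoE link: -1, one byte too many
    print(headroom(1465, 1500))  # plain Ethernet MTU: 7 bytes to spare
```

The same request fits comfortably on a 1500-byte Ethernet path, which is consistent with the problem only showing up once the server sits behind the PPPoE link.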

That did work. I removed a few entries from disklist and got a good run. And that brings up another question... What else is sent in that request? Something seems to vary based on the success of the last run. After that one good partial run, I uncommented everything again in disklist for that host and had a successful full run. In other words, after a good partial run the next batch of sendsize requests were smaller by a few bytes. Why?

That would also explain why I now see a second and third host showing the same symptoms. Something in that request grows and pushes me past the 1492 byte limit.

Anyhow, I've got some really good information to work with now, thanks again for all your help and insight. Now I have to look at how PMTUD works and how this setup is breaking it.

Thanks,

Charles

ps: ^X^C sounds very familiar from a very long time ago. What is it?


--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  Paul.Bijnens AT xplanation DOT com
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************



