Amanda-Users

BUG (was: Re: Handitarded....odd (partial) estimate timeout errors.)

2006-01-05 11:06:05
From: Paul Bijnens <paul.bijnens AT xplanation DOT com>
To: Michael Loftis <mloftis AT wgops DOT com>
Date: Thu, 05 Jan 2006 16:49:53 +0100
Michael Loftis wrote:


Paul asked for the logs, it seems like there's an amanda bug. The units

Yes, indeed, there is a bug in Amanda!
You have 236 DLE's for that host, and from my reading of the code
the REQuest UDP packet is limited to 32K instead of 64K (see planner.c
lines 1377-1383)  (Need to update the documentation!)

It seems that the planner splits up the REQuest packet into separate
UDP packets when exceeding MAX_DGRAM/2, i.e. 32K.
Your first request was 32580 bytes.  Adding the next string to that
request would have exceeded the 32768 limit.
The reason for the division by 2 seems to be to reserve space for error
replies on each of those.
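
To make the splitting concrete, here is a rough sketch in Python (not the
actual planner.c code). It assumes MAX_DGRAM is 65536, as implied by
"MAX_DGRAM/2, i.e. 32K", ignores the packet header, and uses made-up
140-byte DLE lines; the function name split_requests is hypothetical.

```python
MAX_DGRAM = 65536            # assumed from "MAX_DGRAM/2, i.e. 32K"
REQ_LIMIT = MAX_DGRAM // 2   # 32768 bytes per REQuest packet


def split_requests(lines, limit=REQ_LIMIT):
    """Greedily pack request lines into packets of at most `limit` bytes."""
    packets, current, size = [], [], 0
    for line in lines:
        n = len(line.encode())
        # Start a new packet when adding this line would exceed the limit.
        if current and size + n > limit:
            packets.append(current)
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        packets.append(current)
    return packets


# Hypothetical input: 236 DLE lines of 140 bytes each (sizes are invented).
lines = ["x" * 139 + "\n" for _ in range(236)]
packets = split_requests(lines)
sizes = [sum(len(l.encode()) for l in p) for p in packets]
print(len(packets), sizes)   # 2 [32760, 280]
```

With these invented sizes the last two DLEs spill into a second REQ packet,
which is the situation described for this host.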

However, the amandad client expects one and only one REQuest packet.
Any other REQuest packet coming from the same connection (5-tuple:
protocol, remotehost, remoteport, localhost, localport) and having
a type "REQ" is considered a duplicate.
It should actually test for the handle and sequence to be identical
too. It does not.
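
A small Python sketch of the difference between the two checks (this is an
illustration, not amandad's code). The Packet fields, host names, port
numbers and handle strings are all hypothetical; only the idea of comparing
the 5-tuple with or without handle and sequence comes from the description
above.

```python
from collections import namedtuple

# Hypothetical packet record; field names are illustrative only.
Packet = namedtuple("Packet", "proto rhost rport lhost lport type handle seq")


def conn_key(p):
    """The 5-tuple identifying the connection."""
    return (p.proto, p.rhost, p.rport, p.lhost, p.lport)


def is_duplicate_current(seen, p):
    """Check as described above: same 5-tuple and type REQ means duplicate."""
    return p.type == "REQ" and any(
        s.type == "REQ" and conn_key(s) == conn_key(p) for s in seen
    )


def is_duplicate_strict(seen, p):
    """Suggested check: handle and sequence must be identical as well."""
    return p.type == "REQ" and any(
        s.type == "REQ"
        and conn_key(s) == conn_key(p)
        and (s.handle, s.seq) == (p.handle, p.seq)
        for s in seen
    )


first = Packet("udp", "server", 850, "client", 10080, "REQ", "h-1", 1)
second = first._replace(handle="h-2", seq=2)   # the split-off second request

print(is_duplicate_current([first], second))   # True: wrongly dropped
print(is_duplicate_strict([first], second))    # False: a genuinely new request
```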

It's not a quick fix either:  when receiving the first "REQ" packet,
the amandad client forks and execs the request program (sendsize in
this case) and reads the results from a pipe.

By the time the second, non-identical request comes in (with different
handle, sequence -- which is currently not checked), sendsize is already
started and cannot be given additional DLE's to estimate.


As a temporary workaround, you could shorten the exclude-list string for that host by creating a symlink:

   ln -s /etc/amanda/exclude.gtar /.excl

and use that as exclude-list: this shortens each line by 20 bytes, which
would shrink the packet to fit again. (236 DLE's * 20  = 4720 bytes
less in a REQuest UDP for that host!)
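
The arithmetic, spelled out as a few lines of Python. It assumes every DLE
line carries the exclude-list path exactly once and takes the 20-byte saving
and the 32768 limit from the figures above; the variable names are just for
illustration.

```python
dles = 236                 # DLEs for that host
saved_per_dle = 20         # bytes saved per DLE line (figure quoted above)
limit = 32768              # MAX_DGRAM/2

saved = dles * saved_per_dle
print(saved)               # 4720

# Largest original request that would fit in one packet after the change,
# under the one-path-per-DLE assumption.
max_original = limit + saved
print(max_original)        # 37488
```

So the workaround only helps as long as the whole request for that host was
no more than about 37K to begin with.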


Anyway....I'm getting a headache thinking about it :) all my other DLEs seem ok for that host, and the ones that it misses are not always exactly the same, but all seem to be non-calcsize estimated.

Just bad luck for those entries that happen to end up at the end of the
queue. On the other hand, when really unlucky, you could have up to three
estimates for each DLE, overflowing even the 4K we saved by shrinking the
exclude string...


--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  Paul.Bijnens AT xplanation DOT com
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************


