Amanda-Users

Re: First try at backing up other clients

2006-02-06 15:29:18
From: Paul Bijnens <paul.bijnens AT xplanation DOT com>
To: Kevin Till <kevin.till AT zmanda DOT com>
Date: Mon, 06 Feb 2006 21:24:55 +0100
Kevin Till wrote:
Glenn English wrote:
On Mon, 2006-02-06 at 10:19 -0800, Kevin Till wrote:

Gordon J. Mills III wrote:

Thanks Stefan, I do have iptables running on the client since it is my
firewall machine.


There is another problem with amanda and iptables that made me crazy for
quite a while. It doesn't sound like it's your problem, but just in
case, here's a note I wrote to myself:



If a DLE is large and the client is behind an iptables firewall, the
estimate can time out. This is because iptables has a timeout (30
minutes) after which it kills inactive TCP connections, and the estimate
takes longer than that. The kernel sends keepalive packets on TCP, but
the default time (2 hours, i.e. 7200 seconds) is longer than the iptables
timeout, so iptables decides the connection has been abandoned and
tears it down. To fix this, set the kernel keepalive time to 15 minutes:
log in as root on the client and run
'echo 900 >/proc/sys/net/ipv4/tcp_keepalive_time'
See http://documents.made-it.com/iptables-timeout.html
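
A quick way to check whether a client is exposed to this is to compare the
kernel keepalive time with the firewall's idle timeout. The following is only
a sketch: the function name keepalive_fits and the FW_TIMEOUT variable are
made up for illustration, and the 1800-second default is simply the 30-minute
figure quoted in the note, so substitute whatever your firewall really uses.

```shell
# Succeed if keepalive probes start before the firewall's idle timeout.
# Both arguments are in seconds.
keepalive_fits() {
    keepalive=$1
    fw_timeout=$2
    [ "$keepalive" -lt "$fw_timeout" ]
}

# Assumption: the 30-minute firewall timeout mentioned in the note.
FW_TIMEOUT=${FW_TIMEOUT:-1800}
KEEPALIVE_FILE=/proc/sys/net/ipv4/tcp_keepalive_time

# Only report when the kernel exposes the setting (Linux).
if [ -r "$KEEPALIVE_FILE" ]; then
    current=$(cat "$KEEPALIVE_FILE")
    if keepalive_fits "$current" "$FW_TIMEOUT"; then
        echo "ok: keepalive ${current}s < firewall timeout ${FW_TIMEOUT}s"
    else
        echo "too long: keepalive ${current}s >= firewall timeout ${FW_TIMEOUT}s"
    fi
fi
```

With the kernel default of 7200 seconds and a 1800-second firewall timeout the
check reports "too long"; after the echo of 900 it reports "ok".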



The client's keepalive timeout is reset to 2 hours every time it
reboots.
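
One possible way to make the value survive a reboot is to record it in the
sysctl configuration, assuming the client's distribution applies
/etc/sysctl.conf at boot (most do, but check yours). The helper name
persist_keepalive is hypothetical; it only appends the line if no
tcp_keepalive_time entry is already present.

```shell
# Append the keepalive setting to a sysctl config file, unless one is
# already there. Arguments: config file path, keepalive time in seconds.
persist_keepalive() {
    conf=$1
    seconds=$2
    if grep -q '^net\.ipv4\.tcp_keepalive_time' "$conf" 2>/dev/null; then
        echo "already set in $conf; edit it by hand"
    else
        echo "net.ipv4.tcp_keepalive_time = $seconds" >> "$conf"
    fi
}

# Usage, as root on the client:
#   persist_keepalive /etc/sysctl.conf 900
```

The append-only approach is deliberate: it never rewrites an existing entry,
so a value someone else chose is left alone.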


Thanks Glenn! I have added the notes to http://wiki.zmanda.com/index.php/Configuration_with_iptables#Additional_Notes



The solution is correct, but the explanation is wrong.
It is not the estimate that times out. The estimate works fine because it uses UDP instead of TCP (and with the partial replies used in recent versions, the connection does not time out).
But the "error" channel over TCP when doing the backup itself can time out.

Here is what I mailed in reply to the original question:


Glenn English wrote:
On Sun, 2005-11-20 at 19:36 +0100, Paul Bijnens wrote:


Turns out the problem was the iptables packet filter on the amanda
client. iptables has a timeout for idle TCP connections that was
breaking the connection to the server before the initial estimate of the
backup size was done (because it took so long to go through the huge
DLE).

The solution is to decrease the time between keepalive packets:

'echo 90 > /proc/sys/net/ipv4/tcp_keepalive_time'


I don't think this will help, because the estimates are exchanged
using UDP traffic.


The setting did it, but my understanding of why is wrong.
As I said to Paul off list, I put the default value back and watched
last night's backup.

The three ~12GB estimates came in, and the timeouts happened during the
data transfers (Connection reset by peer). I don't understand this.

Now I do, see below.


iptables times out and breaks a TCP connection on schedule, even if 100% of
the bandwidth of that connection is being used?? I doubt it.

I set the keepalive time to 90 and reran a backup by hand. The data transfers
are working.
In other words, decreasing the kernel's TCP keepalive time seems to be
necessary for amanda backups of huge DLEs behind iptables, but I don't
understand why.

...

It says in the amanda dox ( http://www.amanda.org/docs/portusage.html )


AMANDA also uses TCP connections for transmitting the backup image,
messages and (optionally) the index list from a client back to the
dumper process on the tape server. A process called sendbackup is
started by amandad on the client. It creates two (or three, if
indexing is enabled) TCP sockets and sends their port numbers back to
dumper in a UDP message. Then dumper creates and binds TCP sockets on
its side and connects to the waiting sendbackup.


This sounds a lot like FTP to me. Maybe it's the messages connection
that's timing out.

Aha, that makes more sense.

Yes indeed, the data is transferred over one TCP connection, and the
stderr output is transferred over another TCP connection (and if you
do indexing, the table of contents is yet another TCP connection).

And yes, if there are not many errors, there is no traffic on that
connection except at the end, when the number of bytes transferred and
the speed are summarized.
That can indeed time out!

And indeed, the above setting helps in this case.




--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  Paul.Bijnens AT xplanation DOT com
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...    *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************