Amanda-Users

Re: amdump freezes

2008-05-27 11:24:08
Subject: Re: amdump freezes
From: John E Hein <jhein AT timing DOT com>
To: Paul Bijnens <Paul.Bijnens AT xplanation DOT com>
Date: Tue, 27 May 2008 09:15:22 -0600
Paul Bijnens wrote at 12:33 +0200 on May 27, 2008:
 > On 2008-05-25 18:55, jehan procaccia wrote:
 > > hello,
 > > 
 > > some clients with "big" partitions (>100 GB) freeze my amdump; I
 > > usually get dump errors, and the dumps cannot finish properly.
 > > I have 2 questions,
 > > 1) how can I resolve that "client" error, timeout or whatever ?
 > 
 > This looks suspiciously like the problem (and solution) described here:
 > 
 > http://wiki.zmanda.com/index.php/Mesg_read:_Connection_reset_by_peer

Note that the wiki's explanation of tcp_keepalive_time (spelled
'net.inet.tcp.keepidle' on FreeBSD - see
http://www.freebsd.org/cgi/man.cgi?query=tcp&apropos=0&sektion=0&manpath=FreeBSD+7.0-RELEASE&format=html)
is not quite accurate.  It is how long a connection must sit idle
before the tcp stack starts sending keepalive probes.  The stack then
sends up to N probes (N is configurable), spaced at a separate
interval (/proc/sys/net/ipv4/tcp_keepalive_intvl seconds on linux,
net.inet.tcp.keepintvl ms on FreeBSD), and drops the connection if
none of them is answered.
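
To make the arithmetic concrete: the time to notice a dead peer on an
idle connection is roughly the idle time plus the probe interval times
the probe count.  A minimal sketch in Python; the function name is made
up for illustration, the first call uses the usual stock linux values
(7200 s idle, 75 s interval, 9 probes), and the second set of values is
purely hypothetical.

```python
def keepalive_detection_time(idle_s, interval_s, probes):
    """Approximate seconds before the stack gives up on a dead, idle connection.

    idle_s     -- idle time before the first probe (tcp_keepalive_time on linux)
    interval_s -- spacing between probes (tcp_keepalive_intvl on linux)
    probes     -- unanswered probes allowed (tcp_keepalive_probes on linux)
    """
    return idle_s + interval_s * probes


# Usual stock linux values; check your own system, they may differ.
print(keepalive_detection_time(7200, 75, 9))   # 7875

# Hypothetical tuned values, for illustration only.
print(keepalive_detection_time(300, 60, 5))    # 600
```

So lowering only the interval does little while the idle time is still
two hours; the idle time dominates.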

The explanation on the wiki page seems to imply that
tcp_keepalive_time is the interval between keepalives.  That is not
quite the correct interpretation of that setting (at least not of the
linux one - not sure about the Solaris setting).

For the problem this wiki page is addressing, you probably want to
lower the idle timeout, and possibly modify the subsequent sending
interval and possibly the count.  This is what the suggested command
for linux actually does - it's just the explanatory text that is
unclear.

Of course, these settings affect all tcp connections on the machine
by default.  TCP keepalive is not really intended for determining
application-level socket status, but rather the status of the host
(or comms link).  If you are hitting these keepalive timeouts, you may
really want to figure out why (an overly aggressive firewall, for
instance).
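
The "by default" matters: an application can override the timers on
its own sockets without touching the system-wide values.  A minimal
sketch in Python, assuming a platform that exposes TCP_KEEPIDLE,
TCP_KEEPINTVL and TCP_KEEPCNT (linux does; others may not, so the code
skips whatever is missing).  The function name and the values 300/60/5
are made up for illustration, and none of this is taken from the
amanda source.

```python
import socket


def enable_keepalive(sock, idle_s=300, interval_s=60, probes=5):
    """Turn on TCP keepalive for one socket, with per-socket timers where supported."""
    # SO_KEEPALIVE itself is widely available.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The per-socket timer options are platform dependent; apply only
    # those the running platform's socket module actually defines.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(s)
# Nonzero means keepalive is enabled on this socket only.
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
s.close()
```

Other connections on the machine keep using the system-wide settings.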

It might be worthwhile for amanda to grow its own keepalive mechanism
(similar to ssh) to address this issue so one doesn't have to change
settings for all tcp connections.

