Amanda-Users

Re: amanda got a tummy ache over sudden increase in data in /opt

2007-06-26 04:04:41
Subject: Re: amanda got a tummy ache over sudden increase in data in /opt
From: Paul Bijnens <Paul.Bijnens AT xplanation DOT com>
To: Gene Heskett <gene.heskett AT verizon DOT net>
Date: Tue, 26 Jun 2007 10:02:54 +0200
On 2007-06-26 09:17, Gene Heskett wrote:
> Greetings folks;
> 
> from /tmp/amanda-dbg/amandad/tonights file:
> -------------------------
> amandad: time 0.220: try_socksize: receive buffer size is 65536
> amandad: time 0.220: stream_server: waiting for connection: ::.41697
> amandad: time 0.220: sending REP pkt:
> <<<<<
> CONNECT DATA 44889 MESG 54361 INDEX 41697
> OPTIONS features=ffffffff9ffeffffffff00;
> amandad: time 0.220: dgram_send_addr(addr=0x8052568, dgram=0xb7f721a4)
> amandad: time 0.220: (sockaddr_in *)0x8052568 = { 2, 713, 192.168.71.3 }
> amandad: time 0.220: dgram_send_addr: 0xb7f721a4->socket = 0
> amandad: time 0.233: dgram_recv(dgram=0xb7f721a4, timeout=0, 
> fromaddr=0xb7f82190)
> amandad: time 0.233: (sockaddr_in *)0xb7f82190 = { 2, 713, 192.168.71.3 }
> amandad: time 0.233: received ACK pkt:
> <<<<<
> amandad: time 0.233: stream_accept: connection from ::ffff:192.168.71.3.52616
> amandad: time 0.234: try_socksize: send buffer size is 65536
> amandad: time 0.234: try_socksize: receive buffer size is 65536
> amandad: time 0.234: stream_accept: connection from ::ffff:192.168.71.3.46641
> amandad: time 0.234: try_socksize: send buffer size is 65536
> amandad: time 0.234: try_socksize: receive buffer size is 65536
> amandad: time 0.234: stream_accept: connection from ::ffff:192.168.71.3.56240
> amandad: time 0.234: try_socksize: send buffer size is 65536
> amandad: time 0.234: try_socksize: receive buffer size is 65536
> amandad: time 0.234: security_close(handle=0x8052548, driver=0xb7f702a0 (BSD))
> amandad: time 3578.686: security_stream_seterr(0x8063160, Connection reset by 
> peer)
> amandad: time 3578.765: security_stream_seterr(0x8063160, write error on 
> stream 44889: Broken pipe)
> amandad: time 3578.765: sending NAK pkt:
> <<<<<
> ERROR write error on stream 44889: write error on stream 44889: Broken pipe
> amandad: time 3578.765: security_stream_close(0x8063160)
> amandad: time 3578.765: security_stream_close(0x806b198)
> amandad: time 3578.765: security_stream_close(0x80731d0)
> amandad: time 3599.765: pid 31022 finish time Tue Jun 26 02:05:33 2007
> --------------------------
> 
> Did I run out of dtimeout?
> 
> ---------from amanda.conf---------
> dtimeout 1500           # number of idle seconds before a dump is aborted.
> ---------

The problem here is that such a log file shows no trace of traffic,
or of the absence of traffic, on the DATA channel.  We can
only guess.
So yes, it could be that about 2078 seconds after the backup started
(3578 - 1500), it hung (e.g. while stat()ing a dead Samba-mounted
filesystem) for about 1500 seconds, after which the server aborted the
dump and closed the TCP connections to the client.
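
To make that arithmetic concrete, here is a minimal sketch that reads the
timestamps from a few of the amandad debug lines quoted above (wrapped lines
rejoined), finds the longest silent interval, and subtracts dtimeout from the
moment of the reset.  The helper names are made up for illustration, and the
result is only the latest point at which the DATA channel could have gone idle
if dtimeout really was the trigger; the log itself does not prove that.

```python
import re

# A few of the amandad debug lines quoted above, with wrapped lines rejoined.
LOG = """\
amandad: time 0.234: security_close(handle=0x8052548, driver=0xb7f702a0 (BSD))
amandad: time 3578.686: security_stream_seterr(0x8063160, Connection reset by peer)
amandad: time 3578.765: sending NAK pkt:
amandad: time 3599.765: pid 31022 finish time Tue Jun 26 02:05:33 2007
"""

DTIMEOUT = 1500  # seconds, from the quoted amanda.conf

# Matches "amandad: time <seconds>: <message>"; other lines are skipped.
TIME_RE = re.compile(r"^amandad: time (\d+\.\d+): (.*)$")


def parse_times(text):
    """Return (seconds, message) pairs for every timestamped line."""
    entries = []
    for line in text.splitlines():
        m = TIME_RE.match(line)
        if m:
            entries.append((float(m.group(1)), m.group(2)))
    return entries


def largest_gap(entries):
    """Return (gap, start, end) for the longest interval with no log lines."""
    best = (0.0, None, None)
    for (t0, _), (t1, _) in zip(entries, entries[1:]):
        if t1 - t0 > best[0]:
            best = (t1 - t0, t0, t1)
    return best


entries = parse_times(LOG)
gap, gap_start, gap_end = largest_gap(entries)

# Assumption: the reset at gap_end was caused by dtimeout expiring, so the
# DATA channel must have been idle since at least gap_end - DTIMEOUT.
idle_since = gap_end - DTIMEOUT

print(f"silent in the debug log for {gap:.0f} s (from {gap_start} to {gap_end})")
print(f"if dtimeout fired, data stopped flowing at about {idle_since:.0f} s")
```

Note that the long gap in the debug log is normal for a running dump: amandad
logs nothing while data flows, which is exactly why the log cannot tell a
healthy transfer from a hung one.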

On the client side we only see "Connection reset by peer", meaning that
the other side (the server) has closed the connection.  On the server side,
we should find the corresponding entries (probably in the amdump.X log) about
3578 seconds after the start of the dump of this DLE.  Maybe there is some
more info there about the decision to close the connection?
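
To know where to look in the server-side log, the relative offsets can be
turned into wall-clock times using the "finish time" line from the quoted
debug file.  This is only a sketch: it assumes the "time N" offsets count
seconds from the start of this amandad process, that the client and server
clocks agree, and that the default C/English locale is in effect for parsing
the day and month names.

```python
from datetime import datetime, timedelta

# Values taken from the quoted amandad debug file.
FINISH_STAMP = "Tue Jun 26 02:05:33 2007"  # printed at offset 3599.765
FINISH_OFFSET = 3599.765                   # seconds
RESET_OFFSET = 3578.686                    # "Connection reset by peer"

finish = datetime.strptime(FINISH_STAMP, "%a %b %d %H:%M:%S %Y")

# Work backwards from the finish time to the start of the amandad process,
# then forwards again to the moment the reset was seen on the client.
start = finish - timedelta(seconds=FINISH_OFFSET)
reset = start + timedelta(seconds=RESET_OFFSET)

print("amandad started about", start.strftime("%H:%M:%S"))
print("connection reset at about", reset.strftime("%H:%M:%S"))
```

With these numbers, the server-side entries of interest should sit shortly
before 02:05:12.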

> 
> The backup started at 00:15
> 
> FC6 box, iptables isn't running and selinux is disabled completely.


-- 
Paul Bijnens, xplanation Technology Services        Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  Paul.Bijnens AT xplanation DOT com
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************

