Amanda-Users

Re: dumper issue - timeout problem?

2006-03-21 19:04:25
Subject: Re: dumper issue - timeout problem?
From: Paul Bijnens <paul.bijnens AT xplanation DOT com>
To: Edson Noboru Yamada <eyamada AT diveo.net DOT br>, Mailing List Amanda User <amanda-users AT amanda DOT org>
Date: Tue, 21 Mar 2006 23:57:57 +0100
Edson Noboru Yamada wrote:

I've been facing a problem when trying to back up one of our clients.
The backup starts normally, but after some time the following messages show up
in the taper log:


dumper: stream_client: our side is 0.0.0.0.45740
driver: result time 553.824 from dumper0: FAILED 01-00002 [mesg read: Connection reset by peer]
dumper: kill index command
taper: reader-side: got label DMX224 filenum 1

Note: it is the "mesg" channel that was closed by the peer,
probably because it had been idle for too long.




On the client side, I see something like this in the sendbackup log:

sendbackup-gnutar: time 0.248: /usr/local/libexec/runtar: pid 15147
sendbackup: time 0.309: started index creator: "/usr/bin/tar -tf - 2>/dev/null | sed -e 's/^\.//'"
sendbackup: time 301.700: index tee cannot write [Broken pipe]
sendbackup: time 301.700: pid 15145 finish time Tue Mar 21 15:39:18 2006
sendbackup: time 301.712: 124: strange(?): sendbackup: index tee cannot write [Broken pipe]

The server closed the index channel after the mesg channel broke down.
Because the client has nothing to send on the mesg channel yet, it did not notice. It only noticed when it tried to write to the index channel, which the server had already closed.
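The sequence described above, where a writer only discovers that the other side is gone when it next writes, can be reproduced locally with an ordinary pipe. This is an illustration, not Amanda code: `true` stands in for the already-closed index channel (it exits without reading anything), and the writer ignores SIGPIPE so that its late write fails with EPIPE ("Broken pipe") instead of killing it. That sendbackup handles the signal this way is an assumption, inferred only from it logging "[Broken pipe]" rather than dying.

```shell
#!/bin/sh
# Illustration only: a writer that stays idle, then writes into a pipe
# whose reader has already gone away.
( trap '' PIPE              # ignore SIGPIPE so the write fails with EPIPE instead
  sleep 1                   # idle period; 'true' has exited (read end closed) by now
  if echo "index data"; then
      echo "writer: write succeeded" >&2
  else
      echo "writer: write failed" >&2
  fi
) | true                    # reader exits immediately without reading
```

Run with sh or bash, the writer reports "writer: write failed"; the shell usually adds its own write-error message as well, worded differently by each shell.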




I've already tried turning off indexing and the holding disk, without success.

One important thing I've noticed is that the error always occurs after 300
seconds.
Is there some tunable timeout I'm forgetting?

Additional info: strangely, the backup appears to succeed even when this
message shows up.
The same client can back up its other file systems; the one that fails
most often is the / filesystem.

Any ideas?

Could this be the problem described here:

http://wiki.zmanda.com/index.php/Amdump_fails_to_backup_large_DLEs

     Send TCP keepalive probes sooner by lowering the idle time before the first probe (the Linux default is 7200 seconds):

  echo 90 > /proc/sys/net/ipv4/tcp_keepalive_time
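For the keepalive setting to help, the first probe has to go out before the connection has been idle long enough to be dropped, and probes are only sent on sockets that enable SO_KEEPALIVE. The sketch below checks the current value against a 300-second cutoff; that cutoff is an assumption taken from the "always after 300 seconds" report, not a confirmed firewall or NAT setting, and `check_keepalive` and `IDLE_CUTOFF` are hypothetical names. The commented `sysctl` lines are the usual Linux way to apply the value immediately and keep it across reboots.

```shell
#!/bin/sh
# Sketch: is the kernel's keepalive idle time below the observed idle cutoff?
# IDLE_CUTOFF=300 is an assumption based on the failures seen after 300 seconds.
IDLE_CUTOFF=300

# check_keepalive SECONDS: hypothetical helper (not part of Amanda).
# Prints "ok" if the first keepalive probe would be sent before the cutoff.
check_keepalive() {
    if [ "$1" -lt "$IDLE_CUTOFF" ]; then
        echo ok
    else
        echo "too late: first probe after $1 s, idle cutoff is $IDLE_CUTOFF s"
    fi
}

# Read the current system-wide value (Linux only; empty elsewhere).
current=$(cat /proc/sys/net/ipv4/tcp_keepalive_time 2>/dev/null || true)
if [ -n "$current" ]; then
    echo "tcp_keepalive_time=$current: $(check_keepalive "$current")"
fi

# To apply the change as root, now and at every boot:
#   sysctl -w net.ipv4.tcp_keepalive_time=90
#   echo 'net.ipv4.tcp_keepalive_time = 90' >> /etc/sysctl.conf
```

With the default of 7200 seconds the check reports "too late"; after setting 90 it reports "ok".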


--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  Paul.Bijnens AT xplanation DOT com
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...    *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************
