Amanda-Users

Re: amdump freezes

2008-05-27 06:37:15
Subject: Re: amdump freezes
From: Paul Bijnens <Paul.Bijnens AT xplanation DOT com>
To: jehan.procaccia AT it-sudparis DOT eu
Date: Tue, 27 May 2008 12:33:21 +0200
On 2008-05-25 18:55, jehan procaccia wrote:
hello,

Some clients with "big" partitions (>100 GB) freeze my amdump; I usually get dump errors, and the dumps cannot end properly.
I have two questions:
1) How can I resolve that "client" error, timeout or whatever?

This looks suspiciously like the problem (and solution) described here:

http://wiki.zmanda.com/index.php/Mesg_read:_Connection_reset_by_peer



2) Why doesn't the Amanda server give up after dtimeout (1800 s) and finish its amdump?

I think the intermediate "% done" messages that dump normally generates
are too far apart on the large dumps to keep the idle TCP connection in the
firewall open.  The same problem can happen on the index TCP connection when
dumping a few very big files.
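
To see how far apart the messages really are, the timestamps in the
sendbackup debug output quoted further down can be compared.  A minimal
Python sketch (the function name message_gaps is made up for this example;
it only assumes lines of the form "sendbackup: time <seconds>: ..."):

```python
import re

# Matches the elapsed-seconds field of a sendbackup debug line.
TIME_RE = re.compile(r"sendbackup: time (\d+\.\d+):")

def message_gaps(text):
    """Return the gaps (in seconds) between consecutive timestamped messages."""
    stamps = [float(t) for t in TIME_RE.findall(text)]
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]

if __name__ == "__main__":
    sample = (
        "sendbackup: time 2185.293: 87: normal(|): DUMP: 4.10% done\n"
        "sendbackup: time 2485.300: 87: normal(|): DUMP: 4.67% done\n"
    )
    print(message_gaps(sample))
```

On the progress lines quoted from the client, consecutive messages are about
300 seconds apart.  This only measures the message spacing; it does not tell
which connection the firewall dropped.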

Amanda does not give up, because the data itself is still
flowing through the data connection, not exceeding the "dtimeout" value.
However, some firewall in between has closed the idle TCP connection carrying
the messages or index.
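
One general way to stop a firewall from dropping an idle TCP connection is
TCP keepalive: the kernel sends small probes while no application data is
flowing.  The Python sketch below only illustrates the socket options
involved.  It is not Amanda's code, the helper name enable_keepalive and the
timing values are assumptions for the example, and the TCP_KEEP* options are
the Linux names (they are skipped where the platform lacks them):

```python
import socket

def enable_keepalive(sock, idle=600, interval=60, count=5):
    """Enable TCP keepalive probes on sock; timings are illustrative only."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning: seconds idle before the first probe, seconds
    # between probes, and failed probes before the connection is dropped.
    tuning = (("TCP_KEEPIDLE", idle), ("TCP_KEEPINTVL", interval), ("TCP_KEEPCNT", count))
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive enabled:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

The idle time has to be shorter than the firewall's idle timeout for the
probes to help.  For programs that cannot be changed, the system-wide
keepalive defaults play the same role, provided the program enables
keepalive on its sockets at all.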




I use amanda-2.5.0p2-4 on a CentOS 5.1 server with disk backup media (virtual tapes on a RAID 5).

Here's the client error:

sendbackup: time 2185.293: 87: normal(|): DUMP: 4.10% done at 2642 kB/s, finished in 13:38
sendbackup: time 2485.300: 87: normal(|): DUMP: 4.67% done at 2632 kB/s, finished in 13:37
....
sendbackup: time 33385.302: 87: normal(|): DUMP: 60.34% done at 2453 kB/s, finished in 6:04
sendbackup: time 33685.307: 87: normal(|): DUMP: 60.89% done at 2453 kB/s, finished in 5:59
sendbackup: time 33753.599: index tee cannot write [Broken pipe]
sendbackup: time 33753.599: pid 25328 finish time Sun May 25 07:37:50 2008
sendbackup: time 33753.600: 109:  normal(|):
sendbackup: time 33753.601: 112: strange(?): gzip: stdout: Broken pipe
sendbackup: time 33753.601: 112: strange(?): sendbackup: index tee cannot write [Broken pipe]
sendbackup: time 33753.610:  87:  normal(|):   DUMP: Broken pipe
sendbackup: time 33753.611: 87: normal(|): DUMP: The ENTIRE dump is aborted.
sendbackup: time 33753.611: error [compress returned 1, /sbin/dump returned 3]
sendbackup: time 33753.611: pid 25325 finish time Sun May 25 07:37:50 2008


[root@backup /var/lib/amanda/int]
$ amstatus int --dumping
Using /var/lib/amanda/int/amdump.1 from sam mai 24 22:06:48 CEST 2008

helios:/home4 0 85174m dumping 37083m ( 43.54%) (22:22:03)
helios:/home9 0 109374m dumping 30169m ( 27.58%) (22:15:15)


amanda.conf:

etimeout -1200
dtimeout 1800
tpchanger "chg-multi"

define dumptype bi-comp-user-size {
   comp-user
   comment "Non-root partitions on bi-proc machines"
   maxdumps 2
   estimate calcsize
}
Disklist example of a big partition:
helios /home4   bi-comp-user-size

Thanks.



--
Paul Bijnens, xplanation Technology Services        Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  Paul.Bijnens AT xplanation DOT com
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************
