Amanda-Users

Re: gtar still running after amdump was done

2007-09-26 05:03:23
Subject: Re: gtar still running after amdump was done
From: Paul Bijnens <Paul.Bijnens AT xplanation DOT com>
To: amanda-users AT amanda DOT org
Date: Wed, 26 Sep 2007 11:01:33 +0200
On 2007-09-25 17:02, Jean-Francois Malouin wrote:
* Yu Chen <chen AT hhmi.umbc DOT edu> [20070925 10:13]:
Hi,

I am running amanda 2.5.2p1 with the "-o tpchanger= -o tapedev=" options; my holding disk is 100GB. The server and client are on the same computer. After amdump finished, I found two "gtar" processes still running. I checked the amanda log file, which says
"...
FAIL driver [host] [disk1] [date] 0 [no more holding disk space]
FAIL driver [host] [disk2] [date] 0 [no more holding disk space]
"
at the end, and the two disks correspond to the two "gtar" processes.

Is this right? Shouldn't they be killed/aborted automatically when this happens?

I've seen and reported this problem many times, in this exact instance
(holddisk filled up) and also when there is a data timeout either
during the estimate phase or with amdump. Thing is that it's not 100%
reproducible in my local setup so I suspect that some other
condition(s) must be met for this to happen.

The problem is known, but difficult to solve.
It means that the server should contact the client (which in the
current implementation is not expecting such actions) and the
client should find the related running processes, and kill them.
The situation is sufficiently rare, and the solution sufficiently
complicated, that a fix is not yet implemented.  Anyway, a fix
on the server would not work with any existing client either.
What the server does now is close the TCP connection.
And whenever the other side notices the closed connection, any program
depending on it should stop.  But it seems that some clients fail
to detect this (OS dependent?) or, at least, take a long time after
the fact to detect this.  When this happens, you usually get the
cryptic error message in the client debug files "connection reset
by peer". It's sometimes difficult to relate this to a server
problem some time before.
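
To illustrate why a client may take a while to notice, here is a minimal
sketch of the general TCP behaviour. It is not Amanda code; the function
name and the loopback setup are made up for the illustration. The "server"
end closes the connection, and the "client" end only finds out when one of
its later writes fails.

```python
import socket
import time


def writes_until_close_detected(max_writes=200):
    """Count the sends that still succeed after the peer closed the socket.

    Hypothetical helper for illustration only. Returns None if no error
    was seen within max_writes attempts.
    """
    # Build a TCP connection over loopback: "client" is the writing side,
    # "server_side" stands in for the server end of a dump connection.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    listener.close()

    # The server gives up and closes its end of the connection.
    server_side.close()

    succeeded = 0
    try:
        for _ in range(max_writes):
            try:
                client.send(b"x" * 1024)
                succeeded += 1
                # Give the peer's reset time to travel back to us.
                time.sleep(0.01)
            except ConnectionError:
                # Covers "broken pipe" and "connection reset by peer".
                return succeeded
    finally:
        client.close()
    return None


if __name__ == "__main__":
    print("sends accepted after the peer closed:", writes_until_close_detected())
```

The writer gets no notification at the moment of the close; the error shows
up only on a write attempted afterwards. A process that is not writing to
the connection at that time (for example, one busy reading the disk) would
not see the error until its next write.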


--
Paul Bijnens, xplanation Technology Services        Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  Paul.Bijnens AT xplanation DOT com
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************

