On 2007-09-25 17:02, Jean-Francois Malouin wrote:
* Yu Chen <chen AT hhmi.umbc DOT edu> [20070925 10:13]:
Hi,
I am running amanda 2.5.2p1 with "-o tpchanger= -o tapedev=" options, my
holding disk is 100GB. The server/client is on the same computer. After
amdump finished, I found there were still two "gtar" processes running.
I checked the amanda log file, and it says
"...
FAIL driver [host] [disk1] [date] 0 [no more holding disk space]
FAIL driver [host] [disk2] [date] 0 [no more holding disk space]
"
at the end, and the two disks correspond to the two "gtar"
processes.
Is this right? Shouldn't they be automatically killed/aborted when this
happens?
I've seen and reported this problem many times, in this exact instance
(holddisk filled up) and also when there is a data timeout either
during the estimate phase or with amdump. Thing is that it's not 100%
reproducible in my local setup so I suspect that some other
condition(s) must be met for this to happen.
The problem is known, but difficult to solve.
A fix would mean that the server has to contact the client (which in
the current implementation is not expecting such actions) and the
client has to find the related running processes and kill them.
The situation is sufficiently rare, and the solution sufficiently
complicated, that a fix is not yet implemented. Anyway, a fix
on the server would not work with any existing client either.
What the server does now is close the TCP connection.
Whenever the other side notices the closed connection, any program
depending on it should stop. But it seems that some clients fail
to detect this (OS dependent?) or, at least, take a long time after
the fact to detect it. When this happens, you usually get the
cryptic error message "connection reset by peer" in the client debug
files. It's sometimes difficult to relate this to a server
problem that occurred some time before.
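
To illustrate why a client may take a while to notice, here is a minimal
Python sketch. It is not Amanda code; the function names are made up for
this example, and a local socket pair stands in for the TCP connection
between client and server. The point it shows: the writing side only
learns that the peer has closed the connection when it next tries to
write, so a process that is busy elsewhere and not producing output
would presumably not notice anything in the meantime.

```python
import socket


def write_until_peer_gone(sock, chunk=b"x" * 4096, max_writes=1000):
    """Write chunks until the OS reports that the peer is gone.

    Returns (writes_completed, error_name). error_name is None if all
    writes succeeded. Hypothetical helper, for illustration only.
    """
    for n in range(max_writes):
        try:
            sock.sendall(chunk)
        except ConnectionError as exc:
            # Covers BrokenPipeError and ConnectionResetError, the two
            # errors a writer typically sees after the peer has closed.
            return n, type(exc).__name__
    return max_writes, None


def demo():
    # A connected local socket pair stands in for the server/client link.
    server, client = socket.socketpair()
    # The "server" gives up and closes its end, as described above.
    server.close()
    # Nothing has told the client yet; the error only appears on write.
    result = write_until_peer_gone(client)
    client.close()
    return result


if __name__ == "__main__":
    writes, error = demo()
    print(f"writes completed before error: {writes}, error: {error}")
```

With a real TCP connection the first write after the close may still
appear to succeed, and the error only shows up on a later write, which
is one reason detection can lag behind the actual failure.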
--
Paul Bijnens, xplanation Technology Services Tel +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM Fax +32 16 397.512
http://www.xplanation.com/ email: Paul.Bijnens AT xplanation DOT com
***********************************************************************
* I think I've got the hang of it now: exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt, abort, hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e, kill -1 $$, shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ... "Are you sure?" ... YES ... Phew ... I'm out *
***********************************************************************