Amanda-Users

Re: gtar still running after amdump was done

2007-09-26 10:14:18
Subject: Re: gtar still running after amdump was done
From: Yu Chen <chen AT hhmi.umbc DOT edu>
To: amanda-users AT amanda DOT org
Date: Wed, 26 Sep 2007 10:13:21 -0400 (EDT)
I am running Amanda 2.5.2p1 with the "-o tpchanger= -o tapedev=" options; my holding disk is 100 GB. The server and client are on the same computer. After amdump finished, I found two "gtar" processes still running. I checked the Amanda log file, and it says
"...
FAIL driver [host] [disk1] [date] 0 [no more holding disk space]
FAIL driver [host] [disk2] [date] 0 [no more holding disk space]
"
at the end, and the two disks correspond to the two "gtar" processes.

Is this right? Shouldn't they be killed or aborted automatically when this happens?
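
For anyone who wants to check for leftovers after a run, here is a minimal sketch in Python. It assumes a POSIX-style ps that accepts "-eo pid,etime,comm"; the function names are hypothetical helpers and not part of Amanda.

```python
import subprocess

def parse_ps(output, name="gtar"):
    """Return (pid, elapsed, command) for each process whose command is `name`."""
    matches = []
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        # Some systems print a full path in the command column, so compare the basename.
        if len(fields) == 3 and fields[2].rsplit("/", 1)[-1] == name:
            matches.append((int(fields[0]), fields[1], fields[2]))
    return matches

def leftover_processes(name="gtar"):
    # Ask ps for every process with its pid, elapsed time and command name.
    out = subprocess.run(["ps", "-eo", "pid,etime,comm"],
                         capture_output=True, text=True, check=True).stdout
    return parse_ps(out, name)
```

Calling leftover_processes() once amdump has exited lists any gtar that is still alive, together with how long it has been running, which helps to match it to the failed disk.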

I've seen and reported this problem many times, in this exact instance
(holddisk filled up) and also when there is a data timeout either
during the estimate phase or with amdump. Thing is that it's not 100%
reproducible in my local setup so I suspect that some other
condition(s) must be met for this to happen.

The problem is known, but difficult to solve.
A fix would require the server to contact the client (which in the
current implementation is not expecting such actions), and the
client to find the related running processes and kill them.
The situation is sufficiently rare, and the solution sufficiently
complicated, that a fix is not yet implemented.  Anyway, a fix
on the server would not work with any existing client either.
What the server does now is close the TCP connection.
Whenever the other side notices the closed connection, any program
depending on it should stop.  But it seems that some clients fail
to detect this (OS dependent?) or, at least, take a long time after
the fact to detect it.  When this happens, you usually get the
cryptic error message "connection reset by peer" in the client
debug files.  It's sometimes difficult to relate this to a server
problem that occurred some time before.
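
To illustrate why detection can lag, here is a small Python sketch. It is not Amanda code, and the function name is made up for the example. It uses a local socket pair as a stand-in for the server/client connection and shows that the writing side gets no signal when the peer closes; it only finds out when it next tries to send. My assumption, not something confirmed above, is that a gtar that is busy scanning and has little to write could therefore linger for a while.

```python
import socket

def writer_notices_close_only_on_write():
    # A connected pair of sockets stands in for the server/client link.
    server_end, client_end = socket.socketpair()

    # The "server" gives up (e.g. holding disk full) and closes its end.
    server_end.close()

    # Nothing has happened on the client side yet; it is simply unaware.
    events = ["peer closed, client still idle"]
    try:
        # Only an attempt to send data exposes the closed connection.
        client_end.sendall(b"tar data")
        events.append("write succeeded")
    except (BrokenPipeError, ConnectionResetError) as exc:
        events.append("write failed: " + type(exc).__name__)
    finally:
        client_end.close()
    return events
```

With a real TCP connection the first write after the close may even appear to succeed, and the error shows up only on a later write, which would push detection back further.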

Thanks for the detailed explanation. So this seems random. Yesterday, my backup finished fine. The log still gave the error for the exact same two disks:
"...
FAIL driver [host] [disk1] [date] 0 [no more holding disk space]
FAIL driver [host] [disk2] [date] 0 [no more holding disk space]
"
Yet no 'gtar' was left running.

CY



--
Paul Bijnens, xplanation Technology Services        Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  Paul.Bijnens AT xplanation DOT com
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************




===========================================
Yu Chen
Howard Hughes Medical Institute
Chemistry Building, Rm 182
University of Maryland at Baltimore County
1000 Hilltop Circle
Baltimore, MD 21250

phone:  (410)455-1728 (primary)
        (410)455-6347 (secondary)
fax:    (410)455-1174
email:  chen AT hhmi.umbc DOT edu
===========================================
