Amanda-Users

Re: amcheck hang

2005-01-06 12:36:49
Subject: Re: amcheck hang
From: Jon LaBadie <jon AT jgcomp DOT com>
To: amanda-users AT amanda DOT org
Date: Thu, 6 Jan 2005 12:26:55 -0500
On Thu, Jan 06, 2005 at 02:19:42PM -0000, Dan Tomlinson wrote:
> Hi all,
> 
> amanda has been failing during my nightly dumps: 
> 
> FAILURE AND STRANGE DUMP SUMMARY:
>    mymachine    /var/lib/mysql/ lev 0 FAILED [mymachine NAK: amandad busy]
>
> Trawled newsgroups for answers and it appears that this particular error
> can be caused by amandad processes hanging around after a failed amanda
> operation. Did a quick "ps -ef | grep amanda" and saw there were
> "amandad" and "..amanda/selfcheck" processes active. Tried to kill them,
> but only the amandad would die :o( Eventually managed to kill the
> selfcheck with a kill -9 
> 
> I labeled another new tape to attempt to manually continue the
> dumpcycle, and ran an "amcheck" after labeling, only to have it fail to
> finish. Another "ps -ef | grep amanda" found that there was now a new
> selfcheck process and a new amandad.
> 
> So the problem seems to be that the selfcheck process is hanging during
> amcheck execution and preventing the dump from finishing.  As to how to
> solve it? Various posts in the mailing lists suggest rebooting the machine, but a)
> this is inconvenient to say the least, and b) how do I know this won't
> just happen again next time amanda runs...
> 

Feeling a little snarky here: how do I know the sun will come up tomorrow?


> I am using amanda version 2.4.2p2 on Debian with a 2.4 kernel, any
> ideas?  Do we need to upgrade our amanda?

Basically, Dan, you need to determine "why" selfcheck is not finishing.
That will take some research at your end.  The debug files in
/tmp/amanda may help.
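A minimal shell sketch for pulling up the newest debug output from the two processes Dan saw lingering. It assumes (the naming varies by Amanda version and build) that each program writes a file named <program>.<timestamp>.debug into the debug directory, /tmp/amanda by default; the function name show_latest_debug is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: show the end of the newest debug file for each
# program involved in an amcheck run. Assumes files are named
# <program>.<timestamp>.debug in the given directory (default /tmp/amanda);
# adjust the program names and pattern for your Amanda version.
show_latest_debug() {
  dir=${1:-/tmp/amanda}
  for prog in amandad selfcheck; do
    # ls -t lists newest first; "no match" errors are discarded.
    latest=$(ls -t "$dir"/"$prog".*.debug 2>/dev/null | head -n 1)
    if [ -n "$latest" ]; then
      echo "== $latest"
      tail -n 20 "$latest"
    else
      echo "== no $prog debug file in $dir"
    fi
  done
}

show_latest_debug "$@"
```

The last lines written before a hang usually show what selfcheck was doing when it stopped, for example which disk or device it was checking.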

I'm having a similar, but totally unrelated, problem at the moment:
my tape drive is acting up.  When taper tries to open the drive, the
open call never returns and taper just sits there.  It's not a bug in
amanda's taper program; for some reason the OS and its tape device
driver return neither a success nor a failure to taper.  Taper has made
a system call, and until that call returns, nothing more happens.  My
solution until I figure out what is wrong with the drive is to turn it
off when I'm not trying to analyze it.  Now when taper tries to open it
in the middle of the night, an error is returned (no such device) and
the backups collect in a fast-shrinking holding disk space.
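One way to tell an open that hangs from one that fails is to attempt the open in a child process under a time limit. This is a minimal sketch, assuming GNU coreutils `timeout` is installed (it exits with status 124 when the time limit expires). The function name probe_open is hypothetical. One caveat: if the driver puts the process into uninterruptible sleep, the child cannot act on the signal, and the probe itself may stall.

```shell
#!/bin/sh
# Hypothetical helper: try to open DEVICE for reading within SECONDS
# (default 30) and report whether the open succeeded, failed, or hung.
probe_open() {
  dev=$1
  secs=${2:-30}
  # The input redirection makes the child shell call open(2) on the
  # device; ':' then does nothing and the descriptor closes on exit.
  timeout "$secs" sh -c ': < "$1"' sh "$dev" 2>/dev/null
  status=$?
  case $status in
    0)   echo "open ok: $dev" ;;
    124) echo "open timed out after ${secs}s: $dev" ;;
    *)   echo "open failed (status $status): $dev" ;;
  esac
}
```

Usage would be something like `probe_open /dev/nst0 60`, where the device path is an assumption for your system. To see the "hung" case without a misbehaving drive, you can probe a FIFO that has no writer, because opening a FIFO for reading blocks until a writer appears.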

I'm not an advocate of keeping up with the latest and greatest.
If something is doing the job, no need to upgrade.  While your
version is pretty old, a survey done about 20 months ago showed
30-40% of servers and clients were using 2.4.2 versions.  I'd
expect substantially fewer today.  Certainly little (aka none)
effort is going into maintaining that code base.  There are many
good reasons for upgrading.  I doubt your current problem is one
of them.

jl
-- 
Jon H. LaBadie                  jon AT jgcomp DOT com
 JG Computing
 4455 Province Line Road        (609) 252-0159
 Princeton, NJ  08540-4322      (609) 683-7220 (fax)
