Amanda-Users

2007-04-23 18:03:44
Subject: dead processes
From: Don Murray <samba AT geeksrus DOT ca>
To: amanda-users AT amanda DOT org
Date: Mon, 23 Apr 2007 13:53:01 -0700

Hi all,

I am running Amanda to back up a Linux network.
The server is a Fedora Core 3 box with:

amanda-2.4.4p3-1
amanda-client-2.4.4p3-1
amanda-server-2.4.4p3-1


The client is a CentOS 4.3 box with:

amanda-client-2.4.4p3-1


I occasionally fail to back up this client (it is the big one, with a long list of DLEs). The errors are always timeouts during estimates, like this:


 gilmore    /backedup/home lev 0 FAILED [Estimate timeout from gilmore]
 gilmore    /backedup/project lev 0 FAILED [Estimate timeout from gilmore]
...

This seems to happen on about 1 out of every 5 runs. So far I've just
learned to skip a day and hope nothing bad happens that day, which is not very good.
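To see whether the failures across runs really are always estimate timeouts, the FAILED lines from the mailed reports could be tallied by reason. This is a minimal Python sketch, assuming the report line layout shown above (host, disk, `lev N`, `FAILED [reason]`); `failure_reasons` is a hypothetical helper, not part of Amanda, and real reports may contain other line formats it ignores.

```python
import re
from collections import Counter

# Assumed layout, taken from the report excerpts above:
#  gilmore    /backedup/home lev 0 FAILED [Estimate timeout from gilmore]
FAILED_LINE = re.compile(
    r"^\s*(?P<host>\S+)\s+(?P<disk>\S+)\s+lev\s+(?P<level>\d+)"
    r"\s+FAILED\s+\[(?P<reason>[^\]]*)\]"
)


def failure_reasons(report_text):
    """Count FAILED DLE lines per failure reason (hypothetical helper)."""
    counts = Counter()
    for line in report_text.splitlines():
        m = FAILED_LINE.match(line)
        if m:
            # Drop a trailing "from <host>" so reasons group across clients,
            # e.g. "Estimate timeout from gilmore" -> "Estimate timeout".
            reason = re.sub(r"\s+from\s+\S+$", "", m.group("reason"))
            counts[reason] += 1
    return counts
```

Feeding it the text of several nights' reports would show how the failures split between estimate timeouts and data timeouts.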

My assumption was that I should upgrade to 2.5.x, since then I could use the lighter,
less accurate estimate methods instead of the default one.

However, this weekend I had a failure with a data timeout:

 gilmore    /nonbackedup/work3/backups/glen lev 0 FAILED [data timeout]

Now whenever I run amcheck, I get:

WARNING: gilmore: selfcheck reply timed out.

Meanwhile, if I check all the amanda processes on the client gilmore, I get:
[root@gilmore amanda]# ps aux | grep amanda
amanda   11570  0.0  0.0  3104  884 ?        D    Apr20   0:00 /usr/lib/amanda/sendbackup
amanda   12910  0.0  0.0  3944  796 ?        D    Apr20   0:00 /usr/lib/amanda/selfcheck
amanda    8768  0.0  0.0  2400  796 ?        D    11:48   0:00 /usr/lib/amanda/selfcheck
amanda    9104  0.0  0.0  2888  888 ?        Ss   13:36   0:00 amandad
amanda    9105  0.0  0.0  3432  792 ?        D    13:36   0:00 /usr/lib/amanda/selfcheck
amanda    9106  0.0  0.0     0    0 ?        Z    13:36   0:00 [amandad] <defunct>


Note the selfcheck processes (and the sendbackup) stuck in the "D" process state, meaning
they are in uninterruptible sleep in the kernel and therefore cannot be killed.
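Because these processes linger until the client is rebooted, it may help to spot them before the next amdump run. This is a small Python sketch, assuming the standard one-line-per-process `ps aux` layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND); `stuck_processes` is a hypothetical helper. It only reports: a D-state process cannot be cleared from user space and stays until the kernel call it is blocked in returns.

```python
def stuck_processes(ps_aux_text, states=("D", "Z")):
    """Return (pid, stat, command) for `ps aux` rows whose STAT starts with one of `states`.

    Hypothetical helper; assumes the default `ps aux` column order.
    """
    stuck = []
    for line in ps_aux_text.splitlines():
        # Split into at most 11 fields so a COMMAND containing spaces stays whole.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header, the shell prompt, and blank lines
        pid, stat, command = int(fields[1]), fields[7], fields[10].rstrip()
        if stat[:1] in states:
            stuck.append((pid, stat, command))
    return stuck
```

The input would be the text captured from `ps aux` on the client; "D" catches uninterruptible sleepers and "Z" catches defunct (zombie) children such as the `[amandad] <defunct>` entry.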

So it looks like I need to reboot the client before I can get a backup from it
again, which is a little harsh.

Does anyone know why the Amanda 2.4.4 client would get wedged like that, and
is there something I can do to minimize the problem? Also, if anyone has ideas
about avoiding the estimate issues altogether, I would appreciate any advice.

Don



