Hi all,
I am running Amanda for a Linux network backup.
The server is a Fedora Core 3 box with:
amanda-2.4.4p3-1
amanda-client-2.4.4p3-1
amanda-server-2.4.4p3-1
The client is a CentOS 4.3 box with:
amanda-client-2.4.4p3-1
I occasionally get failures backing up this client (it is the big one,
with a long list of DLEs). The errors are always timeouts during
estimates, like this:
gilmore /backedup/home lev 0 FAILED [Estimate timeout from gilmore]
gilmore /backedup/project lev 0 FAILED [Estimate timeout from gilmore]
...
This seems to happen in about 1 out of every 5 runs. So far I've just
learned to skip a day and hope nothing bad happens that day, which is not very good.
My assumption was that I should upgrade to 2.5.x, since I could then use the lighter,
less accurate estimate methods instead of the default estimate method.
However, this weekend I had a failure with a data timeout:
gilmore /nonbackedup/work3/backups/glen lev 0 FAILED [data timeout]
Now whenever I run amcheck I get:
WARNING: gilmore: selfcheck reply timed out.
Meanwhile, on the client (gilmore), listing all the amanda processes gives:
[root@gilmore amanda]# ps aux | grep amanda
amanda 11570 0.0 0.0 3104 884 ? D Apr20 0:00 /usr/lib/amanda/sendbackup
amanda 12910 0.0 0.0 3944 796 ? D Apr20 0:00 /usr/lib/amanda/selfcheck
amanda 8768 0.0 0.0 2400 796 ? D 11:48 0:00 /usr/lib/amanda/selfcheck
amanda 9104 0.0 0.0 2888 888 ? Ss 13:36 0:00 amandad
amanda 9105 0.0 0.0 3432 792 ? D 13:36 0:00 /usr/lib/amanda/selfcheck
amanda 9106 0.0 0.0 0 0 ? Z 13:36 0:00 [amandad] <defunct>
Note the "selfchecks" that are running with "D" process state - meaning they
are sleeping in the kernel and are uninterruptible and therefore unkillable.
So - it looks like I need to reboot my client before I can get a backup from it
again, which is a little harsh.
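In case it helps anyone reproduce this, the stuck processes can be picked out of
ps aux output with something like the Python sketch below. It is only an
illustration: the function name uninterruptible is made up here, and it assumes
the standard 11-column ps aux layout (USER PID %CPU %MEM VSZ RSS TTY STAT START
TIME COMMAND) seen in the listing above.

```python
def uninterruptible(ps_lines):
    """Return (pid, command) for each `ps aux` row whose STAT starts with 'D'.

    Hypothetical helper, not part of Amanda. Assumes the 11-column
    `ps aux` layout; the last column (COMMAND) may contain spaces.
    """
    hits = []
    for line in ps_lines:
        fields = line.split(None, 10)  # keep COMMAND as one field
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header row and anything malformed
        if fields[7].startswith("D"):  # STAT column: D = uninterruptible sleep
            hits.append((int(fields[1]), fields[10]))
    return hits


# A few rows taken from the listing above, used as sample input.
SAMPLE = """\
amanda 11570 0.0 0.0 3104 884 ? D Apr20 0:00 /usr/lib/amanda/sendbackup
amanda 12910 0.0 0.0 3944 796 ? D Apr20 0:00 /usr/lib/amanda/selfcheck
amanda 9104 0.0 0.0 2888 888 ? Ss 13:36 0:00 amandad
amanda 9106 0.0 0.0 0 0 ? Z 13:36 0:00 [amandad] <defunct>
""".splitlines()

for pid, cmd in uninterruptible(SAMPLE):
    print(pid, cmd)
```

On a live client the same function could be fed the lines of real ps aux output
instead of SAMPLE. It only reports which processes are in the D state, not what
they are blocked on.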
I was wondering whether anyone knows why Amanda client 2.4.4 would get wedged
like that. Is there something I can do to minimize the problem? Also, if
anyone has ideas about avoiding the estimate issues altogether, I would
appreciate any advice.
Don