Amanda-Users

Re: estimate timeouts

2002-12-03 14:38:06
Subject: Re: estimate timeouts
From: Jay Lessert <jayl AT accelerant DOT net>
To: Matthew Boeckman <matthewb AT saepio DOT com>
Date: Tue, 3 Dec 2002 10:59:51 -0800
On Tue, Dec 03, 2002 at 10:57:19AM -0600, Matthew Boeckman wrote:
> After resolving my earlier problems, I've uncovered a new one. Amanda is 
> timing out, apparently while waiting for estimates on one of my hosts.
> I think I can fix this by bumping the timeout value,

Looks like this would be the thing to do.

[clip]
> nocomp tar on 4 directories (bkup1, 2, 3, 4) on the 100+GB filesystem
[clip]
> the behavior: If I remove all but 1 of the bkup partitions from 
> disklist, amanda runs fine, whenever I try to run with bkup1-4 in the 
> disklist, I get:
>   ultra      /webhome/bkup4 lev 0 FAILED [Request to ultra timed out.]
> for all partitions.
[clip]

> perplexed, as it kind of appears from these two that amanda was trying 
> to do both lvl 0's and lvl 1's of some of the partitions!

Rather, she's estimating what would happen if she did a level 0 or level 1,
so she can make a rational decision.

Total allowed estimate time for a client is (etimeout * # of DLEs).  If
all estimates for that client are not finished by then, the client is
skipped completely.  It is common for estimates to take a loooong
time on DLEs with a large number of small files; crank up etimeout
by 2X/4X and see what happens.
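
As a rough sketch of that arithmetic (this is not Amanda's actual code;
the function name is made up for illustration, and the 300-second and
6-DLE figures are the ones that come up in this thread), here is what a
2X or 4X bump buys, in Python:

```python
def estimate_window_seconds(etimeout: int, num_dles: int) -> int:
    """Total time allowed for all of one client's estimates to finish."""
    return etimeout * num_dles


BASE_ETIMEOUT = 300  # seconds; the 5-minute default mentioned in this thread
NUM_DLES = 6         # DLEs on the client "ultra" in this thread

# Show the window at the default and after the suggested 2X and 4X bumps.
for factor in (1, 2, 4):
    etimeout = BASE_ETIMEOUT * factor
    window = estimate_window_seconds(etimeout, NUM_DLES)
    print(f"etimeout={etimeout}s x {NUM_DLES} DLEs = {window // 60} min")
```

With these inputs the window comes out at 30, 60 and 120 minutes.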

You may hit dtimeout next; you know what to do.  :-)

Make sure your GNU tar is appropriately recent (1.13.25); I've observed
older versions taking pathologically long on --listed-incremental runs
on Solaris.

> sendsize: debug 1 pid 1796 ruid 602 euid 602 start time Tue Dec  3 01:00:07 
> 2002
[clip]
> sendsize: pid 1796 finish time Tue Dec  3 02:19:31 2002

So the estimate does finish, in about an hour and 20 minutes.  Default
etimeout is 5 minutes, and you're doing 6 DLEs on ultra, so amdump is
only waiting 30 minutes, and you lose.  The one hour+ estimate time is
survivable, so make sure GNU tar is new, bump etimeout to 800 or 1000
or 1200 seconds, and go for it.
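
To check those numbers against the sendsize log, here is a small Python
sketch.  It assumes the start and finish lines quoted above bracket the
whole estimate run and that all 6 DLEs count toward the window; the
helper name is hypothetical and none of this is Amanda code.

```python
import math
from datetime import datetime

# Start and finish times copied from the sendsize debug lines quoted above.
start = datetime(2002, 12, 3, 1, 0, 7)
finish = datetime(2002, 12, 3, 2, 19, 31)
NUM_DLES = 6  # DLEs on ultra, per this thread

elapsed_s = int((finish - start).total_seconds())

# Smallest etimeout whose window (etimeout * NUM_DLES) covers this run.
min_etimeout = math.ceil(elapsed_s / NUM_DLES)


def headroom_s(etimeout: int) -> int:
    """Seconds to spare for this logged run; negative means a timeout."""
    return etimeout * NUM_DLES - elapsed_s


print(f"estimate took {elapsed_s} s, minimum etimeout {min_etimeout} s")
for etimeout in (300, 800, 1000, 1200):
    print(f"etimeout={etimeout:4d}s -> headroom {headroom_s(etimeout):+d} s")
```

Under these assumptions 800 seconds clears this particular run by well
under a minute, so 1000 or 1200 leaves a more comfortable margin.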

-- 
Jay Lessert                               jay_lessert AT accelerant DOT net
Accelerant Networks Inc.                       (voice)1.503.439.3461
Beaverton OR, USA                                (fax)1.503.466.9472
