Amanda-Users

Estimate Timeouts

2009-10-30 13:40:28
Subject: Estimate Timeouts
From: Garnet Harris <garnet.harris AT ekasystems DOT com>
To: amanda-users AT amanda DOT org
Date: Fri, 30 Oct 2009 12:32:20 -0400
I am experimenting with amanda and am having problems understanding how the "estimate timeout" works. What I am seeing is not what I expect based on the explaination in the amanda.conf page.

Client and server are on the same machine.

36 - DLEs  (some set to calcsize, most use tar for estimates)

all DLEs are set to "always full" (to remove any confusion caused by incremental backups)

etimeout = 900

My understanding is that amanda will allow 900 seconds per DLE. So it should timeout in 32,400 seconds (900 x 36) or 9 hours. According to the report after amdump runs, the estimate phase is over 11 hours.

Looking at the sendsize log, the last "estimate time for" the 28th DLE is at time 30,708. And, the time stamp on the sendsize file matches: approximately 8.5 hours after amanda started. (So far so good.) However, the first runtar log doesn't appear until another 1.5 hours later. Which means amanda didn't do anything for 1.5 hours. (The lag is greater when allowing incremental backups.)

Looking at the planner log on the server side, there is a "dgram_recv" with a matching for each "estimate time" entry on the client side for the first 27 DLEs.

client sendsize log:

sendsize[8661]: time 9899.506: estimate time for home_q level 0: 12.033
sendsize[8668]: time 13962.655: estimate time for home_r level 0: 4063.141
sendsize[9335]: time 30708.356: estimate time for home_s level 0: 16745.578


server planner log:

time 9899.665: dgram_recv(dgram=0xb805c764, timeout=0, fromaddr=0xb806c750)
time 9899.665: (sockaddr_in6 *)0xb806c750 = { 10, 10080, ::ffff:192.168.0.247 }
time 13962.833: dgram_recv(dgram=0xb805c764, timeout=0, fromaddr=0xb806c750)
time 13962.833: (sockaddr_in6 *)0xb806c750 = { 10, 10080, ::ffff:192.168.0.247 }
time 21600.190: dgram_recv(dgram=0xb805c764, timeout=0, fromaddr=0xb806c750)
time 21600.211: (sockaddr_in6 *)0xb806c750 = { 10, 10080, ::ffff:192.168.0.247 } time 40082.633: security_seterror(handle=0x80721f8, driver=0xb804a720 (BSD) error=timeout waiting for REP)
time 40082.665: security_close(handle=0x80721f8, driver=0xb804a720 (BSD))
time 40082.719: pid 6922 finish time Sun Oct 25 19:08:03 2009


Something is happening at 21600 (6 hours). The server recevies a dmesg from somewhere (there is no corresponding entry in the sendsize log) and stops looking for estimates from the client. Then waits another 5 hours before it starts the actual backup.

Any idea what is happening at 21600?

--
Garnet Harris                           TEL: +301 515 7118
Eka Systems                             FAX: +301 515 4965
20201 Century Blvd., Suite 250
Germantown, MD   20874                  garnet.harris AT ekasystems DOT com

<Prev in Thread] Current Thread [Next in Thread>