I am experimenting with amanda and am having problems understanding how
the "estimate timeout" works. What I am seeing is not what I expect
based on the explaination in the amanda.conf page.
Client and server are on the same machine.
36 - DLEs (some set to calcsize, most use tar for estimates)
all DLEs are set to "always full" (to remove any confusion caused by
incremental backups)
etimeout = 900
My understanding is that amanda will allow 900 seconds per DLE. So it
should timeout in 32,400 seconds (900 x 36) or 9 hours. According to
the report after amdump runs, the estimate phase is over 11 hours.
Looking at the sendsize log, the last "estimate time for" the 28th DLE
is at time 30,708. And, the time stamp on the sendsize file matches:
approximately 8.5 hours after amanda started. (So far so good.)
However, the first runtar log doesn't appear until another 1.5 hours
later. Which means amanda didn't do anything for 1.5 hours. (The lag
is greater when allowing incremental backups.)
Looking at the planner log on the server side, there is a "dgram_recv"
with a matching for each "estimate time" entry on the client side for
the first 27 DLEs.
client sendsize log:
sendsize[8661]: time 9899.506: estimate time for home_q level 0: 12.033
sendsize[8668]: time 13962.655: estimate time for home_r level 0: 4063.141
sendsize[9335]: time 30708.356: estimate time for home_s level 0: 16745.578
server planner log:
time 9899.665: dgram_recv(dgram=0xb805c764, timeout=0, fromaddr=0xb806c750)
time 9899.665: (sockaddr_in6 *)0xb806c750 = { 10, 10080,
::ffff:192.168.0.247 }
time 13962.833: dgram_recv(dgram=0xb805c764, timeout=0, fromaddr=0xb806c750)
time 13962.833: (sockaddr_in6 *)0xb806c750 = { 10, 10080,
::ffff:192.168.0.247 }
time 21600.190: dgram_recv(dgram=0xb805c764, timeout=0, fromaddr=0xb806c750)
time 21600.211: (sockaddr_in6 *)0xb806c750 = { 10, 10080,
::ffff:192.168.0.247 }
time 40082.633: security_seterror(handle=0x80721f8, driver=0xb804a720
(BSD) error=timeout waiting for REP)
time 40082.665: security_close(handle=0x80721f8, driver=0xb804a720 (BSD))
time 40082.719: pid 6922 finish time Sun Oct 25 19:08:03 2009
Something is happening at 21600 (6 hours). The server recevies a dmesg
from somewhere (there is no corresponding entry in the sendsize log) and
stops looking for estimates from the client. Then waits another 5 hours
before it starts the actual backup.
Any idea what is happening at 21600?
--
Garnet Harris TEL: +301 515 7118
Eka Systems FAX: +301 515 4965
20201 Century Blvd., Suite 250
Germantown, MD 20874 garnet.harris AT ekasystems DOT com
|