amandad has a hard limit of 6 hours (see REP_TIMEOUT in amandad-src/amandad.c)
on waiting for the reply from sendsize.
Try the attached patch; it resets the timeout after each estimate.
Jean-Louis
Jean-Francois Malouin wrote:
Hi,
A new problem that has me stumped: all the amdumps from client to server
(same host running 2.5.2-20070623) have failed due to estimates timing
out after 6:00h. This happened in all of the multiple configs that I run,
even though the etimeout in each of the amanda configs is set to a
ridiculous value: in one case etimeout=5600 and I have 77 DLEs, which
should not time out for ~120h! Could anything else cause this:
FAILURE AND STRANGE DUMP SUMMARY:
  yorick /data/bigml/bigml1 lev 0 FAILED [disk /data/bigml/bigml1, all estimate timed out]
  ...
  yorick /data/nih/nih1/ lev 0 FAILED [disk /data/nih/nih1/, all estimate timed out]
  planner: ERROR Request to yorick failed: EOF on read from yorick

STATISTICS:
                          Total       Full      Incr.
                        --------   --------   --------
Estimate Time (hrs:min)    6:00
Run Time (hrs:min)        15:07
Dump Time (hrs:min)       15:14      14:59       0:15
jf
diff -u -r --show-c-function --new-file --exclude-from=/home/martinea/src.orig/amanda.diff --ignore-matching-lines='$Id:' amanda-2.5.2p1/amandad-src/amandad.c amanda-2.5.2p1.amandad/amandad-src/amandad.c
--- amanda-2.5.2p1/amandad-src/amandad.c 2007-05-04 07:39:06.000000000 -0400
+++ amanda-2.5.2p1.amandad/amandad-src/amandad.c 2007-06-11 09:56:32.000000000 -0400
@@ -901,8 +901,13 @@ s_repwait(
do_sendpkt(as->security_handle, &as->rep_pkt);
amfree(as->rep_pkt.body);
pkt_init_empty(&as->rep_pkt, P_REP);
- }
+ assert(as->ev_reptimeout != NULL);
+ event_release(as->ev_reptimeout);
+ as->ev_reptimeout = event_register(REP_TIMEOUT, EV_TIME,
+ timeout_repfd, as);
+ }
+
return (A_PENDING);
}
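
For anyone who wants to see the effect of the patch without reading amandad.c, here is a rough standalone sketch of the idea: a fixed timeout armed once at the start of the request, versus one that is re-armed every time a partial reply is sent. This is not amandad code. SKETCH_REP_TIMEOUT, struct reply_timer and the three functions are hypothetical names made up for the illustration, and the arrival times are invented; the real patch does the re-arming with event_release() and event_register() as shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for amandad's REP_TIMEOUT: 6 hours, in seconds. */
#define SKETCH_REP_TIMEOUT (6L * 60L * 60L)

/* Hypothetical timer state; amandad itself uses its event API instead. */
struct reply_timer {
    long deadline;  /* absolute time (seconds) at which the wait gives up */
};

/* Arm (or re-arm) the timer relative to the current time. */
static void reply_timer_arm(struct reply_timer *t, long now)
{
    assert(t != NULL);
    t->deadline = now + SKETCH_REP_TIMEOUT;
}

/* True once the current time has reached the deadline. */
static bool reply_timer_expired(const struct reply_timer *t, long now)
{
    assert(t != NULL);
    return now >= t->deadline;
}

/*
 * Walk through the arrival times of partial replies (ascending, in seconds
 * since the request started) and count how many arrive before the timer
 * fires. With reset_on_reply == false the timer is armed once, which is the
 * unpatched behaviour described above; with reset_on_reply == true it is
 * re-armed after each reply, which is what the patch aims for.
 */
static size_t replies_before_timeout(const long *arrivals, size_t n,
                                     bool reset_on_reply)
{
    struct reply_timer t;
    size_t delivered = 0;

    reply_timer_arm(&t, 0);
    for (size_t i = 0; i < n; i++) {
        if (reply_timer_expired(&t, arrivals[i]))
            break;                      /* timed out before this reply */
        delivered++;
        if (reset_on_reply)
            reply_timer_arm(&t, arrivals[i]);
    }
    return delivered;
}
```

With made-up estimates finishing at 4h, 8h and 12h, the single-shot timer lets only the first one through before the 6h limit fires, while the re-armed timer lets all three through, because no gap between two consecutive estimates exceeds 6h. Note that a single estimate taking longer than 6h on its own would still time out with the patch.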