Amanda-Users

Re: estimate timeouts at 6hrs?

2007-06-11 10:05:51
Subject: Re: estimate timeouts at 6hrs?
From: Jean-Louis Martineau <martineau AT zmanda DOT com>
To: Jean-Francois Malouin <Jean-Francois.Malouin AT bic.mni.mcgill DOT ca>
Date: Mon, 11 Jun 2007 09:59:52 -0400
amandad has a hard limit of 6 hours (see REP_TIMEOUT in amandad-src/amandad.c)
while waiting for the reply from sendsize.

Try the attached patch; it resets the timeout after each estimate.

Jean-Louis
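
To illustrate what resetting the timeout after each estimate changes, here is
a minimal sketch. It is not Amanda code: REP_TIMEOUT_SECS and
replies_before_timeout() are hypothetical names, and the 6-hour value is taken
from the description above rather than copied from amandad.c. The sketch
replays a list of reply arrival times against either one fixed deadline or a
deadline that is pushed back after every partial reply.

```c
#include <assert.h>

/* Hypothetical stand-in for amandad's REP_TIMEOUT: 6 hours, in seconds. */
#define REP_TIMEOUT_SECS (6L * 60 * 60)

/*
 * Replay n partial replies arriving at the given times (seconds since the
 * request started, in ascending order).
 *
 * reset_on_reply == 0: a single deadline covers the whole wait.
 * reset_on_reply != 0: the deadline is pushed back after each reply.
 *
 * Returns how many replies arrive before the timer would fire.
 */
int replies_before_timeout(const long *arrival, int n, int reset_on_reply)
{
    long deadline = REP_TIMEOUT_SECS;
    int i;

    assert(n >= 0);
    assert(n == 0 || arrival != 0);

    for (i = 0; i < n; i++) {
        if (arrival[i] > deadline)
            break;                      /* the timer fired first */
        if (reset_on_reply)
            deadline = arrival[i] + REP_TIMEOUT_SECS;
    }
    return i;
}
```

With replies arriving 5, 10 and 15 hours into the run, the fixed deadline lets
only the first reply through, while the resetting deadline lets all three
through, because no single gap between replies exceeds 6 hours.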

Jean-Francois Malouin wrote:
Hi,

A new problem has me stumped: all the amdumps from client to server
(same host running 2.5.2-20070623) have failed because the estimates timed
out after 6:00h. This happened in all of the configs that I run,
even though the etimeout in each Amanda config is set to a
ridiculous value: in one case etimeout=5600 and I have 77 DLEs, which
should not time out for ~120h! Could anything else cause this?

FAILURE AND STRANGE DUMP SUMMARY:
  yorick  /data/bigml/bigml1                  lev 0  FAILED [disk /data/bigml/bigml1, all estimate timed out]
...
  yorick  /data/nih/nih1/                     lev 0  FAILED [disk /data/nih/nih1/, all estimate timed out]
 planner: ERROR Request to yorick failed: EOF on read from yorick


STATISTICS:
                          Total       Full      Incr.
                        --------   --------   --------
Estimate Time (hrs:min)    6:00
Run Time (hrs:min)        15:07
Dump Time (hrs:min)       15:14      14:59       0:15


jf
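
The arithmetic behind the "~120h" figure, and why the run still stopped at
6:00, can be sketched as follows. This is an illustration only: it assumes, as
the message above does, that etimeout is a number of seconds per DLE, and it
takes the 6-hour limit from the reply. The names REP_TIMEOUT_SECS,
planner_budget_secs() and effective_limit_secs() are hypothetical and do not
come from Amanda.

```c
#include <assert.h>

/* Hypothetical stand-in for amandad's 6-hour reply limit, in seconds. */
#define REP_TIMEOUT_SECS (6L * 60 * 60)

/* Planner-side budget: etimeout seconds per DLE times the number of DLEs. */
long planner_budget_secs(long etimeout, long dle_count)
{
    assert(etimeout > 0 && dle_count > 0);
    return etimeout * dle_count;
}

/* Whichever limit is reached first ends the estimate phase. */
long effective_limit_secs(long etimeout, long dle_count)
{
    long budget = planner_budget_secs(etimeout, dle_count);

    return budget < REP_TIMEOUT_SECS ? budget : REP_TIMEOUT_SECS;
}
```

For etimeout=5600 and 77 DLEs the planner budget is 431200 seconds (just under
120 hours), but the effective limit is 21600 seconds, which matches the 6:00
estimate time in the report.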

diff -u -r --show-c-function --new-file --exclude-from=/home/martinea/src.orig/amanda.diff --ignore-matching-lines='$Id:' amanda-2.5.2p1/amandad-src/amandad.c amanda-2.5.2p1.amandad/amandad-src/amandad.c
--- amanda-2.5.2p1/amandad-src/amandad.c        2007-05-04 07:39:06.000000000 -0400
+++ amanda-2.5.2p1.amandad/amandad-src/amandad.c        2007-06-11 09:56:32.000000000 -0400
@@ -901,8 +901,13 @@ s_repwait(
            do_sendpkt(as->security_handle, &as->rep_pkt);
            amfree(as->rep_pkt.body);
            pkt_init_empty(&as->rep_pkt, P_REP);
-       }
  
+           assert(as->ev_reptimeout != NULL);
+           event_release(as->ev_reptimeout);
+           as->ev_reptimeout = event_register(REP_TIMEOUT, EV_TIME,
+               timeout_repfd, as);
+       }
+
        return (A_PENDING);
     }
 