Re: estimate timeouts at 6hrs?
2007-06-11 11:19:44
* Jean-Louis Martineau <martineau AT zmanda DOT com> [20070611 10:00]:
> amandad has a hard limit of 6h (see REP_TIMEOUT in amandad-src/amandad.c)
> when waiting for the reply from sendsize.
>
> Try the attached patch; it resets the timeout after each estimate.
Thanks Jean-Louis.
Would that explain why I see a lot of runaway processes after
sendsize times out? Over the weekend I had a situation where more than 90
gnutar processes were left around with init as parent, like the
following:
UID PID PPID C STIME TTY TIME CMD
root 23243074 1 0 16:22:41 ? 11:40 gtar --create --file -
--directory /data/mafalda/mafalda1/susanita/jen/anxiety_
The relevant debug file showed:
runtar.20070610162241.debug
runtar: debug 1 pid 23243074 ruid 666 euid 0: start at Sun Jun 10
16:22:41 2007
runtar: time 0.002: version 2.5.2-20070523
/usr/freeware/bin/tar version: tar (GNU tar) 1.13.25
config: stk_80-conf1
runtar: debug 1 pid 23243074 ruid 0 euid 0: rename at Sun Jun 10
16:22:41 2007
running: /usr/freeware/bin/tar: 'gtar' '--create' '--file' '-'
'--directory'
'/data/mafalda/mafalda1/susanita/jen/anxiety_version1/sub115'
'--one-file-system' '--listed-incremental'
'/opt/amanda/amanda1/var/amanda/gnutar-lists/yoricksub115_1.new'
'--sparse'
'--ignore-failed-read' '--totals' '.'
runtar: time 0.020: pid 23243074 finish time Sun Jun 10 16:22:41 2007
I've seen this with both xfsdump and gnutar.
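
For anyone wanting to check a client for the same leftovers, here is a rough Python sketch that picks out processes reparented to init (PPID 1) from `ps -ef` style text. The function name `find_orphans` and the `WATCHED` list are hypothetical, not part of Amanda, and the parsing assumes the column layout shown in the listing above with STIME as a single token (older processes that print a date with a space would need extra handling).

```python
import os

# Hypothetical helper, not part of Amanda. Assumes `ps -ef` style columns:
# UID PID PPID C STIME TTY TIME CMD, with STIME as a single token.
WATCHED = ("gtar", "tar", "xfsdump")


def find_orphans(ps_output, watched=WATCHED):
    """Return (pid, command line) for watched processes whose parent is init."""
    orphans = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)
        # Skip the header and any line that does not parse cleanly.
        if len(fields) < 8 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        pid, ppid, cmd = int(fields[1]), int(fields[2]), fields[7].strip()
        name = os.path.basename(cmd.split()[0])
        if ppid == 1 and name in watched:
            orphans.append((pid, cmd))
    return orphans
```

Feeding it the captured output of `ps -ef` gives a list that can be reviewed before deciding whether to kill anything; it only reports, it does not signal any process.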
thanks, jf
>
> Jean-Louis
>
> Jean-Francois Malouin wrote:
> >Hi,
> >
> >A new problem that has me stumped: all the amdumps from client to server
> >(same host running 2.5.2-20070523) have failed due to estimate timing
> >out after 6:00h. This happened in all the multiple configs that I run,
> >even though the etimeout in each of the amanda configs is set to a
> >ridiculous value: in one case etimeout=5600 and I have 77 DLEs, which
> >should not time out for ~120h! Could anything else cause this?
> >
> >FAILURE AND STRANGE DUMP SUMMARY:
> > yorick /data/bigml/bigml1 lev 0 FAILED [disk
> >/data/bigml/bigml1, all estimate timed out]
> >...
> > yorick /data/nih/nih1/ lev 0 FAILED [disk
> >/data/nih/nih1/, all estimate timed out]
> > planner: ERROR Request to yorick failed: EOF on read from yorick
> >
> >
> >STATISTICS:
> > Total Full Incr.
> > -------- -------- --------
> >Estimate Time (hrs:min) 6:00
> >Run Time (hrs:min) 15:07
> >Dump Time (hrs:min) 15:14 14:59 0:15
> >
> >
> >jf
> >
>
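
To make the numbers in the quoted report concrete, here is a small Python sketch of the arithmetic. It assumes, as the report does, that the estimate budget is etimeout multiplied by the number of DLEs, and it takes the 6h REP_TIMEOUT figure from Jean-Louis's reply rather than from the source code. The function names are made up for illustration.

```python
# 6h hard limit as quoted in this thread (REP_TIMEOUT in amandad.c).
REP_TIMEOUT_S = 6 * 3600


def estimate_budget_hours(etimeout_s, n_dles):
    """Configured estimate budget: etimeout seconds per DLE, times the DLE count."""
    return etimeout_s * n_dles / 3600.0


def effective_limit_hours(etimeout_s, n_dles, rep_timeout_s=REP_TIMEOUT_S):
    """Whichever limit fires first: the configured budget or the hard cap."""
    return min(etimeout_s * n_dles, rep_timeout_s) / 3600.0


print(round(estimate_budget_hours(5600, 77), 1))  # 119.8
print(effective_limit_hours(5600, 77))            # 6.0
```

With etimeout=5600 and 77 DLEs the configured budget is about 120h, but the hard cap wins, which matches the 6:00 Estimate Time in the report.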
--
<° ><