Amanda-Users

Re: sendsize finishes, planner doesn't notice...

2007-10-11 16:12:28
Subject: Re: sendsize finishes, planner doesn't notice...
From: Deb Baddorf <baddorf AT fnal DOT gov>
To: Paul Lussier <pll+amanda AT permabit DOT com>, amanda-users AT amanda DOT org
Date: Thu, 11 Oct 2007 15:03:00 -0500
The seems a bit similar to firewall issues we had a while back ---
the sendsize estimate took long enough that the connection FROM the
server was closed.  The firewall only allowed connections made by the
server, or replies back through the same connection    ... and needed to be
opened for the client to start a new connection back TO the server,  when
the estimate took over a certain amount of time.
   My understanding of it may be poor, but perhaps this will jog somebody's
mind....
Deb



At 3:39 PM -0400 10/11/07, Paul Lussier wrote:
Jean-Louis Martineau <martineau AT zmanda DOT com> writes:

 It's weird.

 Do you have an amdump log file or just amdump.1?
 The only way to get this is if you killed amanda process on the
 server, maybe a server crash.
 Do you still have amanda process running on the server?

I do now. I started amanda off Tuesday night at "Tue Oct  9 22:48:34 2007".

According the /var/log/amanda/amandad/amandad.20071009224834.debug file:

  amandad: time 21604.147: pid 26218 finish time Wed Oct 10 04:48:39 2007

According to sendsize.20071009224835.debug:

amanda2:/var/log/amanda/client/offsite# tail sendsize.20071009224835.debug
errmsg is /usr/local/libexec/runtar exited with status 1: see /var/log/amanda/client/offsite/sendsize.20071009224835.debug sendsize[26687]: time 37138.237: done with amname /permabit/user/uz dirname /permabit/user spindle -1 sendsize[26379]: time 37823.330: Total bytes written: 541649408000 (505GiB, 14MiB/s)
sendsize[26379]: time 37823.453: .....
sendsize[26379]: time 37823.453: estimate time for /permabit/user/eh level 0: 37823.251 sendsize[26379]: time 37823.453: estimate size for /permabit/user/eh level 0: 528954500 KB
sendsize[26379]: time 37823.453: waiting for runtar "/permabit/user/eh" child
sendsize[26379]: time 37823.453: after runtar /permabit/user/eh wait
errmsg is /usr/local/libexec/runtar exited with status 1: see /var/log/amanda/client/offsite/sendsize.20071009224835.debug sendsize[26379]: time 37823.537: done with amname /permabit/user/eh dirname /permabit/user spindle -1

So, sendsize claims to be done, yet planner doesn't think so:

  planner: time 16531.383: got partial result for host amanda2 disk \
     /permabit/user/uz: 0 -> -2K, -1 -> -2K, -1 -> -2K
  [...]
  planner: time 16531.384: got partial result for host amanda2 disk \
     /permabit/user/eh: 0 -> -2K, -1 -> -2K, -1 -> -2K

amdump is currently still running, amandad has finished, but we're
still waiting for estimates which will never arrive.

I also find it disturbing that the debug log I'm looking at,
sendsize.20071009224835.debug, tells me to look at the log I'm looking
at for further information:

    errmsg is /usr/local/libexec/runtar exited with status 1: see \
    /var/log/amanda/client/offsite/sendsize.20071009224835.debug

Any idea why amandad is dying before sending the estimate data back to
the planner?  My etimeout is currently set to:

  # grep timeout /etc/amanda/offsite/amanda.conf
  etimeout  72000          # number of seconds per filesystem for estimates.
  dtimeout  72000         # number of idle seconds before a dump is aborted.
  ctimeout    30          # maximum number of seconds that amcheck waits
amanda2:/var/log/amanda/server/offsite# su - backup -c 'amadmin offsite config' | grep -i timeout
  ETIMEOUT              72000
  DTIMEOUT              72000
  CTIMEOUT              30

amanda2:/var/log/amanda/server/offsite# /usr/local/sbin/amgetconf offsite etimeout
72000

su - backup -c 'amadmin offsite version'
build: VERSION="Amanda-2.5.2p1"
       BUILT_DATE="Tue Sep 4 15:45:27 EDT 2007"
BUILT_MACH="Linux amanda2 2.6.18-4-686 #1 SMP Mon Mar 26 17:17:36 UTC 2007 i686 GNU/Linux"
       CC="gcc-4.2"
CONFIGURE_COMMAND="'./configure' '--prefix=/usr/local' '--enable-shared' '--sysconfdir=/etc' '--localstatedir=/var/lib' '--with-gnutar-listdir=/var/lib/amanda/gnutar-lists' '--with-index-server=localhost' '--with-user=backup' '--with-group=backup' '--with-bsd-security' '--with-amandahosts' '--with-smbclient=/usr/bin/smbclient' '--with-debugging=/var/log/amanda' '--with-dumperdir=/usr/lib/amanda/dumper.d' '--with-tcpportrange=50000,50100' '--with-udpportrange=840,860' '--with-maxtapeblocksize=256' '--with-ssh-security'"
paths: bindir="/usr/local/bin" sbindir="/usr/local/sbin"
       libexecdir="/usr/local/libexec" mandir="/usr/local/man"
       AMANDA_TMPDIR="/tmp/amanda"
       AMANDA_DBGDIR="/var/log/amanda" CONFIG_DIR="/etc/amanda"
       DEV_PREFIX="/dev/" RDEV_PREFIX="/dev/" DUMP=UNDEF
       RESTORE=UNDEF VDUMP=UNDEF VRESTORE=UNDEF XFSDUMP=UNDEF
       XFSRESTORE=UNDEF VXDUMP=UNDEF VXRESTORE=UNDEF
       SAMBA_CLIENT=UNDEF GNUTAR="/bin/tar"
       COMPRESS_PATH="/bin/gzip" UNCOMPRESS_PATH="/bin/gzip"
       LPRCMD="/usr/bin/lpr" MAILER="/usr/bin/Mail"
       listed_incr_dir="/var/lib/amanda/gnutar-lists"
defs:  DEFAULT_SERVER="localhost" DEFAULT_CONFIG="DailySet1"
       DEFAULT_TAPE_SERVER="localhost" HAVE_MMAP NEED_STRSTR
       HAVE_SYSVSHM LOCKING=POSIX_FCNTL SETPGRP_VOID DEBUG_CODE
       AMANDA_DEBUG_DAYS=4 BSD_SECURITY RSH_SECURITY USE_AMANDAHOSTS
       CLIENT_LOGIN="backup" FORCE_USERID HAVE_GZIP
       COMPRESS_SUFFIX=".gz" COMPRESS_FAST_OPT="--fast"
       COMPRESS_BEST_OPT="--best" UNCOMPRESS_OPT="-dc"


Am I missing something extremely obvious?  I've been using amanda for
over a decade, and I can't figure out why she's behaving like this.

If there's any more information you need in order to help me figure
this out, please let me know, the suspense here is killing me :)

--
Thanks,
Paul