Amanda-Users

Client just started timing out.

2006-02-01 11:46:38
Subject: Client just started timing out.
From: Stephen Carville <carville AT cpl DOT net>
To: Amanda Users <amanda-users AT amanda DOT org>
Date: Wed, 01 Feb 2006 08:15:09 -0800
For about a week one of my amanda clients has been timing out during planning. If I go into /tmp/amanda on thames (the client) I see a file like:

$ cat amandad.20060131210039.debug
amandad: debug 1 pid 18530 ruid 250 euid 250: start at Tue Jan 31 21:00:39 2006
amandad: version 2.4.5p1
amandad: build: VERSION="Amanda-2.4.5p1"
amandad:        BUILT_DATE="Mon Jan 30 08:08:00 PST 2006"
amandad: BUILT_MACH="Linux thames 2.4.7-10 #1 Thu Sep 6 17:27:27 EDT 2001 i686 unknown"
amandad:        CC="gcc"
amandad: CONFIGURE_COMMAND="'./configure' '--with-user=amanda' '--with-group=adm'"
amandad: paths: bindir="/usr/local/bin" sbindir="/usr/local/sbin"
amandad:        libexecdir="/usr/local/libexec" mandir="/usr/local/man"
amandad:        AMANDA_TMPDIR="/tmp/amanda" AMANDA_DBGDIR="/tmp/amanda"
amandad:        CONFIG_DIR="/usr/local/etc/amanda" DEV_PREFIX="/dev/"
amandad:        RDEV_PREFIX="/dev/" DUMP="/sbin/dump"
amandad:        RESTORE="/sbin/restore" VDUMP=UNDEF VRESTORE=UNDEF
amandad:        XFSDUMP=UNDEF XFSRESTORE=UNDEF VXDUMP=UNDEF VXRESTORE=UNDEF
amandad:        SAMBA_CLIENT="/usr/bin/smbclient" GNUTAR="/bin/gtar"
amandad:        COMPRESS_PATH="/bin/gzip" UNCOMPRESS_PATH="/bin/gzip"
amandad:        LPRCMD="/usr/bin/lpr" MAILER="/usr/bin/Mail"
amandad:        listed_incr_dir="/usr/local/var/amanda/gnutar-lists"
amandad: defs:  DEFAULT_SERVER="thames" DEFAULT_CONFIG="DailySet1"
amandad:        DEFAULT_TAPE_SERVER="thames"
amandad:        DEFAULT_TAPE_DEVICE="/dev/null" HAVE_MMAP HAVE_SYSVSHM
amandad:        LOCKING=POSIX_FCNTL SETPGRP_VOID DEBUG_CODE
amandad:        AMANDA_DEBUG_DAYS=4 BSD_SECURITY USE_AMANDAHOSTS
amandad:        CLIENT_LOGIN="amanda" FORCE_USERID HAVE_GZIP
amandad:        COMPRESS_SUFFIX=".gz" COMPRESS_FAST_OPT="--fast"
amandad:        COMPRESS_BEST_OPT="--best" UNCOMPRESS_OPT="-dc"
amandad: time 0.000: got packet:
--------
Amanda 2.4 REQ HANDLE 005-505AE508 SEQ 1138770050
SECURITY USER amanda
SERVICE noop
OPTIONS features=fffffeff9ffe7f;
--------

amandad: time 0.000: sending ack:
----
Amanda 2.4 ACK HANDLE 005-505AE508 SEQ 1138770050
----

amandad: time 0.002: bsd security: remote host amazon.totalflood.com user amanda local user amanda
amandad: time 0.013: amandahosts security check passed
amandad: time 0.013: running service "noop"
amandad: time 0.013: sending REP packet:
----
Amanda 2.4 REP HANDLE 005-505AE508 SEQ 1138770050
OPTIONS features=fffffeff9ffe7f;
----

amandad: time 0.013: got packet:
----
Amanda 2.4 ACK HANDLE 005-505AE508 SEQ 1138770050
----

amandad: time 0.013: pid 18530 finish time Tue Jan 31 21:00:39 2006

--

By comparison to other machines that do work, nothing in the above looks amiss.

Moving over to the server (amazon), I see in the log:

DISK planner thames /
DISK planner thames /boot
DISK planner thames /export/common
DISK planner thames /export/edi
DISK planner thames /export/netapps
DISK planner thames /export/private
DISK planner thames /export/public
DISK planner thames /var

And then, a few lines later:

FAIL planner thames /var 20060131 0 [Request to thames timed out.]
FAIL planner thames /export/public 20060131 0 [Request to thames timed out.]
FAIL planner thames /export/private 20060131 0 [Request to thames timed out.]
FAIL planner thames /export/netapps 20060131 0 [Request to thames timed out.]
FAIL planner thames /export/edi 20060131 0 [Request to thames timed out.]
FAIL planner thames /export/common 20060131 0 [Request to thames timed out.]
FAIL planner thames /boot 20060131 0 [Request to thames timed out.]
FAIL planner thames / 20060131 0 [Request to thames timed out.]
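
To see at a glance which clients and disks the planner gave up on, the FAIL records can be grouped by host. What follows is only a rough sketch: the record layout (host, disk, date, level, bracketed message) is inferred from the lines quoted above rather than from any Amanda documentation, and the names FAIL_RE, planner_failures and SAMPLE are made up for illustration. It scans with finditer, so records are still found if two of them end up on one line.

```python
import re
from collections import defaultdict

# Assumed record shape, inferred from the log excerpt in this message:
#   FAIL planner <host> <disk> <date> <level> [<message>]
FAIL_RE = re.compile(
    r"FAIL planner (?P<host>\S+) (?P<disk>\S+) "
    r"(?P<date>\d{8}) (?P<level>\d+) \[(?P<msg>[^\]]*)\]"
)


def planner_failures(log_text):
    """Return {host: [(disk, message), ...]} for every FAIL planner record."""
    failures = defaultdict(list)
    # finditer scans the whole text, so line breaks between records do not matter.
    for m in FAIL_RE.finditer(log_text):
        failures[m.group("host")].append((m.group("disk"), m.group("msg")))
    return dict(failures)


# Sample text copied from the excerpt above; the last line holds two records.
SAMPLE = (
    "DISK planner thames /var\n"
    "FAIL planner thames /var 20060131 0 [Request to thames timed out.]\n"
    "FAIL planner thames /export/private 20060131 0 [Request to thames timed out.] "
    "FAIL planner thames /export/netapps 20060131 0 [Request to thames timed out.]\n"
)

if __name__ == "__main__":
    for host, items in planner_failures(SAMPLE).items():
        print(host, "failed disks:", [disk for disk, _ in items])
```

DISK lines are ignored, so the result lists only the entries that actually failed.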

However, there is no indication on thames that the planner packet(s) were ever received. Shouldn't I at least see a sendsize.nnnnnn.debug file?
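
The check described here (is there any sendsize debug file next to the amandad ones?) can be scripted. This is a minimal sketch under stated assumptions: the file name pattern service.YYYYMMDDhhmmss.debug is taken from the amandad file shown earlier and is assumed to apply to sendsize as well, /tmp/amanda is the AMANDA_DBGDIR reported in the debug output, and debug_files_by_service is a hypothetical helper name.

```python
import os
import re
import tempfile
from collections import defaultdict

# Assumed file name shape, based on amandad.20060131210039.debug:
#   <service>.<YYYYMMDDhhmmss>.debug
DEBUG_RE = re.compile(r"^(?P<service>[A-Za-z0-9_-]+)\.(?P<stamp>\d{14})\.debug$")


def debug_files_by_service(dbgdir):
    """Return {service: [timestamp, ...]} for debug files found in dbgdir."""
    found = defaultdict(list)
    for name in sorted(os.listdir(dbgdir)):
        m = DEBUG_RE.match(name)
        if m:
            found[m.group("service")].append(m.group("stamp"))
    return dict(found)


if __name__ == "__main__":
    # Demonstration against a throwaway directory; on the client one would
    # pass the real debug directory (here assumed to be /tmp/amanda).
    with tempfile.TemporaryDirectory() as dbgdir:
        open(os.path.join(dbgdir, "amandad.20060131210039.debug"), "w").close()
        services = debug_files_by_service(dbgdir)
        print("services with debug files:", sorted(services))
        if "sendsize" not in services:
            print("no sendsize debug file found")
```

If only amandad shows up, that matches the symptom described: the noop exchange reached the client, but no size-estimate request left a trace there.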

Amazon is running amanda 2.4.5. Thames was running 2.4.3b2 when this all started but is now running 2.4.5p1.

Fortunately, in this case none of the above is a big problem. Thames is scheduled to be replaced soon and all of the "important" data is being mirrored to its replacement. Nevertheless, it is a bit frustrating to have my backups mysteriously quit working on me.

Suggestions are welcome.

--
Stephen Carville -- polluting the ranks of skeptics since 1995.
---------------------------------------------------------------
Government is actually the worst failure of civilized man. There has never been a really good one, and even those that are most tolerable are arbitrary, cruel, grasping and unintelligent.
             -- H. L. Mencken
