Amanda-Users

Re: request failed: timeout waiting for REP

2009-01-16 16:08:08
Subject: Re: request failed: timeout waiting for REP
From: Jim Summers <jsummers AT bachman.cs.ou DOT edu>
To: amanda-users AT amanda DOT org
Date: Fri, 16 Jan 2009 14:58:44 -0600


Jeremiah Millay wrote:
Greetings,
I'm having issues backing up a single partition on one of my servers. I always see this in my amanda emails:

FAILURE DUMP SUMMARY:
  echo /usr lev 0  FAILED [data timeout]
echo /usr lev 0 FAILED [too many dumper retry: "[request failed: timeout waiting for REP]"] echo /usr lev 0 FAILED [cannot read header: got 0 bytes instead of 32768]


FAILED DUMP DETAILS:

/--  echo /usr lev 0 FAILED [data timeout]me
sendbackup: start [echo:/usr level 0]
sendbackup: info BACKUP=/sbin/dump
sendbackup: info RECOVER_CMD=/sbin/restore -f - ...
sendbackup: info end
|   DUMP: WARNING: should use -L when dumping live read-write filesystems!
|   DUMP: Date of this level 0 dump: Sun Jan 11 18:52:58 2009
|   DUMP: Date of last level 0 dump: the epoch
|   DUMP: Dumping /dev/da0s1f (/usr) to standard output
|   DUMP: mapping (Pass I) [regular files]
|   DUMP: mapping (Pass II) [directories]
|   DUMP: estimated 9162049 tape blocks.
|   DUMP: dumping (Pass III) [directories]
|   DUMP: dumping (Pass IV) [regular files]
|   DUMP: 1.08% done, finished in 7:36 at Mon Jan 12 02:35:01 2009
|   DUMP: 3.03% done, finished in 5:20 at Mon Jan 12 00:23:31 2009
|   DUMP: 4.86% done, finished in 4:53 at Mon Jan 12 00:01:31 2009
|   DUMP: 6.67% done, finished in 4:39 at Sun Jan 11 23:52:51 2009
|   DUMP: 8.38% done, finished in 4:33 at Sun Jan 11 23:51:20 2009
|   DUMP: 8.93% done, finished in 5:06 at Mon Jan 12 00:29:05 2009
\--------


When I log into the machine after the amdump has finished for all of the rest of the partitions on this particular server there are hung dump and restore processes on the server. I always have to kill them the next day to prevent problems with the next backup.

As I mentioned I'm able to successfully backup all of my other partitions on this server so there is no issue with firewalls or anything like that. Looking through the list archives I've tried messing with etimeout and dtimeout on my amanda server but it hasn't resulted in anything useful:

etimeout 3600
dtimeout 3600



Here is some output from sendbackup on the client:

[root@echo /tmp/amanda/client/DailySet1]# less sendbackup.20090111185258.debug sendbackup: debug 1 pid 64539 ruid 2 euid 2: start at Sun Jan 11 18:52:58 2009
sendbackup: version 2.5.1p3
Could not open conf file "/usr/local/etc/amanda/amanda-client.conf": No such file or directory Could not open conf file "/usr/local/etc/amanda/DailySet1/amanda-client.conf": No such file or directory sendbackup: debug 1 pid 64539 ruid 2 euid 2: rename at Sun Jan 11 18:52:58 2009 sendbackup req: <DUMP /usr 0 1970:1:1:0:0:0 OPTIONS |;auth=BSD;srvcomp-best;index;>
 parsed request as: program `DUMP'
                    disk `/usr'
                    device `/usr'
                    level 0
                    since 1970:1:1:0:0:0
                    options `|;auth=BSD;srvcomp-best;index;'
sendbackup: start: echo:/usr lev 0
sendbackup: time 0.000: dumping device '/dev/da0s1f' with 'ufs'
sendbackup: time 0.000: spawning /sbin/dump in pipeline
sendbackup: argument list: dump 0ushf 1048576 0 - /dev/da0s1f
sendbackup: time 0.000: started backup
sendbackup: time 0.001: started index creator: "/sbin/restore -tvf - 2>&1 | sed -e '
s/^leaf[        ]*[0-9]*[       ]*\.//
t
/^dir[  ]/ {
s/^dir[         ]*[0-9]*[       ]*\.//
s%$%/%
t
}
d
'"
sendbackup: time 0.002: 87: normal(|): DUMP: WARNING: should use -L when dumping live read-write filesystems! sendbackup: time 0.002: 87: normal(|): DUMP: Date of this level 0 dump: Sun Jan 11 18:52:58 2009 sendbackup: time 0.002: 87: normal(|): DUMP: Date of last level 0 dump: the epoch sendbackup: time 0.003: 87: normal(|): DUMP: Dumping /dev/da0s1f (/usr) to standard output sendbackup: time 0.044: 87: normal(|): DUMP: mapping (Pass I) [regular files] sendbackup: time 5.234: 87: normal(|): DUMP: mapping (Pass II) [directories] sendbackup: time 5.234: 87: normal(|): DUMP: estimated 9162049 tape blocks. sendbackup: time 8.527: 87: normal(|): DUMP: dumping (Pass III) [directories] sendbackup: time 249.068: 87: normal(|): DUMP: dumping (Pass IV) [regular files] sendbackup: time 304.544: 87: normal(|): DUMP: 1.08% done, finished in 7:36 at Mon Jan 12 02:35:01 2009 sendbackup: time 604.420: 87: normal(|): DUMP: 3.03% done, finished in 5:20 at Mon Jan 12 00:23:31 2009 sendbackup: time 904.583: 87: normal(|): DUMP: 4.86% done, finished in 4:53 at Mon Jan 12 00:01:31 2009 sendbackup: time 1204.442: 87: normal(|): DUMP: 6.67% done, finished in 4:39 at Sun Jan 11 23:52:51 2009 sendbackup: time 1504.647: 87: normal(|): DUMP: 8.38% done, finished in 4:33 at Sun Jan 11 23:51:20 2009 sendbackup: time 1804.618: 87: normal(|): DUMP: 8.93% done, finished in 5:06 at Mon Jan 12 00:29:05 2009







Output from amandad on the client:


amandad: debug 1 pid 64538 ruid 2 euid 2: start at Sun Jan 11 18:52:58 2009
security_getdriver(name=BSD) returns 0x280ac020
amandad: version 2.5.1p3
amandad: build: VERSION="Amanda-2.5.1p3"
amandad:        BUILT_DATE="Mon Nov 10 15:46:33 EST 2008"
amandad: BUILT_MACH="FreeBSD echo.sover.net 7.0-RELEASE-p5 FreeBSD 7.0-RELEASE-p5 #0: Wed Oct 1 10:10:12 UTC 2008 root AT i386-builder.daemonology DOT net:/usr/obj/usr/src/sys/GENERIC i386"
amandad:        CC="cc" CONFIGURE_COMMAND="'./configure' 'freebsd7.0'"
amandad: paths: bindir="/usr/local/bin" sbindir="/usr/local/sbin"
amandad:        libexecdir="/usr/local/libexec/amanda"
amandad:        mandir="/usr/local/man" AMANDA_TMPDIR="/tmp/amanda"
amandad:        AMANDA_DBGDIR="/tmp/amanda"
amandad:        CONFIG_DIR="/usr/local/etc/amanda" DEV_PREFIX="/dev/"
amandad:        RDEV_PREFIX="/dev/" DUMP="/sbin/dump"
amandad:        RESTORE="/sbin/restore" VDUMP=UNDEF VRESTORE=UNDEF
amandad:        XFSDUMP=UNDEF XFSRESTORE=UNDEF VXDUMP=UNDEF VXRESTORE=UNDEF
amandad:        SAMBA_CLIENT=UNDEF GNUTAR="/usr/local/bin/gtar"
amandad:        COMPRESS_PATH="/usr/bin/gzip"
amandad:        UNCOMPRESS_PATH="/usr/bin/gzip" LPRCMD="/usr/bin/lpr"
amandad:        MAILER="/usr/bin/Mail"
amandad:        listed_incr_dir="/usr/local/var/amanda/gnutar-lists"
amandad: defs:  DEFAULT_SERVER="echo.sover.net"
amandad:        DEFAULT_CONFIG="DailySet1"
amandad:        DEFAULT_TAPE_SERVER="echo.sover.net" HAVE_MMAP
amandad:        HAVE_SYSVSHM LOCKING=POSIX_FCNTL DEBUG_CODE
amandad: AMANDA_DEBUG_DAYS=4 BSD_SECURITY RSH_SECURITY USE_AMANDAHOSTS
amandad:        CLIENT_LOGIN="operator" FORCE_USERID HAVE_GZIP
amandad:        COMPRESS_SUFFIX=".gz" COMPRESS_FAST_OPT="--fast"
amandad:        COMPRESS_BEST_OPT="--best" UNCOMPRESS_OPT="-dc"
amandad: time 0.000: dgram_recv(dgram=0x280acfe4, timeout=0, fromaddr=0x280bcfd0) amandad: time 0.000: (sockaddr_in *)0x280bcfd0 = { 2, 724, 209.198.112.169 }
security_handleinit(handle=0x28324080, driver=0x280ac020 (BSD))
amandad: time 0.001: accept recv REQ pkt:
<<<<<
SERVICE sendbackup
OPTIONS features=ffffffff9ffeffffffff00;hostname=echo;config=DailySet1;
DUMP /usr  0 1970:1:1:0:0:0 OPTIONS |;auth=BSD;srvcomp-best;index;
 >>>>>
amandad: time 0.001: creating new service: sendbackup
OPTIONS features=ffffffff9ffeffffffff00;hostname=echo;config=DailySet1;
DUMP /usr  0 1970:1:1:0:0:0 OPTIONS |;auth=BSD;srvcomp-best;index;

amandad: time 0.001: sending ACK pkt:
<<<<<
 >>>>>
amandad: dgram_send_addr(addr=0xbfbfe920, dgram=0x280acfe4)
amandad: time 0.001: (sockaddr_in *)0xbfbfe920 = { 2, 724, 209.198.112.169 }
amandad: dgram_send_addr: 0x280acfe4->socket = 0
security_streaminit(stream=0x28339000, driver=0x280ac020 (BSD))
amandad: try_socksize: send buffer size is 65536
amandad: try_socksize: receive buffer size is 65536
amandad: time 0.003: stream_server: waiting for connection: 0.0.0.0.59889
security_streaminit(stream=0x28342000, driver=0x280ac020 (BSD))
amandad: try_socksize: send buffer size is 65536
amandad: try_socksize: receive buffer size is 65536
amandad: time 0.004: stream_server: waiting for connection: 0.0.0.0.59967
security_streaminit(stream=0x2834b000, driver=0x280ac020 (BSD))
amandad: try_socksize: send buffer size is 65536
amandad: try_socksize: receive buffer size is 65536
amandad: time 0.004: stream_server: waiting for connection: 0.0.0.0.61824
amandad: time 0.004: sending REP pkt:
<<<<<
CONNECT DATA 59889 MESG 59967 INDEX 61824
OPTIONS features=fffffeff9ffeffffff7f;
 >>>>>
amandad: dgram_send_addr(addr=0xbfbfea00, dgram=0x280acfe4)
amandad: time 0.004: (sockaddr_in *)0xbfbfea00 = { 2, 724, 209.198.112.169 }
amandad: dgram_send_addr: 0x280acfe4->socket = 0
amandad: time 0.004: dgram_recv(dgram=0x280acfe4, timeout=0, fromaddr=0x280bcfd0) amandad: time 0.004: (sockaddr_in *)0x280bcfd0 = { 2, 724, 209.198.112.169 }
amandad: time 0.004: received ACK pkt:
<<<<<
 >>>>>
amandad: time 0.017: stream_accept: connection from 209.198.112.169.1024
amandad: try_socksize: send buffer size is 65536
amandad: try_socksize: receive buffer size is 65536
amandad: time 0.034: stream_accept: connection from 209.198.112.169.1024
amandad: try_socksize: send buffer size is 65536
amandad: try_socksize: receive buffer size is 65536
amandad: time 0.044: stream_accept: connection from 209.198.112.169.1024
amandad: try_socksize: send buffer size is 65536
amandad: try_socksize: receive buffer size is 65536
security_close(handle=0x28324080, driver=0x280ac020 (BSD))
amandad: time 5427.537: dgram_recv(dgram=0x280acfe4, timeout=0, fromaddr=0x280bcfd0) amandad: time 5427.537: (sockaddr_in *)0x280bcfd0 = { 2, 724, 209.198.112.169 }
security_handleinit(handle=0x28324080, driver=0x280ac020 (BSD))
amandad: time 5427.538: accept recv REQ pkt:
<<<<<
SERVICE sendbackup
OPTIONS features=ffffffff9ffeffffffff00;hostname=echo;config=DailySet1;
DUMP /usr  0 1970:1:1:0:0:0 OPTIONS |;auth=BSD;srvcomp-best;index;
 >>>>>
amandad: time 5427.538: sendbackup OPTIONS features=ffffffff9ffeffffffff00;hostname=echo;config=DailySet1;
DUMP /usr  0 1970:1:1:0:0:0 OPTIONS |;auth=BSD;srvcomp-best;index;
: already running, acking req
amandad: time 5427.538: sending ACK pkt:
<<<<<
 >>>>>
amandad: dgram_send_addr(addr=0xbfbfe990, dgram=0x280acfe4)
amandad: time 5427.538: (sockaddr_in *)0xbfbfe990 = { 2, 724, 209.198.112.169 }
amandad: dgram_send_addr: 0x280acfe4->socket = 0
security_close(handle=0x28324080, driver=0x280ac020 (BSD))
amandad: time 5487.555: dgram_recv(dgram=0x280acfe4, timeout=0, fromaddr=0x280bcfd0) amandad: time 5487.555: (sockaddr_in *)0x280bcfd0 = { 2, 724, 209.198.112.169 }
security_handleinit(handle=0x28324080, driver=0x280ac020 (BSD))
amandad: time 5487.557: accept recv REQ pkt:
<<<<<
SERVICE sendbackup
OPTIONS features=ffffffff9ffeffffffff00;hostname=echo;config=DailySet1;
DUMP /usr  0 1970:1:1:0:0:0 OPTIONS |;auth=BSD;srvcomp-best;index;
 >>>>>
amandad: time 5487.557: sendbackup OPTIONS features=ffffffff9ffeffffffff00;hostname=echo;config=DailySet1;
DUMP /usr  0 1970:1:1:0:0:0 OPTIONS |;auth=BSD;srvcomp-best;index;
: already running, acking req
amandad: time 5487.557: sending ACK pkt:
<<<<<
 >>>>>
amandad: dgram_send_addr(addr=0xbfbfe990, dgram=0x280acfe4)
amandad: time 5487.557: (sockaddr_in *)0xbfbfe990 = { 2, 724, 209.198.112.169 }
amandad: dgram_send_addr: 0x280acfe4->socket = 0
security_close(handle=0x28324080, driver=0x280ac020 (BSD))
amandad: time 5547.582: dgram_recv(dgram=0x280acfe4, timeout=0, fromaddr=0x280bcfd0) amandad: time 5547.582: (sockaddr_in *)0x280bcfd0 = { 2, 724, 209.198.112.169 }
security_handleinit(handle=0x28324080, driver=0x280ac020 (BSD))
amandad: time 5547.583: accept recv REQ pkt:
<<<<<
SERVICE sendbackup
OPTIONS features=ffffffff9ffeffffffff00;hostname=echo;config=DailySet1;
DUMP /usr  0 1970:1:1:0:0:0 OPTIONS |;auth=BSD;srvcomp-best;index;
 >>>>>
amandad: time 5547.583: sendbackup OPTIONS features=ffffffff9ffeffffffff00;hostname=echo;config=DailySet1;
DUMP /usr  0 1970:1:1:0:0:0 OPTIONS |;auth=BSD;srvcomp-best;index;
: already running, acking req
amandad: time 5547.583: sending ACK pkt:
<<<<<
 >>>>>
amandad: dgram_send_addr(addr=0xbfbfe990, dgram=0x280acfe4)
amandad: time 5547.583: (sockaddr_in *)0xbfbfe990 = { 2, 724, 209.198.112.169 }
amandad: dgram_send_addr: 0x280acfe4->socket = 0
security_close(handle=0x28324080, driver=0x280ac020 (BSD))
amandad: time 5607.675: dgram_recv(dgram=0x280acfe4, timeout=0, fromaddr=0x280bcfd0) amandad: time 5607.675: (sockaddr_in *)0x280bcfd0 = { 2, 724, 209.198.112.169 }
security_handleinit(handle=0x28324080, driver=0x280ac020 (BSD))
amandad: time 5607.676: accept recv REQ pkt:
<<<<<




Also, don't know if matters but my server is running version 2.6.0p2 from FreeBSD ports whereas my client is 2.5.1p3. Any help/suggestions are greatly appreciated.
Thanks again,
Jeremiah

Is the DLE that is having trouble, on a disk separate from the dles that backup ok? If it is you might check dmesg or your system logs for any hardware troubles.

HTH




--
Jim Summers
School of Computer Science-University of Oklahoma
-------------------------------------------------

<Prev in Thread] Current Thread [Next in Thread>