Amanda-Users

Re: GNUTAR hanging

2007-05-04 09:02:37
Subject: Re: GNUTAR hanging
From: René Kanters <rkanters AT richmond DOT edu>
To: anthonyh <jyho AT abamon DOT com>
Date: Fri, 4 May 2007 08:17:04 -0400
Hi Anthony,

I have been having issues of gnutar hanging also. In my case I think is has to do with hardware issues with the external Western Digital 1TB drive to which I do the backups.

I usually end up having to kill the process on the client, reboot the server and make sure that the HD is rebooted too. Then I need to relabel the slot that got messed up (so it will be reused when I manually start a new amdump). After the first time this happened I had a hard time getting it all to work and decided to switch from the firewire 800 bus to the firewire 400 interface which did not seem to affect the overall performance since I backup directly to disk without a holding disk which results in only one dumper/taper writing to the disk so that the speed of my backup is mainly determined by the time that the compression takes. Since then this has happened three or four times with about a week or so in between. Annoying since I physically have to be at the server to reboot it (remote rebooting doesn't even work at that stage) and pull the power of the external disk.

I am a bit surprised that amanda is not able to realize that it can not write to the disk anymore and doesn't call the whole thing off, cleanup up processed on clients too.

PS I am running 2.5.1p1 on a Mac server with RedHat WS4 clients.

René


On May 3, 2007, at 10:34 PM, anthonyh wrote:


Hi guys,

Any solution to the GNUTAR hanging issue yet? I'm having the same problem as
the tar problem runs indefinitely for some time until i killed it.

top - 10:31:19 up 5 days, 12:01, 1 user, load average: 1.11, 1.03, 1.01
Tasks:  44 total,   2 running,  41 sleeping,   0 stopped,   1 zombie
Cpu(s): 30.6% us, 69.4% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si Mem: 125064k total, 122244k used, 2820k free, 77304k buffers Swap: 262072k total, 0k used, 262072k free, 14384k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17812 root      25   0  1920  652  568 R 99.7  0.5   4732:18 tar
 4546 root      16   0  2888  868  724 R  0.3  0.7   0:00.03 top
    1 root      16   0  1952  460  392 S  0.0  0.4   0:01.09 init



# ps aux |grep tar
root     17812 99.5  0.5  1920  652 ?        R    May01 4732:26 gtar
--create --
file - --directory /boot --one-file-system --listed-incremental
/usr/local/var/a
manda/gnutar-lists/anthonyho.no-ip.org_boot_1.new --sparse
--ignore-failed-read
--totals .
root 4548 0.0 0.4 4820 572 pts/1 D 10:31 0:00 grep tar


Any workaround?

Regards,
Anthony



Chris Cameron wrote:

sendbackup:

sendbackup: debug 1 pid 28355 ruid 2 euid 2: start at Thu Mar 29
07:15:09 2007
sendbackup: version 2.5.1p3
Could not open conf file "/etc/amanda/amanda-client.conf": No such file
or directory
Could not open conf file "/etc/amanda/test/amanda-client.conf": No such
file or directory
sendbackup: debug 1 pid 28355 ruid 2 euid 2: rename at Thu Mar 29
07:15:09 2007
   sendbackup req: <GNUTAR /dev/raid0g  0 1970:1:1:0:0:0 OPTIONS
|;auth=BSD;compress-fast;encrypt-serv-cust=/opt/amanda/sbin/ amcrypt-ossl;server-decrypt-option=-d;index;>
   parsed request as: program `GNUTAR'
                      disk `/dev/raid0g'
                      device `/dev/raid0g'
                      level 0
                      since 1970:1:1:0:0:0
                      options
`|;auth=BSD;compress-fast;encrypt-serv-cust=/opt/amanda/sbin/ amcrypt-ossl;server-decrypt-option=-d;index;'
sendbackup: start: carp0:/dev/raid0g lev 0
sendbackup: time 0.041: spawning /usr/bin/gzip in pipeline
sendbackup: argument list: /usr/bin/gzip --fast
sendbackup-gnutar: time 0.046: pid 31136: /usr/bin/gzip --fast
sendbackup-gnutar: time 0.064: doing level 0 dump as listed- incremental
to '/var/amanda/gnutar-lists/carp0_dev_raid0g_0.new'
sendbackup-gnutar: time 0.069: doing level 0 dump from date: 1970-01-01
  0:00:00 GMT
sendbackup: time 0.072: spawning /usr/local/libexec/runtar in pipeline
sendbackup: argument list: runtar test gtar --create --file -
--directory /home --one-file-system --listed-incremental
/var/amanda/gnutar-lists/carp0_dev_raid0g_0.new --sparse
--ignore-failed-read --totals .
sendbackup-gnutar: time 0.129: /usr/local/libexec/runtar: pid 14139
sendbackup: time 0.130: started backup
sendbackup: time 0.139: started index creator: "/usr/local/bin/ gtar -tf
- 2>/dev/null | sed -e 's/^\.//'"


sendbackup:
-auth=bsd: debug 1 pid 9120 ruid 2 euid 2: start at Thu Mar 29 08:21:53
2007
security_getdriver(name=BSD) returns 0x42af4038
-auth=bsd: version 2.5.1p3
-auth=bsd: build: VERSION="Amanda-2.5.1p3"
-auth=bsd:        BUILT_DATE="Mon Mar 26 08:10:30 MDT 2007"
-auth=bsd:        BUILT_MACH="OpenBSD carp1.netthruput.com 4.0 CARP#0
sparc64"
-auth=bsd:        CC="gcc"
-auth=bsd: CONFIGURE_COMMAND="'./configure' '--prefix=/usr/ local'
'--sysconfdir=/etc' '--localstatedir=/var' '--without-server'
'--with-user=operator' '--with-group=operator'
'--with-gnutar=/usr/local/bin/gtar'"
-auth=bsd: paths: bindir="/usr/local/bin" sbindir="/usr/local/sbin"
-auth=bsd: libexecdir="/usr/local/libexec" mandir="/usr/ local/man" -auth=bsd: AMANDA_TMPDIR="/tmp/amanda" AMANDA_DBGDIR="/tmp/ amanda"
-auth=bsd:        CONFIG_DIR="/etc/amanda" DEV_PREFIX="/dev/"
-auth=bsd:        RDEV_PREFIX="/dev/r" DUMP="/sbin/dump"
-auth=bsd:        RESTORE="/sbin/restore" VDUMP=UNDEF VRESTORE=UNDEF
-auth=bsd:        XFSDUMP=UNDEF XFSRESTORE=UNDEF VXDUMP=UNDEF
VXRESTORE=UNDEF
-auth=bsd:        SAMBA_CLIENT=UNDEF GNUTAR="/usr/local/bin/gtar"
-auth=bsd:        COMPRESS_PATH="/usr/bin/gzip"
-auth=bsd: UNCOMPRESS_PATH="/usr/bin/gzip" LPRCMD="/usr/bin/ lpr"
-auth=bsd:        MAILER="/usr/bin/Mail"
-auth=bsd:        listed_incr_dir="/var/amanda/gnutar-lists"
-auth=bsd: defs:  DEFAULT_SERVER="carp1.netthruput.com"
-auth=bsd:        DEFAULT_CONFIG="DailySet1"
-auth=bsd: DEFAULT_TAPE_SERVER="carp1.netthruput.com" HAVE_MMAP
-auth=bsd:        HAVE_SYSVSHM LOCKING=POSIX_FCNTL DEBUG_CODE
-auth=bsd:        AMANDA_DEBUG_DAYS=4 BSD_SECURITY RSH_SECURITY
USE_AMANDAHOSTS
-auth=bsd:        CLIENT_LOGIN="operator" FORCE_USERID HAVE_GZIP
-auth=bsd:        COMPRESS_SUFFIX=".gz" COMPRESS_FAST_OPT="--fast"
-auth=bsd:        COMPRESS_BEST_OPT="--best" UNCOMPRESS_OPT="-dc"
-auth=bsd: time 0.001: dgram_recv(dgram=0x42ef8960, timeout=0,
fromaddr=0x42f08958)
-auth=bsd: time 0.001: (sockaddr_in *)0x42f08958 = { 2, 906,
192.168.120.50 }
security_handleinit(handle=0x45a96000, driver=0x42af4038 (BSD))
-auth=bsd: time 0.006: accept recv REQ pkt:
<<<<<
SERVICE noop
OPTIONS features=fffffeff9ffeffffff7f;

-auth=bsd: time 0.007: creating new service: noop
OPTIONS features=fffffeff9ffeffffff7f;

-auth=bsd: time 0.012: sending ACK pkt:
<<<<<

-auth=bsd: dgram_send_addr(addr=0xfffffffffffee1b0, dgram=0x42ef8960)
-auth=bsd: time 0.013: (sockaddr_in *)0xfffffffffffee1b0 = { 2, 906,
192.168.120.50 }
-auth=bsd: dgram_send_addr: 0x42ef8960->socket = 0
-auth=bsd: time 0.056: sending REP pkt:
<<<<<
OPTIONS features=fffffeff9ffeffffff7f;

-auth=bsd: dgram_send_addr(addr=0xfffffffffffee340, dgram=0x42ef8960)
-auth=bsd: time 0.056: (sockaddr_in *)0xfffffffffffee340 = { 2, 906,
192.168.120.50 }
-auth=bsd: dgram_send_addr: 0x42ef8960->socket = 0
-auth=bsd: time 0.057: dgram_recv(dgram=0x42ef8960, timeout=0,
fromaddr=0x42f08958)
-auth=bsd: time 0.057: (sockaddr_in *)0x42f08958 = { 2, 906,
192.168.120.50 }
-auth=bsd: time 0.057: received ACK pkt:
<<<<<

security_close(handle=0x45a96000, driver=0x42af4038 (BSD))
-auth=bsd: time 0.062: dgram_recv(dgram=0x42ef8960, timeout=0,
fromaddr=0x42f08958)
-auth=bsd: time 0.062: (sockaddr_in *)0x42f08958 = { 2, 906,
192.168.120.50 }
security_handleinit(handle=0x47c62000, driver=0x42af4038 (BSD))
-auth=bsd: time 0.064: accept recv REQ pkt:
<<<<<
SERVICE sendsize
OPTIONS
features=fffffeff9ffeffffff7f;maxdumps=1;hostname=carp1;config=test;
GNUTAR /dev/raid0g  0 1970:1:1:0:0:0 -1  OPTIONS
|;auth=BSD;compress-fast;encrypt-serv-cust=/opt/amanda/sbin/ amcrypt-ossl;server-decrypt-option=-d;index;

-auth=bsd: time 0.064: creating new service: sendsize
OPTIONS
features=fffffeff9ffeffffff7f;maxdumps=1;hostname=carp1;config=test;
GNUTAR /dev/raid0g  0 1970:1:1:0:0:0 -1  OPTIONS
|;auth=BSD;compress-fast;encrypt-serv-cust=/opt/amanda/sbin/ amcrypt-ossl;server-decrypt-option=-d;index;

-auth=bsd: time 0.071: sending ACK pkt:
<<<<<

-auth=bsd: dgram_send_addr(addr=0xfffffffffffee1b0, dgram=0x42ef8960)
-auth=bsd: time 0.072: (sockaddr_in *)0xfffffffffffee1b0 = { 2, 906,
192.168.120.50 }
-auth=bsd: dgram_send_addr: 0x42ef8960->socket = 0
-auth=bsd: time 0.136: sending PREP pkt:
<<<<<
OPTIONS features=fffffeff9ffeffffff7f;

-auth=bsd: dgram_send_addr(addr=0xfffffffffffee320, dgram=0x42ef8960)
-auth=bsd: time 0.137: (sockaddr_in *)0xfffffffffffee320 = { 2, 906,
192.168.120.50 }
-auth=bsd: dgram_send_addr: 0x42ef8960->socket = 0
-auth=bsd: time 0.169: sending REP pkt:
<<<<<
OPTIONS features=fffffeff9ffeffffff7f;

-auth=bsd: dgram_send_addr(addr=0xfffffffffffee340, dgram=0x42ef8960)
-auth=bsd: time 0.169: (sockaddr_in *)0xfffffffffffee340 = { 2, 906,
192.168.120.50 }
-auth=bsd: dgram_send_addr: 0x42ef8960->socket = 0
-auth=bsd: time 0.169: dgram_recv(dgram=0x42ef8960, timeout=0,
fromaddr=0x42f08958)
-auth=bsd: time 0.169: (sockaddr_in *)0x42f08958 = { 2, 906,
192.168.120.50 }
-auth=bsd: time 0.169: received ACK pkt:
<<<<<

security_close(handle=0x47c62000, driver=0x42af4038 (BSD))
-auth=bsd: time 30.184: pid 9120 finish time Thu Mar 29 08:22:24 2007



Thanks,
Chris


Jean-Louis Martineau wrote:
The 'Illegel seek' is not a problem.
The sendsize is working correctly, the problem is with sendbackup

Could you post the sendbackup and amandad debug files?

Jean-Louis

Chris Cameron wrote:
I have Amanda 2.5.1p3 running on Solaris 8 for the server, and the
client is the same version running on OpenBSD 4.0.

Using GNUTAR, the dump times out. The error on the client machine is:

sendsize[3860]: time 0.058: waiting for any estimate child: 1 running
sendsize[28668]: time 0.061: calculating for amname /dev/raid0g,
dirname /home, spindle -1
sendsize[28668]: time 0.063: getting size via gnutar for /dev/ raid0g
level 0
sendsize[28668]: time 0.094: spawning /usr/local/libexec/runtar in
pipeline
sendsize[28668]: argument list: runtar test /usr/local/bin/gtar
--create --file /dev/null --directory /home --one-file-system
--listed-incremental /var/amanda/gnutar-lists/ carp0_dev_raid0g_0.new
--sparse --ignore-failed-read --totals .
sendsize[28668]: time 0.337: Total bytes written: 15718400 (15MiB,
124MiB/s)
sendsize[28668]: time 0.342: .....
sendsize[28668]: estimate time for /dev/raid0g level 0: 0.248
sendsize[28668]: estimate size for /dev/raid0g level 0: 15350 KB
sendsize[28668]: time 0.342: waiting for runtar "/dev/raid0g" child
sendsize[28668]: time 0.347: after runtar /dev/raid0g wait
gnutar_calc_estimates: warning - seek failed: Illegal seek
sendsize[28668]: time 0.365: done with amname /dev/raid0g dirname
/home spindle -1
sendsize[3860]: time 0.370: child 28668 terminated normally



The tar part seems to work, as I can also run it just at a command
line without error. However the message:

warning - seek failed: Illegal seek

Doesn't seem good.


In the report email:

  carp0  /dev/raid0g     lev 0  FAILED [data timeout]
  carp0  /dev/raid0g     lev 0  FAILED [too many dumper retry:
"[request failed: timeout waiting for ACK]"]
  carp0  /dev/raid0g     lev 0  FAILED [cannot read header: got 0
instead of 32768]


When amdump gives up, it leaves the following processes on the client
machine:

operator 30460  0.0  0.2   968  2200 ??  Is     9:09AM    0:00.13
amandad
operator 10245  0.0  0.2   816  2240 ??  I      9:10AM    0:00.08
/usr/local/libexec/sendbackup amandad bsd
operator 15664  0.0  0.1   824  1528 ??  I      9:10AM    0:00.04
/usr/bin/gzip --fast
operator 31389  0.0  0.1   816  1392 ??  I      9:10AM    0:00.02
/usr/local/libexec/sendbackup amandad bsd
operator 22560 0.0 0.1 768 576 ?? I 9:10AM 0:00.01 sh
-c sh
operator 28437 0.0 0.1 432 1192 ?? I 9:10AM 0:00.05 sed
-e s/^\\.//
operator 18449  0.0  0.1   792   648 ??  I      9:10AM    0:00.03
/usr/local/bin/gtar -tf -


Watching tcpdump I don't see any obvious retrying going on. I have a whack of other Solaris 8 machines that don't have any problem with tar
(1.16)


Anyone know what I've done wrong here?


Thanks,
Chris





--
View this message in context: http://www.nabble.com/GNUTAR-hanging- tf3473915.html#a10316228
Sent from the Amanda - Users mailing list archive at Nabble.com.




<Prev in Thread] Current Thread [Next in Thread>