Amanda-Users

Re: gnutar version -- exactly 1.13.25 or just 1.13.25 and above?

2003-12-16 13:27:57
Subject: Re: gnutar version -- exactly 1.13.25 or just 1.13.25 and above?
From: Gene Heskett <gene.heskett AT verizon DOT net>
To: Mark_Conty AT cargill DOT com, amanda-users AT amanda DOT org
Date: Tue, 16 Dec 2003 13:24:56 -0500
On Tuesday 16 December 2003 13:11, Mark_Conty AT cargill DOT com wrote:
>Jon LaBadie writes:
>> On Fri, Nov 14, 2003 at 07:15:07PM +0100, Zoltan Kato wrote:
>> > /home is not NFS mounted and the directories are not transient
>> > (they are the actual home dirs of individual users). runtar is
>> > setuid root. I tried to run the gtar command from the log file
>> > manually as root, and found that it ONLY works when I run it from
>> > /home:
>> >
>> > root@rozi$ cd /home/
>> > root@rozi$ /opt/sfw/bin/gtar --create --file /dev/null --directory /home
>>     This is NOT related to the problem you are seeing.
>>
>> However note that the version of gnutar that Sun supplies
>> to install in /opt/sfw/bin/gtar, or /usr/sfw/bin/gtar in
>> later releases, is not a suitable version for use with
>> amanda.  It is only 1.13 and what is needed is 1.13.25.
>
>Question:  Is it to be understood that versions of gnutar _greater_
>than 1.13.25 are also incompatible?  Or is it implied that those
>are also valid & acceptable for use with Amanda?
>
>I ask because I'm seeing the same problem as Mr. Kato, only I'm
>on an HP-UX server, running gnutar 1.13.90 and Amanda 2.4.4p1.
>chg-scsi is dumping to one of four tape drives in a 4/48 DLT
>library.

That's an excellent question and, AFAIK, the first time it's been asked.
I was not even aware that an even newer version existed.
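The subject line's "exactly, or that version and above?" question comes down to ordering dotted version strings, and comparing them as plain text gets it wrong ("1.13.9" sorts after "1.13.25" as a string). A minimal Python sketch, assuming purely numeric components, compares them numerically. It only says whether a version is at or above 1.13.25; it says nothing about whether that release actually works with Amanda.

```python
def parse_version(v: str) -> tuple:
    """Split a dotted numeric version such as '1.13.25' into a tuple of ints."""
    return tuple(int(part) for part in v.split("."))


def at_least(v: str, minimum: str) -> bool:
    """True if version v is numerically >= minimum (missing parts count as 0)."""
    a, b = parse_version(v), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so '1.13' compares as 1.13.0
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    for candidate in ("1.13", "1.13.25", "1.13.90"):
        print(candidate, at_least(candidate, "1.13.25"))
```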

>There is this one DLE that gets to a certain point and then just
>sits there.  I am going to go back and look through past logs to
>see if it occurs at the same point in time.  Failing that, I'll
>extract the portion of the dumpfile that is actually on the tape,
>and skip to the end of it to see if the last file in the dump might
>somehow be causing the hang.
>
>Looking at the "amstatus" output, it's already written about 600 MB
>to the tape; the DLE in question takes up about 2.9 GB.
>
>This DLE runs on the Amanda tape server; I have configured most of
>the DLEs from the tape server to be "nohold", since most of those
>filesystems live in the same disk subsystem as the holding area.
>This DLE deals with a filesystem that could hold as much as 50 GB,
>so I broke it into two DLEs (as I can only get about 30 GB on each
>tape):
>
>s2sme /deploy/lynx/packages /deploy/lynx/packages {
>        user-tar-nohold
>        exclude "./[0-9]*"
>}
>s2sme /deploy/lynx/packages/09 /deploy/lynx/packages {
>        user-tar-nohold
>        include "./[0-9]*"
>}
>
>This turns out to be a pretty even split of what could be as much as
>25 GB per DLE.
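The two DLEs partition the top-level entries of /deploy/lynx/packages with a single wildcard: the first excludes "./[0-9]*" and the second includes only what matches it, so every entry should land in exactly one DLE. A minimal Python sketch of that split, using fnmatch as an approximation of GNU tar's own wildcard matching (the two may differ in edge cases); the entry names are hypothetical.

```python
from fnmatch import fnmatchcase

PATTERN = "./[0-9]*"  # the exclude/include pattern used by the two DLEs


def split_entries(entries):
    """Return (entries for the exclude DLE, entries for the include DLE).

    Each entry is a top-level name written as './name'. Matching uses
    fnmatchcase, which only approximates GNU tar's wildcard rules.
    """
    rest, numbered = [], []
    for name in entries:
        (numbered if fnmatchcase(name, PATTERN) else rest).append(name)
    return rest, numbered


if __name__ == "__main__":
    sample = ["./09", "./12", "./docs", "./lib", "./2003-archive"]  # hypothetical
    first, second = split_entries(sample)
    print("exclude DLE:", first)
    print("include DLE:", second)
```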
>
>Last night, though, the first DLE only needed a level 1, so it was
>only 20 KB; it would have been 230 MB at level 0.  The second DLE's
>level 0 shows as 2.9 GB in the 'amstatus' output, but the resulting
>dump file (see below) turned out to be only 1.2 GB.
>
>BTW, etimeout is set to 7200 seconds, which has long since elapsed.
>Checking the status of one of the taper processes with 'lsof', I see
>that the size/offset value is unchanged from when I checked it at
>5am today.  The size/offset values for the associated dumper and
>sendbackup processes haven't changed either.
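The lsof check amounts to sampling an offset over time and asking how long it has stayed put. A minimal Python sketch of that bookkeeping, assuming the (time, offset) samples are copied by hand from successive lsof runs; the sample values and the two-hour cut-off are hypothetical, not taken from the logs in this message.

```python
from typing import Sequence, Tuple

# (seconds since the first check, size/offset reported by lsof)
Sample = Tuple[float, int]

STALL_THRESHOLD = 7200  # seconds; an arbitrary cut-off for this sketch


def seconds_without_progress(samples: Sequence[Sample]) -> float:
    """How long the offset has stayed at its most recent value.

    Samples must be in time order. Returns 0.0 if the last two samples
    differ, or if there is at most one sample.
    """
    if not samples:
        return 0.0
    last_time, last_offset = samples[-1]
    unchanged_since = last_time
    for t, offset in reversed(samples):
        if offset != last_offset:
            break
        unchanged_since = t
    return last_time - unchanged_since


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    readings = [(0, 104_857_600), (3600, 629_145_600),
                (10800, 629_145_600), (25200, 629_145_600)]
    idle = seconds_without_progress(readings)
    print(f"no progress for {idle:.0f} s; stalled: {idle > STALL_THRESHOLD}")
```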
>
>The tail end of the amdump file (which last changed at 22:29 last
>night) has this:
>
>driver: send-cmd time 1766.131 to taper: PORT-WRITE 00-00083 s2sme fffffeff9ffe0f /deploy/lynx/packages/09 0 20031215
>taper: try_socksize: receive buffer size is 65536
>taper: stream_server: waiting for connection: 0.0.0.0.63925
>driver: result time 1766.136 from taper: PORT 63925
>driver: send-cmd time 1766.136 to dumper0: PORT-DUMP 01-00084 63925 s2sme fffffeff9ffe0f /deploy/lynx/packages/09 /deploy/lynx/packages 0 1970:1:1:0:0:0 GNUTAR |;auth=bsd;index;include-file=./[0-9]*;
>driver: state time 1766.137 free kps: 9969 space: 0 taper: writing idle-dumpers: 3 qlen tapeq: 0 runq: 5 roomq: 0 wakeup: 86400 driver-idle: not-idle
>driver: interface-state time 1766.137 if : free 9969
>driver: hdisk-state time 1766.137
>taper: stream_accept: connection from 127.0.0.1.63926
>taper: try_socksize: receive buffer size is 32768
>dumper: stream_client: connected to 127.0.0.1.63925
>dumper: stream_client: our side is 0.0.0.0.63926
>dumper: try_socksize: send buffer size is 65536
>dumper: stream_client: connected to 10.2.227.75.63927
>dumper: stream_client: our side is 0.0.0.0.63930
>dumper: stream_client: connected to 10.2.227.75.63928
>dumper: stream_client: our side is 0.0.0.0.63931
>dumper: stream_client: connected to 10.2.227.75.63929
>dumper: stream_client: our side is 0.0.0.0.63932
>dumper: pid 4426 receive size is 65535, low water is 32768
>
>Unless there's something there right under my nose, I don't see
>anything foreboding or otherwise problematic there.
>
>Looking at the log file, I don't see any failures, warnings, or
>errors there, either.  The last entry is the success msg for the
>preceding DLE.
>
>The corresponding /tmp/amanda/runtar.*.debug file simply has:
>
>runtar: debug 1 pid 8515 ruid 111 euid 0: start at Mon Dec 15 22:29:32 2003
>gtar: version 2.4.4p1
>running: /opt/gnu/bin/tar: gtar --create --file - --directory /deploy/lynx/packages --one-file-system --listed-incremental /var/opt/amanda/gnutar-lists/s2sme_deploy_lynx_packages_09_0.new --sparse --ignore-failed-read --totals --files-from /tmp/amanda/sendbackup._deploy_lynx_packages_09.20031215222930.include
>
>... and the output in /tmp/amanda/sendsize.*.debug shows that it ran
>fine.
>
>I ran the runtar command by hand and piped the output to 'tar -tvf -'.
>It listed all the files I expected to see, and finished going
>through the ~3 GB in under 3 minutes.  Granted, this was much
>faster, as it was through a pipe instead of going to the tape
>device, but it *did* finish, rather than just sitting there!
>*sigh*
>
>The corresponding chg-scsi.* file holds no error messages or
>warnings, either.
>
>I ran the runtar command again and this time sent it into a file.
>It took only about 6 minutes for 1.2 GB.  If I run flat out of
>ideas, I might try dumping this file to a scratch tape, to see if
>maybe *that* hangs mysteriously.
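The two hand runs can be put on the same scale with a quick throughput calculation. This Python sketch only redoes that arithmetic from the rounded figures in the message (about 3 GB in under 3 minutes through the pipe, 1.2 GB in about 6 minutes into a file), treating 1 GB as 1024 MB, so the results are rough.

```python
def mb_per_second(gigabytes: float, seconds: float) -> float:
    """Average throughput in MB/s, treating 1 GB as 1024 MB."""
    return gigabytes * 1024 / seconds


if __name__ == "__main__":
    # "under 3 minutes" makes the pipe figure a lower bound.
    print(f"pipe to 'tar -tvf -': at least {mb_per_second(3.0, 3 * 60):.1f} MB/s")
    print(f"runtar into a file:   about {mb_per_second(1.2, 6 * 60):.1f} MB/s")
```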
>
>But I don't know where else to look to find out just which process
>is hung.  Does anyone have any ideas?
>
>Also, does anyone know if there is some way to just _nudge_ an
>Amanda process, if it's locked up, so that it gives up on the
>current DLE and moves on to the next one?  I've tried sending
>SIGALRM and SIGHUP to the sendbackup, dumper, and taper processes
>(at different times! :), but that just stopped them, rather than
>making them skip to the next DLE. Checking the source, I find
>that there are no references to SIGHUP in client-src/ or
>server-src/, and the only SIGALRM reference is in
>client-src/killpgrp.c.
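When a process installs no handler for a signal, the default POSIX action applies, and for both SIGHUP and SIGALRM that action is to terminate the process. If, as the grep suggests, these Amanda processes install no such handlers, that would explain why they died rather than skipping ahead. A minimal Python sketch demonstrates the default on a throwaway child process, assuming a POSIX system with a `sleep` command and a parent that is not itself ignoring these signals (as it would under nohup), since ignored signals stay ignored across exec.

```python
import signal
import subprocess


def exit_status_after(sig: signal.Signals) -> int:
    """Start a child that inherits our signal dispositions, send it `sig`,
    and return its exit status.

    subprocess reports a child killed by signal N as return code -N.
    """
    child = subprocess.Popen(["sleep", "30"])
    child.send_signal(sig)
    return child.wait()


if __name__ == "__main__":
    for sig in (signal.SIGHUP, signal.SIGALRM):
        rc = exit_status_after(sig)
        outcome = "terminated by the signal" if rc == -sig else f"exited with {rc}"
        print(f"{sig.name}: {outcome}")
```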
>
>So, any ideas, folks?  Thanks!
>-- Mark
>
>PS -- I realize that I didn't include my config files, but I tried
>to provide all the necessary info above.  If not, let me know and
>I'll pass along the salient portions of the config files, too.  Tnx
>again!

-- 
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz  512M
99.22% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2003 by Maurice Eugene Heskett, all rights reserved.