Amanda-Users

Re: How to debug a sendbackup.

2009-05-28 18:21:59
Subject: Re: How to debug a sendbackup.
From: Nathan Stratton Treadway <nathanst AT ontko DOT com>
To: amanda-users AT amanda DOT org
Date: Thu, 28 May 2009 18:12:13 -0400
On Thu, May 28, 2009 at 13:46:58 -0400, McGraw, Robert P wrote:
> I need to run the commands that will reproduce this error. I need to see
> what is the unexpected field value. 
> 
> What is the input to the gtar line "sendbackup: time 0.396: started
> index creator: "/opt/csw/bin/gtar -tf - 2>/dev/null | sed -e 's/^\.//'"


Although the log file output is a little confusing, the snapshot error
message is actually not coming from the index creator process, but
rather from the "runtar" process.

So, you should be able to reproduce the error manually by making a copy
of /var/amanda/gnutar-lists/zorn_export_users-h_0 (e.g. to
zorn_export_users-h_1.test ) and then running:

tar --create --file - --directory /export/fssnap/users --one-file-system \
  --listed-incremental zorn_export_users-h_1.test \
  --sparse --ignore-failed-read --totals --files-from \
  /tmp/amanda/sendbackup._export_users-h.20090527111921.include

(The "snapshot file" in the error message is the one specified as an
argument to the --listed-incremental option.)

Unfortunately, doing this won't directly help you figure out which entry
in the snapshot file is the one causing the problem, because even when
you run it manually, tar doesn't print any more info than what is
showing up in the Amanda logs.

I ran into a similar situation myself, and after some investigation I
found that the problem was that the Linux version on that particular box
was returning an invalid value for the nanosecond portion of a
directory's modification time.  Unfortunately, tar performs some checks
on the ranges of values it reads out of the snapshot file, presumably to
validate that file... but it doesn't check the ranges of values as it's
writing the file.  So, if the kernel is passing tar "corrupt" data, tar
will happily write a snapshot file that it will then refuse to read.

Since you are running Solaris and not Linux you obviously aren't hitting
the exact problem I did, but offhand I'd guess that there is some
directory out there under /export/fssnap/users/h* which has some "stat"
value that tar doesn't like.


When I had this problem, I posted a message to the bug-tar mailing list
discussing what I found and suggesting a patch that might help catch
these situations.  I haven't heard anything back from the tar
maintainer about the topic, but if you are willing to compile your own
copy of gnu-tar from source (after applying my patch), those changes
might help you track down exactly which directory is triggering the
error.

    http://www.archivum.info/bug-tar AT gnu DOT org/2009-03/msg00071.html

(If you do try this I will be very curious to hear what you find out.)


If you don't want to try compiling tar from source, then the next
question is how many /export/fssnap/users/h* directories you have.  If
you don't have too many, or you don't mind writing a loop that would
cycle through each of them, it might make sense to try doing a "tar ...
--listed-incremental" cycle for each directory individually until you
narrow down exactly which entry is causing the problem.
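
The per-directory cycle described above could be scripted roughly as
follows.  This is only a sketch in Python, under several assumptions
that are not from the original message: "tar" on the PATH is GNU tar
(on the system in this thread it would be /opt/csw/bin/gtar); the
archive is written to /dev/null instead of stdout, since only the
snapshot file matters here; each directory gets a fresh throwaway
snapshot file (hypothetical name probe.snar), which the first run
writes and the second run reads back.  The function names tar_command
and find_failing_members are made up for illustration.

```python
import glob
import os
import subprocess
import tempfile

TAR = "tar"  # assumption: GNU tar; adjust to e.g. /opt/csw/bin/gtar


def tar_command(base_dir, member, snapshot):
    """Build a tar command that dumps one member of base_dir to /dev/null."""
    return [
        TAR, "--create", "--file", "/dev/null",
        "--directory", base_dir, "--one-file-system",
        "--listed-incremental", snapshot,
        "--sparse", "--ignore-failed-read",
        member,
    ]


def find_failing_members(base_dir, pattern="h*"):
    """Run tar twice per matching directory; return the runs that fail.

    Run 1 writes a new snapshot file, run 2 has to read it back, so a
    failure on run 2 points at a snapshot entry tar refuses to accept.
    """
    failing = []
    for path in sorted(glob.glob(os.path.join(base_dir, pattern))):
        member = "./" + os.path.relpath(path, base_dir)
        with tempfile.TemporaryDirectory() as tmp:
            snapshot = os.path.join(tmp, "probe.snar")  # hypothetical name
            for run in (1, 2):
                result = subprocess.run(
                    tar_command(base_dir, member, snapshot),
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.PIPE,
                    text=True,
                )
                if result.returncode != 0:
                    failing.append((member, run, result.stderr.strip()))
                    break
    return failing
```

Calling find_failing_members("/export/fssnap/users") returns a list of
(directory, run number, tar's stderr) tuples.  Any non-zero exit is
reported, so the stderr text still has to be read to tell the snapshot
error apart from unrelated complaints.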


Or it occurs to me that I saw a Perl script somewhere that parsed
through the tar snapshot files; it would probably be possible to modify
that script to do the same checks that tar does, and have it output the
directory name associated with any invalid values that it finds.  If
you want me to look into that further, let me know....
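
As a rough illustration of that idea (in Python rather than Perl, and
not the script mentioned above), the sketch below walks a snapshot file
and reports directories whose numeric fields look wrong.  It assumes
the snapshot is in GNU tar's format version 2: a "GNU tar-<version>-2"
header line, then NUL-terminated fields, starting with the seconds and
nanoseconds of the previous backup, followed by one record per
directory (nfs, seconds, nanoseconds, device, inode, name, then the
directory's contents list).  The checks are deliberately simple: every
numeric field must parse as an integer, and the nanosecond field must
lie in 0..999999999.  This is not a copy of tar's own validation, and
the function names are hypothetical.

```python
NSEC_MAX = 999_999_999  # largest valid nanosecond value


def parse_snapshot(data):
    """Yield (dir_name, fields) per directory record; values are raw bytes."""
    header, sep, body = data.partition(b"\n")
    if not sep or not header.startswith(b"GNU tar-") or not header.endswith(b"-2"):
        raise ValueError("not a format-2 snapshot file: %r" % header[:40])
    tokens = body.split(b"\0")
    end = len(tokens)
    pos = 2  # skip the time of the previous backup (seconds, nanoseconds)
    while pos < end:
        if tokens[pos] == b"":
            pos += 1  # empty token: end of a contents list or of the file
            continue
        if pos + 6 > end:
            raise ValueError("truncated directory record at field %d" % pos)
        nfs, sec, nsec, dev, ino, name = tokens[pos:pos + 6]
        pos += 6
        # Skip the contents list: entries run up to the next empty token.
        while pos < end and tokens[pos] != b"":
            pos += 1
        yield name, {"nfs": nfs, "sec": sec, "nsec": nsec, "dev": dev, "ino": ino}


def find_bad_fields(data):
    """Return (dir_name, field, raw_value) for each suspicious value."""
    problems = []
    for name, fields in parse_snapshot(data):
        for key, raw in fields.items():
            try:
                value = int(raw)
            except ValueError:
                problems.append((name, key, raw))
                continue
            if key == "nsec" and not 0 <= value <= NSEC_MAX:
                problems.append((name, key, raw))
    return problems


def check_file(path):
    """Print one line per suspicious field found in the snapshot at path."""
    with open(path, "rb") as handle:
        for name, key, raw in find_bad_fields(handle.read()):
            print("%s: %s = %r" % (name.decode(errors="replace"), key, raw))
```

For example, check_file("zorn_export_users-h_1.test") would print the
name of each directory with an out-of-range or non-numeric field, which
would then be the place to look with "ls" or "stat".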


                                                Nathan


----------------------------------------------------------------------------
Nathan Stratton Treadway  -  nathanst AT ontko DOT com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

