Amanda-Users

Re: y didn't amanda report this as an error?

2003-09-24 18:33:43
Subject: Re: y didn't amanda report this as an error?
From: Deb Baddorf <baddorf AT fnal DOT gov>
To: amanda-users AT amanda DOT org, amanda-users AT amanda DOT org
Date: Wed, 24 Sep 2003 17:30:59 -0500
At 03:36 PM 9/24/2003 -0400, Jon LaBadie wrote:
On Wed, Sep 24, 2003 at 01:54:49PM -0500, Deb Baddorf wrote:
> From a client  machine,  the admin sent me this:
>
> Sep 24 02:45:32 daesrv /kernel: pid 7638 (gzip), uid 2: exited on signal 11
> (core dumped)
>
> The above message shows gzip crashed on daesrv last night.  It crashed
> because there is a hardware problem on that machine, but since it was
> probably part of an amanda backup that did not work as expected, I wanted
> to be sure amanda had reported something about it to you.   -client admin
>
> Amanda herself had reported a strange error in her mail report:
>
> daesrv.fna /usr lev 0 STRANGE
> .....
> | DUMP: 33.76% done, finished in 1:20
> ? sendbackup: index tee cannot write [Broken pipe]

Note the problem was in making the index, not the backup.

Welllll .... but the client was doing its own compressing.  So when the
gzipper failed, the whole backup failed, at only 33% finished.
I just did a test amrestore  (true,  amrecover wouldn't touch it).
Got about 1/3 the amount of data that ought to be on that disk.
So I think it really did fail,   but registered it as a successful level 0
backup.  :-(
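
One way to confirm a truncation like this without eyeballing the restored size is to check whether the compressed stream decompresses all the way to its end-of-stream marker. The following is a minimal sketch, not part of Amanda: it assumes the compressed image has already been pulled off tape into an ordinary gzip file, and the function name and path are hypothetical. It also needs Python 3.8 or later for gzip.BadGzipFile.

```python
import gzip
import zlib


def gzip_stream_is_complete(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` decompresses cleanly to its end.

    A gzip stream that was cut off (for example because the compressor was
    killed partway through) makes Python's gzip reader raise EOFError once the
    data runs out before the end-of-stream marker.
    """
    try:
        with gzip.open(path, "rb") as fh:
            # Read and discard the data in chunks so large images
            # do not have to fit in memory.
            while fh.read(chunk_size):
                pass
    except (EOFError, gzip.BadGzipFile, zlib.error):
        # EOFError: stream ended early; the other two: corrupt or non-gzip data.
        return False
    return True


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_gzip.py /tmp/daesrv._usr.0.gz
    for name in sys.argv[1:]:
        state = "complete" if gzip_stream_is_complete(name) else "TRUNCATED or corrupt"
        print(f"{name}: {state}")
```

A False result here would agree with the 1/3-sized restore: the data on tape stops where gzip died.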


> | DUMP: Broken pipe
> | DUMP: The ENTIRE dump is aborted.
> ? index returned 1
> ??error [/sbin/dump returned 3, compress got signal 11]
> ? dumper: strange [missing size line from sendbackup]
> ? dumper: strange [missing end line from sendbackup]
> \--------
>
>
> But it appears that she went ahead and stored the partial data on tape
> anyway, and considered this a good level 0 backup.  (amadmin config due
> shows the next level 0 is 7 days away)
>
> daesrv.fnal.gov /usr 0 0 3605024 -- 47:40 1260.7 12:35 4773.9
>
> Why doesn't amanda recognize this as a failure?
> Am I missing something that I should have noticed?
> Or am I reading it wrong (the fact that "due" implies a level 0 was done)?

Did your report show it was "taped"?  If so, I suspect the backup is ok,
but using amrecover with the index will be suspect/problematical.
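
The warning signs in the quoted STRANGE section (a nonzero exit status, a process killed by a signal, missing size/end lines, an aborted dump) can also be scanned for mechanically when reading the mail report. This is only a heuristic sketch: the patterns are taken from the lines quoted in this thread, the names are hypothetical, and it is not how Amanda itself decides whether a dump succeeded.

```python
import re

# Patterns drawn from the report lines quoted in this thread.
# They are assumptions for illustration, not an official list.
FAILURE_PATTERNS = [
    re.compile(r"returned (?!0\b)\d+"),      # e.g. "/sbin/dump returned 3"
    re.compile(r"got signal \d+"),           # e.g. "compress got signal 11"
    re.compile(r"missing (size|end) line"),  # sendbackup never finished
    re.compile(r"ENTIRE dump is aborted"),   # dump's own abort message
    re.compile(r"Broken pipe"),
]


def suspicious_lines(report_text):
    """Return the report lines that match any of the failure patterns."""
    hits = []
    for line in report_text.splitlines():
        if any(p.search(line) for p in FAILURE_PATTERNS):
            hits.append(line.strip())
    return hits


if __name__ == "__main__":
    sample = """\
| DUMP: 33.76% done, finished in 1:20
? sendbackup: index tee cannot write [Broken pipe]
| DUMP: The ENTIRE dump is aborted.
? index returned 1
? dumper: strange [missing end line from sendbackup]
"""
    for hit in suspicious_lines(sample):
        print(hit)
```

Any hit from a level 0 marked STRANGE is a reason to do a test restore before trusting that image.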

--
Jon H. LaBadie                  jon AT jgcomp DOT com
 JG Computing
 4455 Province Line Road        (609) 252-0159
 Princeton, NJ  08540-4322      (609) 683-7220 (fax)