Amanda-Users

Re: Diagnosing an elusive fault on a critical system [long]

2002-08-19 15:59:00
Subject: Re: Diagnosing an elusive fault on a critical system [long]
From: "Edwin Hakkennes" <edwin-list AT xic DOT nl>
To: Jonathan Johnson <Jonathan.Johnson AT MinnetonkaSoftware DOT com>
Date: Mon, 19 Aug 2002 21:41:53 +0200
Jonathan Johnson wrote:

> Dear Amanda users,
>
> Life with Amanda is swell.  The RH 7.2 localhost, an NT server and two
> RH 6.0 systems (one with strict ipchains firewalling) are all being
>

<SNIP>

>   Aug 10 05:04:19 pegasus sendbackup[9944]: error [/bin/tar got signal 11, 
> index got signal 11, compress got signal 11]

<SNIP>

>
>
> Thanks in advance, especially if you actually read this far!!  Only a
> true Linux fan would have stayed awake to this, the 390th line of this
> message.  :)
>
> Regards,
>
>   Jonathan
>
> --
>  /       Jonathan R. Johnson       | "Every word of God is flawless." \
>  |    Minnetonka Software, Inc.    |                 -- Proverbs 30:5 |
>  \ johnsonj AT MinnetonkaSoftware DOT com |  My own words only speak for me. /

Hi Jonathan,

I'm certainly no expert on this, but signal 11 (SIGSEGV) errors are most
often caused by flaky RAM, or more specifically by the combination of your
RAM and the memory interface of your mainboard. Try reseating your RAM, or
remove half of it, run for a while, and later run with only the other half.
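
The "got signal 11" in the quoted Amanda log means the process was killed by
SIGSEGV. If you want to spot the same kind of crash in your own test runs, a
POSIX shell reports a child killed by signal N as exit status 128+N, and
kill -l turns the number back into a name. A minimal sketch; the function
name run_and_report is hypothetical, and since a program can also exit with a
status above 128 on purpose, treat a report as a hint rather than proof:

```sh
#!/bin/sh
# Hypothetical helper: run a command and, if it appears to have been killed
# by a signal, report which one. Relies on the POSIX convention that a child
# terminated by signal N yields exit status 128+N, so a program that
# deliberately exits with, say, 139 would be misreported.
run_and_report() {
    rc=0
    "$@" || rc=$?    # capture the status without tripping "set -e"
    if [ "$rc" -gt 128 ]; then
        sig=$((rc - 128))
        # kill -l maps a signal number to its name without the SIG prefix
        echo "$1 got signal $sig (SIG$(kill -l "$sig"))"
    fi
    return "$rc"
}

# Example: if tar segfaulted on Linux, this would print
# "/bin/tar got signal 11 (SIGSEGV)":
#   run_and_report /bin/tar cf /dev/null /etc
```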

Another option would be to swap in a completely different mainboard:
'flush' the mainboard, CPU and RAM together and keep the rest of the
components.

Running with the case removed can make things worse, as there is then no
forced airflow.

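Another diagnostic you might try is to compile the kernel over and over, just
to keep your CPU busy. Something like this (csh/tcsh syntax, run in a
configured kernel source tree; >& sends stderr to the log as well, so any
compiler crashes are recorded too):
foreach i (0 1 2 3 4 5 6 7 8 9)
  foreach j (0 1 2 3 4 5 6 7 8 9)
    make clean; make bzImage >& log.$i$j
  end
end
Then compare the logs. They should all be identical; if they are not, there
is a good chance you have memory errors.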
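
Comparing a hundred logs by eye is tedious. A minimal sketch, assuming the
logs are named log.00 through log.99 as produced by the loop above and that
log.00 is a good enough reference. It is written in plain Bourne shell rather
than csh, and the function name compare_logs is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: compare every log.NN file in a directory against
# log.00. Prints the name of each log that differs; returns 1 if any
# differ, 0 if all match, 2 if the reference log is missing.
compare_logs() {
    dir=${1:-.}
    ref="$dir/log.00"
    if [ ! -f "$ref" ]; then
        echo "no reference log: $ref" >&2
        return 2
    fi
    result=0
    for f in "$dir"/log.*; do
        # cmp -s prints nothing and exits non-zero if the files differ
        if ! cmp -s "$ref" "$f"; then
            echo "differs: $f"
            result=1
        fi
    done
    return "$result"
}
```

After sourcing the file into sh or bash, run compare_logs in the kernel
source directory; any "differs:" line is worth a closer look with diff.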

Suggested reading: look for the SIG11 FAQ by Rogier Wolff (Roger Wolff).
Google turns it up at:

http://www.bitwizard.nl/sig11/

It might seem old, but it is still valid.

Good luck keeping Linux up!

Cheers,

Edwin Hakkennes