Amanda-Users

Re: Crashing machine

2003-09-19 14:49:39
Subject: Re: Crashing machine
From: Frank Smith <fsmith AT hoovers DOT com>
To: "Brashers, Bart -- MFG, Inc." <Bart.Brashers AT mfgenv DOT com>, "Amanda Users (E-mail)" <amanda-users AT amanda DOT org>
Date: Thu, 18 Sep 2003 17:23:58 -0500
--On Thursday, September 18, 2003 14:42:29 -0600 "Brashers, Bart -- MFG, Inc." 
<Bart.Brashers AT mfgenv DOT com> wrote:

> 
> I've been using amanda-2.4.2p2 for a long time now, without problems.  In
> the last week or so, my Linux (2.4.20) machine has been crashing, apparently
> when amanda runs.  The various logs in /var/log show amanda starting (e.g.
> xinetd in /var/log/secure with user amanda, from 127.0.0.1) and then nothing
> until I restart the computer the next morning.  
> 
> The real kicker was just now when I ran amflush (after amcleanup) to flush
> the last failed dump to the disk.  The system panicked after just a few
> minutes, with the "Machine check exception (kernel panic: cpu context
> corrupt)" error.  That usually happens when the system is too hot, or you
> have a bad motherboard, or something.  This machine has been in operation
> for about 6 months, so it's probably not the MB.  It's not that hot in the
> room, and I checked that the fins on the CPU fan weren't clogged with dust.
> 
> Any ideas here?  Anyone heard of such a thing?  Am I barking up the wrong
> tree thinking that amanda might be responsible for my crashes?  It's a real
> pain, not being able to run stuff at night (and not having backups makes me
> nervous).

Such problems are usually heat-related and sometimes a sign of bad RAM.
When you were checking for dust buildup on the CPU heatsink did you also
verify that all the fans were running normally, including the PS fan(s)?
  If all that checks out, try running memtest86 all day to check for
memory errors (you can get a bootable CD image of it to pop into a machine
whenever you suspect memory problems).
  After ruling out heat and memory, check the PS output while all the
disks are active (like when amanda is busy).  If it is only marginal, the
extra disk activity may be causing enough voltage drop to cause CPU or
RAM corruption.
  The rarest case is a failing CPU.  I've only seen one, but it can happen.

Good luck,
Frank  
> 
> Bart
> --
> Bart Brashers, Ph.D.
> Air Quality Meteorologist
> MFG Inc.
> 19203 36th Ave W Suite 101
> Lynnwood WA 98036-5707
> 
> bart.brashers AT mfgenv DOT com
> Phone: 425.921.4000
> Fax:   425.921.4040



-- 
Frank Smith                                      fsmith AT hoovers DOT com
Systems Administrator                           Voice: 512-374-4673
Hoover's Online                                   Fax: 512-374-4501

