Re: Crashing machine
2003-09-18 17:38:38
On Thu, 2003-09-18 at 15:42, Brashers, Bart -- MFG, Inc. wrote:
> I've been using amanda-2.4.2p2 for a long time now, without problems. In
> the last week or so, my Linux (2.4.20) machine has been crashing, apparently
> when amanda runs. I see entries in the various logs in /var/log when amanda
> starts (e.g. xinetd in /var/log/secure with user amanda, from 127.0.0.1) and
> then nothing until the next morning, when I restart the computer.
Bummer. I had a situation once where my backups all of a sudden began
failing on large filesystems. Fortunately I caught a message in the log
files that pointed me to the NIC.
>
> The real kicker was just now when I ran amflush (after amcleanup) to flush
> the last failed dump to the disk. The system panicked after just a few
> minutes, with the "Machine check exception (kernel panic: cpu context
> corrupt)" error. That usually happens when the system is too hot, or you
> have a bad motherboard, or something. This machine has been in operation
> for about 6 months, so it's probably not the MB. It's not that hot in the
> room, and I checked that the fins on the CPU fan weren't clogged with dust.
>
Can you use lm_sensors to monitor the internal temps? It helped me find
a problem on a node in our cluster. The node would hum along fine until
it got a fairly CPU-intensive job, and then, bam, it would hang, with no
log messages either.
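If it helps, here is a rough sketch of how you could log the temps to disk while amanda runs, so the last reading before a hang survives the reboot. It assumes the lm_sensors `sensors` command is installed and configured for your board. The function names, the log path, and the regular expression are my own guesses for illustration; the output format of `sensors` varies by chip and version, so the pattern may need adjusting.

```python
import os
import re
import subprocess
import time

# Assumed line shape: "label:  +45.0°C ..." (\u00b0 is the degree sign).
# Real `sensors` output differs between chips, so treat this as a guess.
TEMP_LINE = re.compile(
    r"^([^:\n]+):\s*([+-]?\d+(?:\.\d+)?)\s*\u00b0?\s*C\b", re.MULTILINE
)


def parse_temps(sensors_output):
    """Return {label: degrees C} for every temperature-looking line."""
    return {
        m.group(1).strip(): float(m.group(2))
        for m in TEMP_LINE.finditer(sensors_output)
    }


def log_temps(path, interval_s=10):
    """Append a timestamped reading to `path` every `interval_s` seconds."""
    with open(path, "a") as log:
        while True:
            result = subprocess.run(
                ["sensors"], capture_output=True, text=True, check=False
            )
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"{stamp} {parse_temps(result.stdout)}\n")
            # Flush and fsync so the last line reaches the disk
            # even if the machine hangs right afterwards.
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical log location; pick any path on a local disk.
    log_temps("/var/tmp/temps.log")
```

Start it shortly before the backup window and check the tail of the log after the next crash. If the last readings climb sharply, that points at cooling rather than at amanda.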
Hope this helps.
> Any ideas here? Anyone heard of such a thing? Am I barking up the wrong
> tree thinking that amanda might be responsible for my crashes? It's a real
> pain, not being able to run stuff at night (and not having backups makes me
> nervous).
>
> Bart
> --
> Bart Brashers, Ph.D.
> Air Quality Meteorologist
> MFG Inc.
> 19203 36th Ave W Suite 101
> Lynnwood WA 98036-5707
>
> bart.brashers AT mfgenv DOT com
> Phone: 425.921.4000
> Fax: 425.921.4040
--
Jim Summers <jsummers AT cs.ou DOT edu>
University of Oklahoma - Computer Science