Re: [BackupPC-users] BackupPC working for a year, all of a sudden nothing works

2009-01-03 19:14:25
From: Holger Parplies <wbppc AT parplies DOT de>
To: Gene Horodecki <geneh AT shaw DOT ca>, Juergen Harms <Juergen.Harms AT unige DOT ch>
Date: Sun, 4 Jan 2009 01:12:31 +0100

Hi,

Gene Horodecki wrote on 2009-01-03 15:01:54 -0600 [Re: [BackupPC-users] 
BackupPC working for a year, all of a sudden nothing works]:
> [Faulty RAM, bad PSU]
> These are good theories, but I would tend to refute them in both cases.  
> First of all I now know that the system is behaving in exactly the same 
> way for the old and the new memory.

Meaning exactly what? The old memory modules in the same slots, with the same
memory timings as before the problems started? Even then, you *can* damage a
main board (or memory module) when handling it (electrostatic discharge,
applying too much pressure to the board, touching the connectors, ...). What
you describe next is fairly sound evidence that it is, in fact, a hardware
problem rather than a software problem:

> About an hour into a backup today I got a spontaneous reboot.

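The only piece of software that could be responsible would be the kernel, and
that seems rather unlikely. An OOM (out of memory) situation, potentially
caused by rsync, should look different: the kernel's OOM killer would
terminate a process and log that it did so, rather than resetting the machine
without a trace.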
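
To check whether an OOM situation was involved at all, one can scan the kernel
log for OOM-killer messages. The following is a minimal sketch, assuming that
the kernel logs a "<task> invoked oom-killer" line and an "Out of memory: ..."
line when the OOM killer fires (the exact wording varies between kernel
versions) and that you pass the log file (e.g. /var/log/kern.log on
Debian-style systems) on the command line. The function name
find_oom_events is hypothetical.

```python
import re
import sys
from typing import Iterable, List

# Assumed markers: Linux kernels typically log "<task> invoked oom-killer: ..."
# followed by an "Out of memory: ..." line; the exact wording varies by version.
OOM_PATTERN = re.compile(r"invoked oom-killer|out of memory", re.IGNORECASE)


def find_oom_events(lines: Iterable[str]) -> List[str]:
    """Return the log lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: find_oom.py /var/log/kern.log  (the path depends on the distribution)
    with open(sys.argv[1], errors="replace") as log:
        hits = find_oom_events(log)
    print("\n".join(hits) if hits else "no OOM-killer messages found")
```

An empty result around the time of the crash supports the view that the OOM
killer was not involved.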

> This is with the original memory.  In most 
> logs I can see this reboot happening at 14:36:18 today.  I see nothing 
> in any log in the time immediately before this to indicate a problem.  
> I've checked messages, syslog, kern.log, dmesg, and daemon.log... I see 
> nothing just before that.

This is consistent with a memory, CPU or PSU problem. I take it you mean
"reboot" as in "crash", i.e. no clean system shutdown?
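
Whether a reboot was clean can often be read from what the log contains just
before the next boot. Here is a minimal sketch under two loudly stated
assumptions: a boot shows up as a kernel line containing "Linux version", and
a clean shutdown leaves a syslog daemon line containing "exiting on signal 15".
Both marker strings differ between distributions and syslog daemons, so
adjust them for your system. The function name last_lines_before_boots is
hypothetical.

```python
import sys
from collections import deque
from typing import Iterable, List, Tuple

# Assumed markers; adjust them for your distribution and syslog daemon.
BOOT_MARKER = "Linux version"             # early kernel message after a boot
SHUTDOWN_MARKER = "exiting on signal 15"  # syslog daemon stopped cleanly


def last_lines_before_boots(lines: Iterable[str], context: int = 5
                            ) -> List[Tuple[List[str], bool]]:
    """For every boot marker, return the preceding `context` lines and
    whether a clean-shutdown marker appeared among them."""
    window = deque(maxlen=context)  # rolling buffer of the most recent lines
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if BOOT_MARKER in line:
            before = list(window)
            clean = any(SHUTDOWN_MARKER in prev for prev in before)
            result.append((before, clean))
            window.clear()  # start a fresh window for the next boot
        window.append(line)
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: boots.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as log:
        for before, clean in last_lines_before_boots(log):
            print("clean shutdown" if clean else "NO clean shutdown (crash?)")
            for prev in before:
                print("    " + prev)
```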

> I will do the load test however and tell you how it goes.. I can get a 
> multimeter.

That appears to be a good idea.

I would also recommend keeping an eye on your CPU temperature, either by
means of lm-sensors or, as a last resort, by looking at the BIOS readings
immediately after a crash. You might also be able to activate an audible
alarm in the BIOS setup.
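
If lm-sensors works on the machine, a small logger that writes the
temperatures every few seconds and forces each line to disk keeps the last
readings from before a crash. A minimal sketch, assuming the kernel's hwmon
sysfs interface that lm-sensors also reads (temp*_input files under
/sys/class/hwmon/hwmon*, values in millidegrees Celsius; on some older
kernels they sit one level deeper under device/, which the second glob
covers). The function names and the log file argument are hypothetical.

```python
import glob
import os
import sys
import time
from typing import Dict


def read_temps(base: str = "/sys/class/hwmon") -> Dict[str, float]:
    """Return {sensor file relative to base: temperature in degrees C}."""
    temps = {}
    for pattern in ("hwmon*/temp*_input", "hwmon*/device/temp*_input"):
        for path in sorted(glob.glob(os.path.join(base, pattern))):
            try:
                with open(path) as f:
                    millideg = int(f.read().strip())
            except (OSError, ValueError):
                continue  # sensor missing or unreadable: skip it
            temps[os.path.relpath(path, base)] = millideg / 1000.0
    return temps


def log_forever(logfile: str, interval: float = 10.0) -> None:
    """Append a timestamped reading every `interval` seconds and fsync each
    line, so the last entries survive a sudden reset."""
    with open(logfile, "a") as out:
        while True:
            readings = " ".join(f"{k}={v:.1f}" for k, v in read_temps().items())
            out.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {readings}\n")
            out.flush()
            os.fsync(out.fileno())
            time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: templog.py /root/temps.log  (runs until interrupted)
    log_forever(sys.argv[1])
```

Fsyncing every line costs a disk write per interval, which is the point here:
a page-cache-only log would likely lose exactly the minutes you care about.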

Juergen Harms wrote on 2009-01-03 22:38:52 +0100 [Re: [BackupPC-users] BackupPC 
working for a year, all of a sudden nothing works]:
> It looks like the "educated guesses" do not help in this case.

So we should resort to "uneducated guesses"?

> Something 
> breaks without leaving an evident trace at the level of backuppc and of 
> Linux - you need more evidence (I agree, after having eliminated the 
> most likely explanations for such a diffusely manifested error).

Which are hardware problems (memory, main board, CPU, PSU, short circuits, e.g.
connections between main board and casing; not necessarily in order of
probability).

> The next step I would attempt (and which has already been suggested) is 
> to try and get more direct feedback from rsync:

Fine, your choice. I would not recommend that to anyone else, though. It's
quite frustrating to try to confirm or rule out a hardware problem by looking
at the output of rsync, especially when you've already gone to the trouble of
tracing the most likely trigger down to ssh.

If you want to run a stress test on your CPU and memory subsystem, try
something like Folding@home or SETI@home. If these reproducibly crash your
system (or just crash themselves without taking the system down), you can rule
out ssh/rsync as the cause (but we already know that, so you gain nothing). If
they don't, you again gain nothing, because an intermittent hardware fault need
not show up under every load. Well, once you've downloaded Folding@home, you
might as well keep running it :-).
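
For a rough idea of what a memory stress test does, here is a crude
write/verify pattern check in Python. It is a sketch only: a user-space
program cannot choose which physical RAM it gets, so it is far less thorough
than a dedicated tester such as memtest86+, and, in line with the argument
above, a clean run proves nothing. The function name, the patterns and the
default round count are arbitrary choices.

```python
import sys
from typing import Sequence

# Fill patterns: all zeros, all ones, and two alternating-bit patterns.
PATTERNS: Sequence[int] = (0x00, 0xFF, 0x55, 0xAA)


def pattern_check(size_mib: int, rounds: int = 1) -> int:
    """Write each pattern into a buffer of `size_mib` MiB, read it back and
    return the total number of bytes that did not match."""
    size = size_mib * 1024 * 1024
    buf = bytearray(size)
    errors = 0
    for _ in range(rounds):
        for value in PATTERNS:
            expected = bytes([value]) * size
            buf[:] = expected            # write the pattern
            if buf != expected:          # read back and compare
                errors += sum(1 for got, want in zip(buf, expected) if got != want)
    return errors


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pattern_check.py SIZE_MIB [ROUNDS]; pick a size that fits in free RAM.
    mib = int(sys.argv[1])
    rounds = int(sys.argv[2]) if len(sys.argv) > 2 else 10
    bad = pattern_check(mib, rounds)
    print(f"{bad} mismatched bytes" if bad else "no mismatches found")
```

A nonzero count would point firmly at hardware; zero, as said, does not clear
it.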


I agree that hardware problems are no fun to track down. But that doesn't mean
you should go looking for software bugs instead when you are faced with one.

Regards,
Holger

------------------------------------------------------------------------------
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/