Re: [BackupPC-users] A problem really hard to handle : BackupPc Crashing

Hi,

Les Mikesell wrote on 2009-05-20 08:36:49 -0500 [Re: [BackupPC-users] A problem 
really hard to handle : BackupPc Crashing]:
> SebClight wrote:
> > As this being my first post, I'll maybe do it wrong :P

well, since you're asking, Backup Central seems to have managed to mess up your
log file quote - the times are "really hard to handle".

Also, your subject line should describe the problem, not the fact that you
think it is hard to debug (or important or whatever).

Aside from that, I've seen worse posts from some regular participants :).

> > I have a BackupPc instance (3.1.0) on Debian Lenny. It randomly crashes
> > but mainly during night hours. I have to restart the service once a day
> > and sometimes more. I have more than 40 hosts to backup (most of them are
> > just websites so they're not very big).
> > 
> > See some samples of the logs :
> > 
> > [...]
> > 2009-05-18 20&#58;37&#58;39 Started incr backup on Website02 
> > &#40;pid=18496, share=Websites&#41;
> > 2009-05-18 20&#58;46&#58;08 Got signal PIPE... cleaning up

This one was in the middle of a backup.

> > [...]
> > See ? It crashed at 29h46 right after the "Got signal PIPE... cleaning up"
> > command.

Well, at that time of day, I'd crash too ;-).

> > [...]
> > 2009-05-20 00&#58;25&#58;01 Started full backup on website05 
> > &#40;pid=15173, share=Websies&#41;
> > 2009-05-20 00&#58;25&#58;06 Finished full backup on website05

Possibly this one is completing rather fast because you've misspelled the
share name?

> > I think this is quite a challenging problem...

I think it's quite an annoying problem, because it's almost certainly outside
BackupPC (hardware, broken binaries, system configuration ...).

> > As you can see too, backups are running quite normally. I can backup hosts
> > or dirs manually on the web interface and backups seem to run normally as
> > well but when it gets this "Got signal PIPE... cleaning up" signal, it
> > crashes.

Strictly speaking, it's not a crash. BackupPC terminates cleanly after
encountering an unexpected situation. Whether this response is correct in this
situation is a different matter. I'd argue for just closing the socket
responsible, though I'm not sure how I'd implement that. It probably means
making all output to sockets event-driven ...
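
For what it's worth, here is a minimal sketch of "just closing the socket
responsible". It is in Python rather than BackupPC's Perl and is not taken
from the BackupPC source. It assumes a POSIX system, and `broadcast` and the
socketpair demo are hypothetical names made up for illustration. With SIGPIPE
ignored, a write to a dead peer fails with EPIPE instead of terminating the
process, so only that one connection needs to be dropped.

```python
import signal
import socket

# Assumption: POSIX. Ignoring SIGPIPE turns "peer is gone" into an
# ordinary write error (EPIPE) that the caller can handle per socket.
signal.signal(signal.SIGPIPE, signal.SIG_IGN)


def broadcast(clients, payload):
    """Send payload to every client socket.

    Sockets whose peer has gone away are closed and dropped;
    the list of sockets that are still usable is returned.
    """
    alive = []
    for sock in clients:
        try:
            sock.sendall(payload)
            alive.append(sock)
        except (BrokenPipeError, ConnectionResetError):
            # Only the offending connection is closed; the loop goes on.
            sock.close()
    return alive


# Demo with two hypothetical clients, one of which has already hung up.
a_srv, a_cli = socket.socketpair()
b_srv, b_cli = socket.socketpair()
b_cli.close()

survivors = broadcast([a_srv, b_srv], b"status\n")
```

In the demo, `survivors` contains only `a_srv`; the daemon-side end of the
dead connection has been closed and everything else keeps running.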

> > Any ideas ?
> 
> Just a wild guess, but linux will kill process more or less at random if 
> you run out of ram and swap space - and rsync can use a lot of memory if 
> there are many files on the target.   The PIPE signal just tells you 
> that a child process died when reading/writing to it, so that's not much 
> to go on.

I believe you actually only get a SIGPIPE when writing to a socket that has
been closed on the other end. Reading should simply give you an EOF condition.
I can think of only two ways this could happen:

1.) Someone sent a *command* to the server (see Main_Check_Client_Messages in
    BackupPC) and didn't wait for a response. This should not happen. Corrupt
    script or client dying unexpectedly, maybe due to the system running out
    of swap (in a very narrow time frame, though).
2.) Someone initiated a connection to the server and closed it (or died) right
    away, before the server had time to write a seed to the connection.
    This should also not happen.

    Note that this gives a DoS attack vector to any unauthenticated attacker
    who is able to open a connection to the BackupPC server (I don't think
    that is what is happening here, I just think it needs to be fixed).
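
The difference between reading from and writing to a closed connection is
easy to check. The following is a small sketch in Python (not BackupPC code),
assuming a POSIX system and using a local socketpair in place of the real
server socket; `probe_closed_peer` and the `b"seed\n"` payload are made up
for illustration. SIGPIPE is ignored so that the failed write shows up as an
error code instead of ending the test process.

```python
import signal
import socket

# Assumption: POSIX. Without this, the default SIGPIPE action would
# terminate the process on the failed write.
signal.signal(signal.SIGPIPE, signal.SIG_IGN)


def probe_closed_peer():
    """Return (data read, errno of write) after the peer has closed."""
    server_end, client_end = socket.socketpair()
    client_end.close()  # the "client" goes away before the server writes

    # Reading from a closed peer is not an error: it just reports EOF.
    read_result = server_end.recv(1024)

    # Writing to a closed peer is what raises SIGPIPE / EPIPE.
    try:
        server_end.send(b"seed\n")
        write_errno = None
    except OSError as exc:
        write_errno = exc.errno

    server_end.close()
    return read_result, write_errno


read_result, write_errno = probe_closed_peer()
print("read:", read_result, "write errno:", write_errno)
```

The read comes back empty (EOF), while the write fails with EPIPE, which is
the condition that is delivered as SIGPIPE when the signal is not ignored.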

It seems unlikely to me that an OOM condition would always kill a child
process (and at such an unlikely point in time), but never the BackupPC server
itself or other important system processes. Moreover, these don't sound like
backups with many files on the target ("most of them are just websites so
they're not very big"), and the share names don't sound like rsync (though it
could be rsyncd).

What I'd try:
- 'apt-get install --reinstall backuppc'
  Make sure the package is re-downloaded (though a corrupt package should fail
  the MD5 checks).
- alternatively, check the installed package for corruption:
  'debsums -c backuppc'
- likewise, check the Perl dependencies:
  'debsums -c perl perl-modules perl-base' (and possibly others I've missed)
- test system memory
- add debugging messages to the BackupPC daemon to see where the SIGPIPE is
  triggered; try to find out which script is causing it
- move BackupPC_nightly to a different time (see WakeupSchedule in the config
  file) to see if the "crashes" correlate with BackupPC_nightly execution time

Hope that helps.

Regards,
Holger

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/