Re: [BackupPC-users] BackupPC spontaneous server exit with SIGPIPE

2011-08-12 11:49:12
Subject: Re: [BackupPC-users] BackupPC spontaneous server exit with SIGPIPE
From: Holger Parplies <wbppc AT parplies DOT de>
To: Carl Wilhelm Soderstrom <chrome AT real-time DOT com>
Date: Fri, 12 Aug 2011 17:46:08 +0200
Hi,

Carl Wilhelm Soderstrom wrote on 2011-08-11 13:16:38 -0500 [[BackupPC-users] 
BackupPC spontaneous server exit with SIGPIPE]:
> BackupPC 3.1.0 on Debian, kernel is 2.6.32, filesystem is xfs.
> 
> I have a BackupPC server that has twice now had the BackupPC server process
> spontaneously exit. It *did* shut down nicely, interestingly enough. The
> server log (/var/lib/backuppc/log/LOG) is this:
> [...]
> 2011-08-11 09:00:01 Next wakeup is 2011-08-11 10:00:00
> 2011-08-11 09:51:47 Got signal PIPE... cleaning up

As you probably know, SIGPIPE means that the server is trying to write to a
socket that has already been closed by the peer. BackupPC doesn't expect that
situation and doesn't attempt to handle it, because it only ever writes in
reply to messages it has just received (from so-called "clients"; it never
seems to write to backup jobs!) and to new incoming connections, and the
software that talks to the daemon (BackupPC_serverMesg) doesn't close the
socket without waiting for the reply. I suppose it would be easy to write a
malicious "client" that causes the server to crash, and apparently some error
condition can also lead to this.
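
To illustrate the mechanism, here is a minimal Perl sketch (not BackupPC
code; the socket pair and the message text are made up for the example): a
"server" writes to a UNIX socket after the "client" has closed its end. With a
$SIG{PIPE} handler installed, the process survives, syswrite returns undef
with $! set to EPIPE, and the handler records the signal. Without a handler,
the default action for SIGPIPE terminates the process. The "Got signal
PIPE... cleaning up" line in your log suggests BackupPC catches the signal
and then shuts down in an orderly way.

```perl
#!/usr/bin/perl
# Sketch (not BackupPC code): a "server" writes to a socket after the
# "client" has closed its end, which is what triggers SIGPIPE.
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use Errno qw(EPIPE);

socketpair(my $server, my $client, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# The client goes away without waiting for the reply.
close($client) or die "close: $!";

my $got_sigpipe = 0;
my $saw_epipe   = 0;
{
    # Without a handler the default action for SIGPIPE terminates the
    # process; the handler lets us survive and inspect the result.
    local $SIG{PIPE} = sub { $got_sigpipe = 1 };
    my $written = syswrite($server, "reply\n");
    $saw_epipe = (!defined($written) && $! == EPIPE) ? 1 : 0;
}

print "syswrite failed with EPIPE: ", ($saw_epipe   ? "yes" : "no"), "\n";
print "SIGPIPE delivered:          ", ($got_sigpipe ? "yes" : "no"), "\n";
```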

> It looks like there are no jobs running when the server exits; but obviously
> when it gets restarted (by me) it runs a BackupPC_link job on the host it
> was working on before it exited... so it's like the host 'hung' on doing
> something in the backup, exited (timeout of some sort?), then when restarted
> resumes where it left off and finishes all parts of the backup successfully.

The strange thing is that it doesn't seem to start BackupPC_link before the
crash. The daemon logs "Finished full backup on host1.example.com" when it
receives the corresponding message from BackupPC_dump. It closes the socket
whenever it detects an EOF on it (which would usually be immediately
afterwards, but there is really no way to tell from the log file) and then
queues a BackupPC_link if BackupPC_dump previously told it to. So, it would
appear that for some reason BackupPC_dump doesn't close its end of the socket.
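
A small Perl sketch of why that matters (again not BackupPC code; the message
text is hypothetical): the daemon can read and log a message long before the
peer closes the socket, and EOF, i.e. sysread returning 0, only shows up after
the close. Anything queued on EOF therefore waits for as long as the peer
keeps its end open.

```perl
#!/usr/bin/perl
# Sketch (not BackupPC code): a message can be read long before the peer
# closes the socket; EOF (sysread returning 0) appears only after close.
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use IO::Select;

socketpair(my $server, my $client, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
my $sel = IO::Select->new($server);

# Hypothetical message: the peer reports completion but keeps its end open.
syswrite($client, "finished full backup\n") or die "syswrite: $!";

my $msg = '';
sysread($server, $msg, 4096) or die "sysread: $!";
print "got message: $msg";

# Nothing further is readable, so a daemon waiting for EOF keeps waiting.
my @ready = $sel->can_read(0.5);
my $readable_before = @ready ? 1 : 0;
print "readable before close: ", ($readable_before ? "yes" : "no"), "\n";

close($client) or die "close: $!";

# After the close the socket turns readable and sysread returns 0 (EOF);
# this is the point at which follow-up work would be queued.
my $eof = 0;
if (my @r = $sel->can_read(0.5)) {
    my $buf;
    my $n = sysread($server, $buf, 4096);
    $eof = 1 if defined $n && $n == 0;
}
print "EOF after close: ", ($eof ? "yes" : "no"), "\n";
```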

> This isn't the per-host LOG (/var/lib/backuppc/pc/host1.example.com/LOG) or
> XferLOG (/var/lib/backuppc/pc/host1.example.com/XferLOG.*), so it's not obvious
> how to turn up the debug level or otherwise try to figure out what's going
> on here.

I would insert debugging output into the daemon's code. You could try to add

        print(LOG $bpc->timeStamp, qq{About to reply "$reply" to command "$cmd" from $Clients{$client}{clientName}\n});

before line 1515 of BackupPC (syswrite),

        print(LOG $bpc->timeStamp, "About to send seed to UNIX socket\n");

before line 1557 (also syswrite), and

        print(LOG $bpc->timeStamp, qq{About to send seed to "$name:$port"\n});

before line 1578 (you'll have guessed, syswrite again). You might have noticed
that I left out the syswrite in line 1500. I just don't think it's that one.
[Line numbers are for the upstream tarball; I didn't check the Debian package.
They are the second through fourth 'syswrite' occurrences.]

Don't forget to restart BackupPC after changing the code ;-).

> It's happened twice, but I've never seen this host do it before. There's
> nothing in dmesg or syslog to indicate a problem. Only thing I can think of
> is that BackupPC is running into some sort of filesystem corruption that is
> causing it to fail out rather than risk corrupting it further.

It seems to be in some way related to a hanging backup job, but I can't see
how the job itself could cause it. I don't see any code meant to "crash"
BackupPC in the event of filesystem corruption, but then, I'm not looking for
it (yet) :-).

> Upgrading this host to BackupPC 3.2.1 may be possible (need to get buy-in
> from some other admins before doing that); but before I do that
> I'm going to fsck the disks. I've never seen BackupPC do this before on any
> of the installations I admin and it's done it twice in a few days, so I'm
> suspecting hardware problems.

I'm suspecting something weird :-). For the moment, I don't see any better
option than trying to reproduce it with debugging output in place.

Of course, it could also be something sending a gratuitous SIGPIPE to BackupPC.
Does anyone know of any circumstances where a SIGPIPE would be sent to a
process *group*?
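
For reference, a SIGPIPE does not have to come from a failed write: any
process with sufficient permissions can send one with kill(), and in Perl a
negative signal name addresses a whole process group (kill('-PIPE', $pgrp)
corresponds to kill(2) with a negative pid). The sketch below is not BackupPC
code; the two children merely stand in for processes that might share a
group with the daemon. It shows both members of a group receiving a SIGPIPE
sent to the group.

```perl
#!/usr/bin/perl
# Sketch (not BackupPC code): a SIGPIPE sent to a process group reaches
# every member, even though no pipe or socket is involved.
use strict;
use warnings;
use POSIX qw(setpgid);

pipe(my $ready_r, my $ready_w) or die "pipe: $!";

# Fork one group member; $pgid == 0 means "become a new group leader".
sub spawn_member {
    my ($pgid) = @_;
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    return $pid if $pid;                     # parent returns the child's pid

    # Child: create or join the process group, then install a handler.
    setpgid(0, $pgid) or die "setpgid: $!";
    my $got = 0;
    $SIG{PIPE} = sub { $got = 1 };
    close($ready_r);
    syswrite($ready_w, "r");                 # tell the parent we are set up
    close($ready_w);
    for (1 .. 10) {                          # wait up to ~10 s for the signal
        last if $got;
        sleep 1;
    }
    POSIX::_exit($got ? 0 : 1);              # exit status reports the result
}

# Block until one child has signalled that its handler is installed.
sub wait_ready {
    my $buf;
    (sysread($ready_r, $buf, 1) // 0) == 1 or die "child failed to start\n";
}

my $leader = spawn_member(0);
wait_ready();
my $member = spawn_member($leader);          # joins the leader's group
wait_ready();

# Negative signal name: signal the whole process group $leader.
kill('-PIPE', $leader) or die "kill: $!";

my %status;
for my $pid ($leader, $member) {
    waitpid($pid, 0);
    $status{$pid} = $? >> 8;
}
printf "pid %d got SIGPIPE: %s\n", $_, ($status{$_} == 0 ? "yes" : "no")
    for $leader, $member;
```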

Regards,
Holger

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
