Re: [Bacula-users] Fatal error: backup.c:892 Network send error to SD. ERR=Connection reset by peer

2010-04-16 09:38:11
Subject: Re: [Bacula-users] Fatal error: backup.c:892 Network send error to SD. ERR=Connection reset by peer
From: Matija Nalis <mnalis+bacula AT CARNet DOT hr>
To: Jon Schewe <jpschewe AT mtu DOT net>
Date: Fri, 16 Apr 2010 15:30:13 +0200
On Mon, Apr 12, 2010 at 03:59:49PM -0500, Jon Schewe wrote:
> On 4/12/10 9:40 AM, Matija Nalis wrote:
> > It is especially a problem with bigger databases and MySQL instead of
> > PostgreSQL, see http://bugs.bacula.org/view.php?id=1472, where it can
> > take even several hours! (note that while it talks about "restore"
> > speed, it is also related to accurate backups which employ similar
> > SQL queries)
> >
> Must be what it is then. I've been thinking about switching to postgres,
> but haven't because the opensuse packages for bacula are only for mysql.
> This may motivate me more.

You should probably switch soon, before you grow attached to your
database... Exporting Bacula MySQL tables for import into PostgreSQL
can be very painful and problematic; it is much better to just drop
the database and create a fresh one.

> The backup finished, so it seems that in version 3.0.3 bacula does NOT
> set the socket option SO_KEEPALIVE.

Hmm, yeah, I've checked the code casually, and it indeed looks like the
heartbeats are not setting SO_KEEPALIVE timeouts (note that it does
set SO_KEEPALIVE on the socket, otherwise the advice above wouldn't
work -- it just doesn't set TCP_KEEPIDLE on it[1] to specify
user-defined timeouts, and instead uses the system defaults).

The heartbeats seem to be doing other things though (application-level,
not socket-level), but as you saw they are not perfect for fixing network
idleness problems, so you also MUST set the system defaults.
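
To see which system defaults a socket with plain SO_KEEPALIVE will
inherit, something like the following could be used. This is only a
minimal sketch, assuming Linux and its /proc/sys/net/ipv4/tcp_keepalive_*
files; the function names read_int_file and print_keepalive_defaults are
made up for illustration and are not part of Bacula.

```c
#include <stdio.h>

/* Read a single integer from a text file such as
 * /proc/sys/net/ipv4/tcp_keepalive_time (Linux-specific path).
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_int_file(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Print the three kernel-wide TCP keepalive defaults.
 * Returns the number of values that could be read. */
int print_keepalive_defaults(void)
{
    static const char *const paths[] = {
        "/proc/sys/net/ipv4/tcp_keepalive_time",   /* idle seconds before first probe */
        "/proc/sys/net/ipv4/tcp_keepalive_intvl",  /* seconds between probes */
        "/proc/sys/net/ipv4/tcp_keepalive_probes", /* unanswered probes before reset */
    };
    int n = 0;
    for (size_t i = 0; i < sizeof(paths) / sizeof(paths[0]); i++) {
        long v;
        if (read_int_file(paths[i], &v) == 0) {
            printf("%s = %ld\n", paths[i], v);
            n++;
        } else {
            printf("%s: not readable\n", paths[i]);
        }
    }
    return n;
}
```

On many Linux systems tcp_keepalive_time defaults to 7200 seconds (two
hours), which is longer than a lot of firewalls and NAT devices keep
idle connections open, so that is usually the value to lower.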

I've updated the FAQ at:
http://wiki.bacula.org/doku.php?id=faq#my_backup_starts_but_dies_after_a_while_with_connection_reset_by_peer_error


[1] It actually tries that at one point in src/lib/bsock.c if
    TCP_KEEPIDLE support is detected, but it fails to detect it
    properly because <netinet/tcp.h> is not included.

    However, even after fixing that (and the missing semicolon in the
    'int opt = heart_beat' line), it still doesn't look like it sets
    TCP_KEEPIDLE correctly on the FD->SD connection, so maybe this
    code path is not used there.

    Anyway, I gave up debugging at that point and just set the system
    defaults. But I just thought I'd mention it in case someone
    else wants to continue chasing the bug.
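
For anyone who does want to keep chasing it, this is roughly what a
per-socket keepalive setup looks like. It is a minimal sketch of the
general technique on Linux, not the actual code from src/lib/bsock.c;
the function name enable_keepalive and the parameter idle_secs are
invented for the example. Note that TCP_KEEPIDLE only becomes visible
to the #ifdef once <netinet/tcp.h> is included.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>   /* defines TCP_KEEPIDLE where the platform supports it */

/* Enable TCP keepalive on fd and, where TCP_KEEPIDLE is available,
 * override the system-wide idle time with idle_secs.
 * Returns 0 on success, -1 if a setsockopt() call fails. */
int enable_keepalive(int fd, int idle_secs)
{
    int on = 1;

    /* Socket-level switch: without this no keepalive probes are sent. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;

#ifdef TCP_KEEPIDLE
    /* Per-socket idle time (seconds) before the first probe.
     * If the macro is missing, the system default stays in effect. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) < 0)
        return -1;
#endif

    return 0;
}
```

Calling getsockopt() with the same option afterwards is a quick way to
confirm whether the value was really applied to a given connection.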

-- 
Matija Nalis
Odjel racunalno-informacijskih sustava i servisa
Hrvatska akademska i istrazivacka mreza - CARNet
Josipa Marohnica 5, 10000 Zagreb
tel. +385 1 6661 616, fax. +385 1 6661 766
www.CARNet.hr
