Re: [Bacula-users] Network send error to SD. ERR=Connection reset by peer

2013-06-13 11:46:16
From: Martin Simmons <martin AT lispworks DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Thu, 13 Jun 2013 16:42:43 +0100
>>>>> On Thu, 13 Jun 2013 08:54:43 -0400, Clark, Patricia A said:
> 
> On 6/12/13 10:41 AM, "Josh Fisher" <jfisher AT pvct DOT com> wrote:
> 
> 
> >
> >On 6/11/2013 11:10 AM, Leonardo - Mandic wrote:
> >> Hello,
> >>
> >> After upgrading to Bacula 5.2.13 I have Bacula storage problems. It
> >> appears to be a network problem, but it is not: I have a gigabit network
> >> dedicated to Bacula. The problem occurs on backups that run for many
> >> hours or days (a full backup of 500 GB takes 2 days, for example).
> >>
> >> The timing is random, but 70% of the servers show these same errors.
> >>
> >> Older versions never had this problem, and it is the same network and
> >> the same servers as with the old Bacula versions.
> >>
> >> Does anybody else have this problem on 5.2.13?
> >>
> >> The error is:
> >>
> >>
> >> 2013-06-10 23:51:01 servert-fd JobId 266: Error: bsock.c:429 Write
> >> error sending 64562 bytes to Storage daemon:10.1.0.60:9103:
> >> ERR=Connection reset by peer
> >> 2013-06-10 23:51:01 servert-fd JobId 266: Fatal error: backup.c:1200
> >> Network send error to SD. ERR=Connection reset by peer
> >
> >In my experience, it has always been hardware related. In particular,
> >aggressive power saving modes will cause this when one of the systems
> >cuts power to its Ethernet PHY at an inappropriate time. This can be
> >because the device driver's default is geared toward early power savings
> >and the op hasn't changed it, or a buggy device driver shuts off the PHY
> >when it shouldn't. Bacula requires that TCP connections remain up
> >throughout the job lifetime. Anything that might cause a delay could
> >cause this if the power save timeout for the Ethernet controller is
> >shorter than the delay. For example, if the database server is restarted
> >by a nightly cron job and you are not spooling attributes, then the
> >delay could allow the device driver to shut down the PHY due to
> >"inactivity".
> >
> >
> 
> I would suggest that this is not the case for this issue.  I have had this
> on a server that is busy running multiple backups, where one of them
> will get this error.  Everything is on the server, so I am not reaching
> out to a separate client.  I do not use any of the power saving features
> on the server either.

The FD error "Network send error to SD. ERR=Connection reset by peer" means
that the FD unexpectedly lost contact (at the TCP level) with the SD while
writing data to it.

I think the only possible causes are:

1. The network broke between the FD and the SD.
2. The SD died.
3. The FD got a different error but reported it incorrectly (not very likely).
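
As a rough way to tell causes 1 and 2 apart, one could probe the SD port
from the FD host right after a failure.  The following is only a sketch,
not part of Bacula: the function name probe_sd is made up for illustration,
and the port 9103 default is taken from the log lines quoted above.  It
only tests a plain TCP connect, so it says nothing about a failure that
has already cleared.

```python
import socket


def probe_sd(host: str, port: int = 9103, timeout: float = 5.0) -> str:
    """Attempt a plain TCP connect to the SD and classify the outcome.

    Hypothetical helper for illustration; it does not speak the Bacula
    protocol, it only checks whether the port accepts a connection.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            # Something is listening and the network path works right now.
            return "reachable"
    except ConnectionRefusedError:
        # The host answered but nothing is listening on the port,
        # which would be consistent with the SD having died.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout: points at the network path.
        return "timeout"
    except OSError:
        # Any other socket-level failure (no route, host down, ...).
        return "error"
```

For the job in the quoted log one would call probe_sd("10.1.0.60") on the
FD host.  A result of "refused" suggests looking at the SD first, while
"timeout" or "error" suggests looking at the network.  A result of
"reachable" does not rule out either cause, since the break may have been
transient.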

__Martin
