Re: [Bacula-users] Network error with FD during Backup: ERR=Connection reset by peer

2012-10-03 13:54:07
From: Thomas Lohman <thomasl AT mtl.mit DOT edu>
Date: Wed, 03 Oct 2012 13:50:13 -0400
> I now could check if bacula fd to sd connection timed out because of
> the network switches. This was not the case. My job still cancels.

My experience is that the heartbeat setting has not helped us with our 
"Connection Reset by Peer" issues that occur occasionally.  Something 
more is going on than a typical network timeout.

> Can someone tell me how and when the heartbeat should occur? Is it
> active when no job is running? In my config I set the following line
> for dir, sd and fd: Heartbeat Interval = 5 This should result in a
> heartbeat every 5 sec?

The heartbeats are only set up when a job with a client is initiated.
So, there should be no activity when no job is running.  When you 
initiate a job with the client, the director sets up a connection with 
the client telling the client what storage daemon to use.  The client 
then initiates a connection back to that storage daemon.  If you have 
the heartbeat settings in place as you do then you should see heartbeat 
packets sent from the client back to the director in order to keep that 
connection alive while the data is being sent back to the storage 
daemon.  In addition, you may see heartbeat packets sent from the
storage daemon to the client.  I'd have to re-look at the code but I 
believe this is used in the scenario where the storage daemon is waiting 
for a volume to write the data to (i.e. operator intervention).  If the 
heartbeat setting is on then the storage daemon will send heartbeats 
back to the client in order to keep the connection alive while it waits.

Also of note, 5 seconds is the minimum feasible setting you can have. 
The heartbeat thread "wakes up" every 5 seconds to check to see if it 
needs to send a heartbeat to the director.  So, anything less than that 
really isn't going to do anything.
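
To illustrate why a setting below 5 seconds makes no difference, here is a 
rough Python model of a thread that wakes on a fixed period and sends a 
heartbeat only if the configured interval has elapsed.  This is a sketch 
of the behaviour as I described it, not Bacula's actual code; the function 
name, its parameters and the 5-second default are assumptions for 
illustration only.

```python
def heartbeat_times(interval, duration, wake_period=5):
    """Return the simulated times (in seconds) at which a heartbeat is sent.

    Hypothetical model: the thread wakes every `wake_period` seconds and
    sends a heartbeat if at least `interval` seconds have passed since
    the last one.  This is not taken from the Bacula source.
    """
    sent = []
    last_sent = 0
    # The thread only gets a chance to act at each wake-up.
    for now in range(wake_period, duration + 1, wake_period):
        if now - last_sent >= interval:
            sent.append(now)
            last_sent = now
    return sent


# An interval of 1 second behaves exactly like an interval of 5 seconds.
print(heartbeat_times(1, 20))
print(heartbeat_times(5, 20))
# An interval of 7 seconds is effectively rounded up to the next wake-up.
print(heartbeat_times(7, 30))
```

In this model any interval of 5 seconds or less gives the same schedule, 
and an interval that is not a multiple of 5 is stretched to the next 
wake-up.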

hope this helps,


--tom
