Bacula-users

Re: [Bacula-users] REPOST Can't connect to Remote.

2011-01-07 14:22:05
Subject: Re: [Bacula-users] REPOST Can't connect to Remote.
From: Martin Simmons <martin AT lispworks DOT com>
To: Bacula-users AT lists.sourceforge DOT net
Date: Fri, 7 Jan 2011 19:19:08 GMT
OK, the output shows that bacula is interrupting the connect itself, as a way
to prevent it from hanging indefinitely.  It looks like the error could be
reported better, but the upshot is that connect is hanging unexpectedly.
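
To illustrate what "interrupting the connect itself" can look like, here is a
minimal Python sketch. It is not Bacula's code, and call_with_alarm and
CallTimedOut are made-up names. It arms a timer signal before a blocking call
so that the call cannot hang forever. In C the interrupted call would fail
with errno EINTR, whose message text is "Interrupted system call". It is Unix
only and has to run in the main thread.

```python
import errno
import os
import signal
import socket


class CallTimedOut(Exception):
    """Raised by the SIGALRM handler to abandon a blocking call."""


def _on_alarm(signum, frame):
    raise CallTimedOut()


def call_with_alarm(blocking_call, limit_seconds):
    """Run blocking_call(), interrupting it after limit_seconds.

    Returns "ok" if the call finished in time, otherwise the text of
    errno EINTR (what a C program would report for the interrupted call).
    """
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, limit_seconds)  # arm the timer
    try:
        blocking_call()
        return "ok"
    except CallTimedOut:
        return os.strerror(errno.EINTR)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        signal.signal(signal.SIGALRM, old_handler)
```

For example, call_with_alarm(lambda: sock.connect(("192.68.0.30", 9102)), 180)
would give up after 3 minutes, assuming sock is an ordinary blocking socket.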

Does the "Interrupted system call" happen after a delay of 3 minutes (or the
configured FD Connect Timeout)?

I suggest trying

telnet 192.68.0.30 9102

on kira to see if that can connect to emma reliably.
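
If telnet is awkward to repeat by hand, the same check can be scripted. This
is only a sketch using Python's standard socket module. The names probe and
probe_repeatedly, the attempt count and the timeouts are my own choices, not
anything from Bacula.

```python
import socket
import time


def probe(host, port, timeout_seconds=10.0):
    """Attempt one TCP connection.

    Returns (succeeded, seconds_taken, error_text).
    """
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True, time.monotonic() - started, ""
    except OSError as exc:  # refused, timed out, unreachable, ...
        return False, time.monotonic() - started, str(exc)


def probe_repeatedly(host, port, attempts=5, pause_seconds=1.0,
                     timeout_seconds=10.0):
    """Probe several times in a row to see whether connecting is reliable."""
    results = []
    for attempt in range(attempts):
        results.append(probe(host, port, timeout_seconds))
        if attempt + 1 < attempts:
            time.sleep(pause_seconds)
    return results


# Example, to be run on kira:
#   for ok, seconds, error in probe_repeatedly("192.68.0.30", 9102):
#       print("connected" if ok else "FAILED", round(seconds, 2), error)
```

A mix of successes and failures, or failures that only show up after the full
timeout, would point at the network rather than at Bacula.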

__Martin

p.s. I've replied to the list again.


>>>>> On Thu, 6 Jan 2011 21:32:41 -0500, Wayne Spivak said:
> 
> Enclosed is the output.  I don't see anything (but then again, I probably
> wouldn't).
> Thanks for the help.
> 
> Wayne
> 
> -----Original Message-----
> From: Martin Simmons [mailto:martin AT lispworks DOT com] 
> Sent: Thursday, January 06, 2011 8:28 AM
> To: Bacula-users AT lists.sourceforge DOT net
> Subject: Re: [Bacula-users] REPOST Can't connect to Remote.
> 
> >>>>> On Thu, 6 Jan 2011 07:36:32 -0500, Wayne Spivak said:
> > 
> > Here the last few error messages:
> > 
> > EMMA:
> > 02-Jan 21:17 kira.sbanetweb.com-dir JobId 38: Fatal error: bsock.c:135
> > Unable to connect to Client: emma.sbanetweb.com-fd on 192.68.0.30:9102.
> > ERR=Interrupted system call
> > 03-Jan 23:28 kira.sbanetweb.com-dir JobId 48: Fatal error: No Job status
> > returned from FD.
> > 03-Jan 23:28 kira.sbanetweb.com-dir JobId 48: Fatal error: bsock.c:135
> > Unable to connect to Client: emma.sbanetweb.com-fd on 192.68.0.30:9102.
> > ERR=Interrupted system call
> 
> Interesting.  "Interrupted system call" is rather surprising, because it
> usually indicates a bug in the code that gets it.
> 
> I suggest using
> 
> strace -f -p $dirpid
> 
> as root on kira, where $dirpid is the pid of the bacula-dir.  Then run the
> emma job to see which system calls are being interrupted.
> 
> 
> > TUFFY: (outside firewall - Fedora 13 box)
> > 02-Jan 23:21 kira.sbanetweb.com-dir JobId 40: Fatal error: Socket error on
> > Storage command: ERR=Connection reset by peer
> > 02-Jan 23:21 kira.sbanetweb.com-dir JobId 40: Fatal error: Network error
> > with FD during Backup: ERR=Connection reset by peer
> > 
> > 03-Jan 23:15 kira.sbanetweb.com-dir JobId 46: Fatal error: Socket error on
> > Storage command: ERR=Connection reset by peer
> > 03-Jan 23:15 kira.sbanetweb.com-dir JobId 46: Fatal error: Network error
> > with FD during Backup: ERR=Connection reset by peer
> > 
> > Ladymax:
> > 
> > 02-Jan 23:31 kira.sbanetweb.com-dir JobId 41: Fatal error: Socket error on
> > Storage command: ERR=Connection reset by peer
> > 02-Jan 23:31 kira.sbanetweb.com-dir JobId 41: Fatal error: Network error
> > with FD during Backup: ERR=Connection reset by peer
> > 
> > 03-Jan 23:25 kira.sbanetweb.com-dir JobId 47: Fatal error: Socket error on
> > Storage command: ERR=Connection reset by peer
> > 03-Jan 23:25 kira.sbanetweb.com-dir JobId 47: Fatal error: Network error
> > with FD during Backup: ERR=Connection reset by peer
> 
> Usually "Connection reset by peer" means that the other end closed the
> connection.  Running the bacula-fd with -d400 might give some idea why.
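
For reference, a small Python sketch of one way a peer can cause this error
on the other side: closing the socket abortively, so that a TCP RST is sent.
The function name provoke_reset is made up, and this says nothing about why
the bacula-fd is closing the connection in this case.

```python
import socket
import struct


def provoke_reset():
    """Return the error text a client sees when its peer aborts the connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free local port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname(), timeout=5)
    peer, _ = listener.accept()

    # SO_LINGER switched on with a zero timeout makes close() send a TCP RST
    # instead of the normal FIN handshake.
    peer.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))
    peer.close()

    try:
        client.recv(1)
        return "no error"
    except ConnectionResetError as exc:
        return exc.strerror
    finally:
        client.close()
        listener.close()
```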
> 
> __Martin
> 
> 
> > 
> > -----Original Message-----
> > From: Martin Simmons [mailto:martin AT lispworks DOT com] 
> > Sent: Thursday, January 06, 2011 6:32 AM
> > To: Bacula-users AT lists.sourceforge DOT net
> > Subject: Re: [Bacula-users] REPOST Can't connect to Remote.
> > 
> > >>>>> On Wed, 5 Jan 2011 20:51:18 -0500, Wayne Spivak said:
> > > 
> > >  Installed Bacula 5.0.2 on Fedora 14 (called Kira).
> > > 
> > >  Previously had it installed and working on Fedora 11 (called Beech)
> > > 
> > >  I copied all the conf files from Beech to Kira (adjusted them for new
> > >  Machine names), debugged normal errors and Bacula started.
> > > 
> > > Did a backup on Kira without problems.
> > > 
> > >  Went to test on Ladymax (on other side of firewall - public machine):
> > >  Port 9102 works both ways (only running bacula-fd).
> > >  Port 9101 and 9103 work from Ladymax to Kira.
> > >  Both using 5.0.2 (FD for Ladymax).
> > > 
> > > Started Ladymax in Debug mode:
> > >  /sbin/bacula-fd -c/etc/bacula/bacula-fd.conf -f -d20 -m -v -s -dt 
> > >  29-Dec-2010 09:20:44 ladymax.sbanetweb.com-fd: filed.c:275-0 filed:
> > >  listening on port 9102
> > > 
> > >  Bacula on Kira won't find Ladymax.  Error is "kira.sbanetweb.com-dir
> > >  JobId 14: Fatal error: Socket error on Storage command: ERR=Connection
> > >  reset by peer 29-Dec 10:19 kira.sbanetweb.com-dir JobId 14: Fatal error:
> > >  Network error with FD during Backup: ERR=Connection reset by peer
> > >  29-Dec 10:19"
> > > 
> > >  Remember, Ladymax works under a Fedora 11 install... I even turned off 
> > >  iptables on Kira (inside of firewall), to no avail.  
> > > 
> > > 
> > > I then loaded Bacula client (5.0.2) on a different Fedora 14 box (EMMA),
> > > which is behind the firewall and is 1 IP address different from KIRA.
> > > I took down iptables (since it is redundant and to minimize possible
> > > errors).
> > > 
> > > Same basic error:
> > > "Fatal error: bsock.c:135 Unable to connect to Client:
> > > emma.sbanetweb.com-fd on 192.68.0.30:9102. ERR=Interrupted system call"
> > 
> > This is not actually the same basic error: it is a complete failure to
> > connect, whereas the error from Ladymax occurs after connection.
> > 
> > Is the error always "Unable to connect to Client...Interrupted system
> > call" for emma and always "Socket error on Storage command:
> > ERR=Connection reset by peer" for Ladymax, or is it somewhat random?
> > 
> > __Martin
> > 
> > 
> 
> 

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users