Re: [Bacula-users] REPOST Can't connect to Remote.

2011-01-06 08:30:33
From: Martin Simmons <martin AT lispworks DOT com>
To: Bacula-users AT lists.sourceforge DOT net
Date: Thu, 6 Jan 2011 13:28:11 GMT
>>>>> On Thu, 6 Jan 2011 07:36:32 -0500, Wayne Spivak said:
> 
> Here the last few error messages:
> 
> EMMA:
> 02-Jan 21:17 kira.sbanetweb.com-dir JobId 38: Fatal error: bsock.c:135
> Unable to connect to Client: emma.sbanetweb.com-fd on 192.68.0.30:9102.
> ERR=Interrupted system call
> 03-Jan 23:28 kira.sbanetweb.com-dir JobId 48: Fatal error: No Job status
> returned from FD.
> 03-Jan 23:28 kira.sbanetweb.com-dir JobId 48: Fatal error: bsock.c:135
> Unable to connect to Client: emma.sbanetweb.com-fd on 192.68.0.30:9102.
> ERR=Interrupted system call

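Interesting.  "Interrupted system call" (EINTR) is rather surprising, because
it usually indicates a bug in the code that gets it: a blocking call was
interrupted by a signal and the caller reported the error instead of
handling it.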

I suggest using

strace -f -p $dirpid

as root on kira, where $dirpid is the pid of the bacula-dir.  Then run the
emma job to see which system calls are being interrupted.
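
A minimal sketch of that, assuming pgrep is installed and the director
process is named bacula-dir (both are assumptions, as is the default output
path /tmp/bacula-dir.strace, which is only a placeholder):

```shell
# Attach strace to the running director and log its system calls to a file.
# Run as root on kira.
trace_director() {
    out=${1:-/tmp/bacula-dir.strace}   # hypothetical default output path
    # pgrep: -o picks the oldest matching process, -x requires an exact name
    dirpid=$(pgrep -o -x bacula-dir) || {
        echo "bacula-dir is not running" >&2
        return 1
    }
    # strace: -f follows threads, -tt timestamps each call, -o writes to a file
    strace -f -tt -o "$out" -p "$dirpid"
}
```

Call trace_director, start the emma job, then stop strace with Ctrl-C.
Interrupted calls show up in the output file as lines ending in
"EINTR (Interrupted system call)", so grepping for EINTR should narrow it
down.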


> TUFFY: (outside firewall - Fedora 13 box)
> 02-Jan 23:21 kira.sbanetweb.com-dir JobId 40: Fatal error: Socket error on
> Storage command: ERR=Connection reset by peer
> 02-Jan 23:21 kira.sbanetweb.com-dir JobId 40: Fatal error: Network error
> with FD during Backup: ERR=Connection reset by peer
> 
> 03-Jan 23:15 kira.sbanetweb.com-dir JobId 46: Fatal error: Socket error on
> Storage command: ERR=Connection reset by peer
> 03-Jan 23:15 kira.sbanetweb.com-dir JobId 46: Fatal error: Network error
> with FD during Backup: ERR=Connection reset by peer
> 
> Ladymax:
> 
> 02-Jan 23:31 kira.sbanetweb.com-dir JobId 41: Fatal error: Socket error on
> Storage command: ERR=Connection reset by peer
> 02-Jan 23:31 kira.sbanetweb.com-dir JobId 41: Fatal error: Network error
> with FD during Backup: ERR=Connection reset by peer
> 
> 03-Jan 23:25 kira.sbanetweb.com-dir JobId 47: Fatal error: Socket error on
> Storage command: ERR=Connection reset by peer
> 03-Jan 23:25 kira.sbanetweb.com-dir JobId 47: Fatal error: Network error
> with FD during Backup: ERR=Connection reset by peer

Usually "Connection reset by peer" means that the other end closed the
connection.  Running the bacula-fd with -d400 might give some idea why.
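
As a rough sketch, reusing the binary and config paths from your earlier
command line (the log path /tmp/bacula-fd.debug and the BACULA_FD override
variable are just placeholders I made up for illustration):

```shell
# Run the file daemon in the foreground at debug level 400 and keep a copy
# of the debug output.  Stop the normally running bacula-fd first, otherwise
# port 9102 is already in use.
run_fd_debug() {
    conf=${1:-/etc/bacula/bacula-fd.conf}
    log=${2:-/tmp/bacula-fd.debug}        # hypothetical log path
    fd_bin=${BACULA_FD:-/sbin/bacula-fd}  # hypothetical override variable
    # -f stays in the foreground, -d400 sets the debug level,
    # -dt adds timestamps to the debug output
    "$fd_bin" -c "$conf" -f -d400 -dt 2>&1 | tee "$log"
}
```

Run it on Ladymax (and on tuffy), start the job from kira, and look at the
last lines logged just before the connection is reset.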

__Martin


> 
> -----Original Message-----
> From: Martin Simmons [mailto:martin AT lispworks DOT com] 
> Sent: Thursday, January 06, 2011 6:32 AM
> To: Bacula-users AT lists.sourceforge DOT net
> Subject: Re: [Bacula-users] REPOST Can't connect to Remote.
> 
>>>>> On Wed, 5 Jan 2011 20:51:18 -0500, Wayne Spivak said:
> > 
> >  Installed Bacula 5.0.2 on Fedora 14 (called Kira).
> > 
> >  Previously had it installed and working on Fedora 11 (called Beech)
> > 
> >  I copied all the conf files from Beech to Kira (adjusted them for new
> >  Machine names), debugged normal errors and Bacula started.
> > 
> > Did a backup on Kira without problems.
> > 
> >  Went to test on Ladymax (on other side of firewall - public machine):
> >  Port 9102 works both ways (only running bacula-fd).  Ports 9101 and 9103
> >  work from Ladymax to Kira.  Both using 5.0.2 (FD for Ladymax).
> > 
> > Started Ladymax in Debug mode:
> >  /sbin/bacula-fd -c/etc/bacula/bacula-fd.conf -f -d20 -m -v -s -dt 
> >  29-Dec-2010 09:20:44 ladymax.sbanetweb.com-fd: filed.c:275-0 filed:
> >  listening on port 9102
> > 
> >  Bacula on Kira won't find Ladymax.  Error is:
> >  "kira.sbanetweb.com-dir JobId 14: Fatal error: Socket error on Storage
> >  command: ERR=Connection reset by peer
> >  29-Dec 10:19 kira.sbanetweb.com-dir JobId 14: Fatal error: Network error
> >  with FD during Backup: ERR=Connection reset by peer 29-Dec 10:19"
> > 
> >  Remember, Ladymax works under a Fedora 11 install... I even turned off 
> >  iptables on Kira (inside of firewall), to no avail.  
> > 
> > 
> > I then loaded the Bacula client (5.0.2) on a different Fedora 14 box
> > (EMMA), which is behind the firewall and is 1 IP address different from
> > KIRA.  I took down iptables (since it is redundant and to minimize
> > possible errors).
> > 
> > Same basic error:
> > "Fatal error: bsock.c:135 Unable to connect to Client:
> > emma.sbanetweb.com-fd on 192.68.0.30:9102. ERR=Interrupted system call"
> 
> This is not actually the same basic error: it is a complete failure to
> connect, whereas the error from Ladymax occurs after connection.
> 
> Is the error always "Unable to connect to Client...Interrupted system call"
> for emma and always "Socket error on Storage command: ERR=Connection reset
> by peer" for Ladymax, or is it somewhat random?
> 
> __Martin
> 
