Bacula-users

Re: [Bacula-users] TLS negotiation handshake errors

2009-04-10 06:50:22
From: baculalist AT encambio DOT com
To: bacula-users AT lists.sourceforge DOT net
Date: Fri, 10 Apr 2009 12:45:25 +0200

Hello Ryan,

On Thu., Apr 09, 2009, Ryan NOVOSIELSKI wrote:
>baculalist AT encambio DOT com wrote:
>> On Wed., Apr 08, 2009, Dan LANGILLE wrote:
>>> baculalist AT encambio DOT com wrote:
>>>>   Director hostname back1.host.com: Solaris x86 11 (nv-b91)
>>>>   File daemon hostname back1.host.com: Solaris x86 11 (nv-b91)
>>>>
>>>>   Errors seen on the director:
>>>>   08-Apr 09:36 bacsrv-dir JobId 40: Start Backup JobId 40, Job=Debut.2009-04-08_09.36.52.03
>>>>   08-Apr 09:36 bacsrv-dir JobId 40: Using Device "FileStorage"
>>>>   08-Apr 09:37 bacsrv-dir JobId 0: Error: openssl.c:86 Connect failure: ERR=error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number
>>>>   08-Apr 09:37 bacsrv-dir JobId 40: Fatal error: TLS negotiation failed with FD at "back1.host.com:9102".
>>>>
>>>>
>>> I Googled. I found:
>>>
>>> http://www.mail-archive.com/bacula-users@lists.sourceforge.net/msg04842.html
>>>
>>> Does that help?
>>>
>> Very little. I've checked that my certs are correct (permissions,
>> CN=, etc.). In the bacula config files I've added hostnames (matching
>> CN=) with 'TLS Allowed CN' in every possible place (according to the
>> '-t' option to check config files).
>> 
>>
>What documentation have you used to set up Bacula with TLS? I seem to
>recall, actually, that there was one source of documentation that
>mentioned one step that wasn't in another (I believe the best one was
>written by Landon Fuller -- I forget where I found it). Perhaps you
>might want to search the list archives for discussions I had on this
>subject maybe 6-9 months ago as I believe I was pointed in the right
>direction.
>
>
Good ideas. I did see some configuration advice from Landon. His
http://www.bacula.org/en/rel-manual/Bacula_TLS_Communication.html
does help, as does http://www.devco.net/pubwiki/Bacula/TLS/ from
R.I.Pienaar.

I ran truss(1) on the bacula-fd process and debugged the code, and
found that the SSL logic calls read(2) on a socket that has no data
ready (the same socket over which the director authenticated with
CRAM-MD5). Because lib/tls.c has set this socket to nonblocking,
the read(2) returns immediately with the error EAGAIN (errno 11).
All of this happens in the function openssl_bsock_session_start in
lib/tls.c, which ends up in its default error case (Socket Error
Occured) and returns false. That is why the connection is rejected.

The trace:

  $ truss /pfx/sbin/bacula-fd -f ...
  [...]
  /3:   read(6, "\0\0\0  ", 4)                          = 4
  /3:   read(6, " H e l l o   D i r e c t".., 32)       = 32
  /3:   read(6, " a u t h   c r a m - m d".., 52)       = 52
  /3:   read(6, " 1 0 0 0   O K   a u t h".., 13)       = 13
  [...]
  /3:   fcntl(6, F_GETFL)                               = 2
  /3:   fcntl(6, F_SETFL, FWRITE|FNONBLOCK)             = 0
  /3:   time()                                          = 1239322002
  /3:   time()                                          = 1239322002
  /3:   time()                                          = 1239322002
  /3:   brk(0x081EE990)                                 = 0
  /3:   brk(0x081F4990)                                 = 0
  /3:   brk(0x081F4990)                                 = 0
  /3:   brk(0x081F8990)                                 = 0
  /3:   brk(0x081F8990)                                 = 0
  /3:   brk(0x081FC990)                                 = 0
  /3:   brk(0x081FC990)                                 = 0
  /3:   brk(0x081FE990)                                 = 0
  /3:   read(6, 0x081F34A0, 5)                          Err#11 EAGAIN
  [...]
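
The EAGAIN at the end of the trace can be reproduced outside Bacula.
What follows is a minimal sketch in C, assuming only POSIX
socketpair(2), fcntl(2) and read(2). The helper name
errno_of_empty_nonblocking_read is made up for illustration and is
not part of Bacula. It sets O_NONBLOCK on one end of a socket pair
and reads before the peer has written anything.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not Bacula code): returns the errno seen when
 * reading from an empty non-blocking socket, or 0 if read() did not fail. */
int errno_of_empty_nonblocking_read(void)
{
    int sv[2];
    char buf[5];
    int saved = 0;

    /* A connected pair of stream sockets stands in for the FD connection. */
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);

    /* Same two steps as in the trace: F_GETFL, then F_SETFL with the
     * non-blocking flag added to the existing flags. */
    int flags = fcntl(sv[0], F_GETFL);
    assert(flags != -1);
    rc = fcntl(sv[0], F_SETFL, flags | O_NONBLOCK);
    assert(rc == 0);

    /* The peer has written nothing, so this read cannot be satisfied. */
    ssize_t n = read(sv[0], buf, sizeof(buf));
    if (n == -1) {
        saved = errno;
    }

    close(sv[0]);
    close(sv[1]);
    return saved;
}
```

On a POSIX system the helper should return EAGAIN (or EWOULDBLOCK
where the two values differ). By itself that only says that no data
has arrived yet, not that the connection is broken.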

The code (in src/lib/tls.c):

static inline bool openssl_bsock_session_start(BSOCK *bsock, bool server)
{
   TLS_CONNECTION *tls = bsock->tls;

   [...]

   /* Ensure that socket is non-blocking */
   flags = bsock->set_nonblocking();

   [...]

   for (;;) { 
      if (server) {
         err = SSL_accept(tls->openssl);

   [...]

      /* Handle errors */
      switch (SSL_get_error(tls->openssl, err)) {
      case SSL_ERROR_NONE:
         stat = true;
         goto cleanup;
      [...]
      default:
         /* Socket Error Occured */
         openssl_post_errors(M_ERROR, _("Connect failure"));
         stat = false;
         goto cleanup;
      }

      [...]

cleanup:
   /* Restore saved flags */
   bsock->restore_blocking(flags);
   /* Clear timer */
   bsock->timer_start = 0;

   return stat;
}
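
For comparison, here is a sketch of the usual way to deal with EAGAIN
on a non-blocking socket: wait with select(2) until the descriptor is
readable, then retry. With OpenSSL the analogous condition is reported
by SSL_get_error as SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE;
whether the elided cases in the excerpt above handle it that way is
not shown here. The names read_when_ready and demo_wait_and_retry are
hypothetical and not from Bacula.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not Bacula code): read from a non-blocking fd,
 * waiting with select(2) whenever the read would block. Returns the
 * byte count, 0 on EOF, or -1 on error or timeout (errno = ETIMEDOUT). */
ssize_t read_when_ready(int fd, void *buf, size_t len, int timeout_sec)
{
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0) {
            return n;                  /* data or EOF */
        }
        if (errno == EINTR) {
            continue;                  /* interrupted, just retry */
        }
        if (errno != EAGAIN && errno != EWOULDBLOCK) {
            return -1;                 /* genuine socket error */
        }

        /* Would block: wait until the fd is readable or the timer expires. */
        fd_set rfds;
        struct timeval tv;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        int ready = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (ready == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (ready < 0 && errno != EINTR) {
            return -1;
        }
        /* Readable (or select was interrupted): loop and read again. */
    }
}

/* Demo: if peer_sends is non-zero the peer writes 5 bytes first. */
int demo_wait_and_retry(int peer_sends)
{
    int sv[2];
    char buf[5];

    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);
    int flags = fcntl(sv[0], F_GETFL);
    assert(flags != -1);
    rc = fcntl(sv[0], F_SETFL, flags | O_NONBLOCK);
    assert(rc == 0);

    if (peer_sends) {
        ssize_t w = write(sv[1], "hello", 5);
        assert(w == 5);
    }

    ssize_t n = read_when_ready(sv[0], buf, sizeof(buf), 1);

    close(sv[0]);
    close(sv[1]);
    return (int)n;
}
```

With data from the peer, demo_wait_and_retry returns the 5 bytes read;
without it, the call gives up after the one-second timeout and returns
-1 instead of failing on the first EAGAIN.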

If I remove the fcntl(2) call that sets the socket to nonblocking,
things go further, but in the end the client is unable to read
anything and the director reports 'Fatal error: FD gave bad response
to JobId command: No data available.'

Is anybody familiar with the logic around openssl_bsock_session_start,
or does anybody have an idea of what might be going on? Is anybody
besides me using Solaris? Remember that Solaris has its own socket
API (not the BSD variant).

-- 
Eduard

