Re: [Bacula-users] Mysterious Director console connection failures

2012-03-07 10:02:07
From: Phil Stracchino <alaric AT metrocast DOT net>
To: bacula-users AT lists.sourceforge DOT net, bacula-devel <bacula-devel AT lists.sourceforge DOT net>
Date: Wed, 07 Mar 2012 09:58:48 -0500
OK, this is getting more and more peculiar the longer I study it.  Adding
the bacula-devel list.

To briefly recap the initial statement of the problem: after a number of
successful connections, console->Director connection authentication
begins failing repeatedly.  The typical behavior is that after manually
starting two or three jobs using BAT, I can no longer connect to the
Director with either BAT or bconsole, but everything else continues to
function normally and the scheduled jobs run normally.  After the
pending manually-scheduled jobs complete, I can connect again.



On the theory that network bandwidth may be somehow involved, I tried
scheduling several jobs 15 minutes ahead of time, to see if I could get
more jobs running if I scheduled them all before any started.

Starting at about 0915, schedule job 1 for 0925.  No problem.
Schedule Job 2 for 0925.  No problem.
Schedule job 3 for 0925.  No problem.
At about 0918, try to schedule job 4 for 0925.  None of the new jobs has
yet started.  No go; neither bat nor bconsole can connect.


This is what the trace logged as I tried to connect with bconsole:

babylon4-dir: bnet.c:708-0 who=client host=10.24.32.10 port=36131
babylon4-dir: job.c:1331-0 wstorage=babylon5-sd
babylon4-dir: job.c:1340-0 wstore=babylon5-sd where=Pool resource
babylon4-dir: job.c:1031-0 JobId=0 created Job=-Console-.2012-03-07_09.19.16_37
babylon4-dir: cram-md5.c:72-0 send: auth cram-md5 <1723850907.1331129956@babylon4-dir> ssl=0
babylon4-dir: cram-md5.c:131-0 cram-get received: auth cram-md5 <85736557.1331129966@bat> ssl=0
babylon4-dir: cram-md5.c:150-0 sending resp to challenge: 25Q2B+IdJ/UKI/+p6++vkC
babylon4-dir: ua_dotcmds.c:164-0 Cmd: .api 1
babylon4-dir: ua_dotcmds.c:164-0 Cmd: .levels Backup
babylon4-dir: bnet.c:708-0 who=client host=10.24.32.10 port=36131
babylon4-dir: bnet.c:708-0 who=client host=10.24.32.14 port=36131


The console reported:

babylon4:root:/opt/bacula/etc:29 # bconsole
Connecting to Director babylon4:9101
Director authorization problem.
Most likely the passwords do not agree.
If you are using TLS, there may have been a certificate validation error
during the TLS handshake.


After restarting the Director, I re-enabled the trace (setdebug director
level=100 trace=1), then connected again with bconsole:

babylon4-dir: bnet.c:708-0 who=client host=10.24.32.14 port=36131
babylon4-dir: job.c:1331-0 wstorage=babylon5-sd
babylon4-dir: job.c:1340-0 wstore=babylon5-sd where=Pool resource
babylon4-dir: job.c:1031-0 JobId=0 created Job=-Console-.2012-03-07_09.32.59_04
babylon4-dir: cram-md5.c:72-0 send: auth cram-md5 <1031666935.1331130779@babylon4-dir> ssl=0
babylon4-dir: cram-md5.c:131-0 cram-get received: auth cram-md5 <41725829.1331130779@bconsole> ssl=0
babylon4-dir: cram-md5.c:150-0 sending resp to challenge: 6Sgw8g8aLxgeAEx5CwsU1B

This looks no different to me than the failed connection attempt.  So I
tried starting up bconsole from the Linux machine I'm running bat on.
That worked fine, so I quit it and started another.  I did this about
five times.  Then I started six at once.  No problem.
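
A by-eye comparison of two traces can be backed up mechanically by masking
the fields that are expected to differ on every connection (client host and
port, the numeric part of the CRAM-MD5 challenge, the response string, the
job timestamp) and then comparing what is left.  The following is only a
minimal sketch: it assumes the trace has been saved with one log record per
line, and the names normalize and first_difference are illustrative helpers,
not anything provided by Bacula.

```python
import re

# (pattern, replacement) pairs for fields that legitimately vary
# from one connection attempt to the next.
VARIABLE_FIELDS = [
    (re.compile(r"host=\S+"), "host=HOST"),
    (re.compile(r"port=\d+"), "port=PORT"),
    # Numeric part of the CRAM-MD5 challenge; the @name suffix is kept.
    (re.compile(r"<\d+\.\d+@"), "<N.N@"),
    (re.compile(r"(sending resp to challenge:)\s*\S+"), r"\1 RESP"),
    (re.compile(r"\d{4}-\d{2}-\d{2}_\d{2}\.\d{2}\.\d{2}_\d+"), "TIMESTAMP"),
]


def normalize(line):
    """Mask the per-connection fields in a single trace line."""
    for pattern, replacement in VARIABLE_FIELDS:
        line = pattern.sub(replacement, line)
    return line.strip()


def first_difference(trace_a, trace_b):
    """Return the index of the first differing record, or None if none."""
    a = [normalize(l) for l in trace_a if l.strip()]
    b = [normalize(l) for l in trace_b if l.strip()]
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One trace is a strict prefix of the other.
        return min(len(a), len(b))
    return None
```

Called with the lines of a failed attempt and a successful one, it returns
the position at which the two first diverge once the noise is masked, or
None if they match record for record.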

It appears I can connect as many consoles as I want, up to the
Director's configured concurrency limit, with no problem ... until I
start scheduling jobs.

So, then I opened a bconsole and left it open, then scheduled two jobs
from BAT successfully.  Then I tried to schedule a third.  No go.

At this point, I tried to open an additional new bconsole.  No go, and
the trace *did not log anything* for the connection attempt.  I could
continue to schedule more manual jobs from the existing open bconsole,
but could start no new consoles, and BAT became completely unresponsive.
It appears that once two or three jobs were scheduled, the Director
*stopped listening* for new console connections, but continued to
service existing open consoles.
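
A quick way to narrow down "stopped listening" is to probe the console port
directly, independent of bconsole and BAT.  The sketch below is an
assumption-laden illustration, not a Bacula tool: director_accepts_tcp is a
hypothetical helper, and 9101 is simply the port shown in the bconsole
output above.  Note that a successful connect only shows that the listening
socket still exists; the kernel can complete the TCP handshake even when
the daemon is no longer calling accept(), so success here combined with a
silent trace would point at the Director's accept loop rather than at the
socket itself.

```python
import socket


def director_accepts_tcp(host, port=9101, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # The connection is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print(director_accepts_tcp("babylon4"))
```

Running it before and after the manual jobs are scheduled would show
whether the port itself changes state at the point where new consoles
stop connecting.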


All daemons are Bacula 5.2.5, all 64-bit builds.  The Director and the
disk-based SD are running on Solaris 10u9 amd64, built using Sun Studio
12.2.  The tape SD is running on Gentoo Linux amd64, built using
gcc-4.5.3.  BAT runs on the Linux box, and I used bconsoles from both
machines with no difference in behavior.



-- 
  Phil Stracchino, CDK#2     DoD#299792458     ICBM: 43.5607, -71.355
  alaric AT caerllewys DOT net   alaric AT metrocast DOT net   phil AT co.ordinate DOT org
  Renaissance Man, Unix ronin, Perl hacker, SQL wrangler, Free Stater
                 It's not the years, it's the mileage.
