Bacula-users

Re: [Bacula-users] [Bacula-devel] Mysterious Director console connection failures

2012-03-07 14:05:03
Subject: Re: [Bacula-users] [Bacula-devel] Mysterious Director console connection failures
From: Martin Simmons <martin AT lispworks DOT com>
To: Kern Sibbald <kern AT sibbald DOT com>
Date: Wed, 7 Mar 2012 19:02:32 GMT
>>>>> On Wed, 07 Mar 2012 19:13:58 +0100, Kern Sibbald said:
> 
> On 03/07/2012 07:04 PM, Martin Simmons wrote:
> > I don't think Maximum Concurrent Jobs ever controlled console connections --
> > the default is 1, so it's hard to see how that would work.
> 
> Look at the old code.  There was a kludge to let Consoles through.

It doesn't work like that now.  Even as far back as 1.38.11,
director->MaxConcurrentJobs only controls parallelism in the job queue
workers, which are a different pool of threads from the ones that handle
console connections.  Maximum Console Connections was hardwired to 10 at
that time.
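
To make the distinction concrete, here is a toy model in Python.  It is
not Bacula code, and every name in it is made up for illustration.  It
assumes two independent limits, one capping job workers and one capping
console connections, so that exhausting the first leaves the second
untouched.

```python
import threading


class DirectorModel:
    """Toy model of two independent limits. Not Bacula code."""

    def __init__(self, max_concurrent_jobs=1, max_console_connections=10):
        # Hypothetical stand-ins for the two limits discussed in this thread.
        self._job_slots = threading.BoundedSemaphore(max_concurrent_jobs)
        self._console_slots = threading.BoundedSemaphore(max_console_connections)

    def try_start_job(self):
        # Non-blocking: returns False when every job slot is taken.
        return self._job_slots.acquire(blocking=False)

    def finish_job(self):
        self._job_slots.release()

    def try_open_console(self):
        # Non-blocking: returns False when every console slot is taken.
        return self._console_slots.acquire(blocking=False)

    def close_console(self):
        self._console_slots.release()


def demo():
    d = DirectorModel(max_concurrent_jobs=1, max_console_connections=10)
    first_job = d.try_start_job()    # takes the only job slot
    second_job = d.try_start_job()   # refused: job limit reached
    # Console slots are counted separately, so ten still fit.
    consoles = [d.try_open_console() for _ in range(11)]
    return first_job, second_job, consoles


if __name__ == "__main__":
    print(demo())
```

In this model the second job is refused while ten consoles are still
admitted, and only the eleventh console is turned away.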

__Martin


> 
> Kern
> 
> >
> > __Martin
> >
> >
> >>>>>> On Wed, 07 Mar 2012 17:29:07 +0100, Kern Sibbald said:
> >> Phil,
> >>
> >> You might take a look at what you set Maximum Concurrent Jobs
> >> to in the Director.  Each console has a job associated with it (jobid=0)
> >> and so if you reach the maximum, no more will start.
> >>
> >> Someone (not me) added Maximum Console Connections, which defaults
> >> to 20, but I am not 100% sure how it interacts with Maximum Concurrent
> >> Jobs.  Before Maximum Console Connections, everything was lumped into
> >> Maximum Concurrent Jobs.
> >>
> >> Regards,
> >> Kern
> >>
> >> On 03/07/2012 03:58 PM, Phil Stracchino wrote:
> >>> OK, this is getting more and more peculiar as I study it more.  Adding
> >>> bacula-devel list.
> >>>
> >>> To briefly recap the initial statement of the problem, I've been
> >>> experiencing a problem in which, after a number of successful
> >>> connections, console->Director connection authentication begins
> >>> repeatedly failing.  Everything else seems to continue to work normally.
> >>> The typical behavior is that after manually starting two or three jobs
> >>> using BAT, I can no longer connect to the Director either with BAT or
> >>> with bconsole, but everything else continues to function normally and
> >>> the scheduled jobs run normally.  After the pending manually-scheduled
> >>> jobs complete, I can connect again.
> >>>
> >>>
> >>>
> >>> On the theory that network bandwidth may be somehow involved, I tried
> >>> scheduling several jobs 15 minutes ahead of time, to see if I could get
> >>> more jobs running if I scheduled them all before any started.
> >>>
> >>> Starting at about 0915, schedule job 1 for 0925.  No problem.
> >>> Schedule Job 2 for 0925.  No problem.
> >>> Schedule job 3 for 0925.  No problem.
> >>> At about 0918, try to schedule job 4 for 0925.  None of the new jobs has
> >>> yet started.  No go; neither bat nor bconsole can connect.
> >>>
> >>>
> >>> This is what the trace logged as I tried to connect with bconsole:
> >>>
> >>> babylon4-dir: bnet.c:708-0 who=client host=10.24.32.10 port=36131
> >>> babylon4-dir: job.c:1331-0 wstorage=babylon5-sd
> >>> babylon4-dir: job.c:1340-0 wstore=babylon5-sd where=Pool resource
> >>> babylon4-dir: job.c:1031-0 JobId=0 created Job=-Console-.2012-03-07_09.19.16_37
> >>> babylon4-dir: cram-md5.c:72-0 send: auth cram-md5 <1723850907.1331129956@babylon4-dir>   ssl=0
> >>> babylon4-dir: cram-md5.c:131-0 cram-get received: auth cram-md5 <85736557.1331129966@bat>   ssl=0
> >>> babylon4-dir: cram-md5.c:150-0 sending resp to challenge: 25Q2B+IdJ/UKI/+p6++vkC
> >>> babylon4-dir: ua_dotcmds.c:164-0 Cmd: .api 1
> >>> babylon4-dir: ua_dotcmds.c:164-0 Cmd: .levels Backup
> >>> babylon4-dir: bnet.c:708-0 who=client host=10.24.32.10 port=36131
> >>> babylon4-dir: bnet.c:708-0 who=client host=10.24.32.14 port=36131
> >>>
> >>>
> >>> The console reported:
> >>>
> >>> babylon4:root:/opt/bacula/etc:29 # bconsole
> >>> Connecting to Director babylon4:9101
> >>> Director authorization problem.
> >>> Most likely the passwords do not agree.
> >>> If you are using TLS, there may have been a certificate validation error
> >>> during the TLS handshake.
> >>>
> >>>
> >>> After restarting the Director, I re-enabled the trace (setdebug director
> >>> level=100 trace=1), then reconnected again with bconsole:
> >>>
> >>> babylon4-dir: bnet.c:708-0 who=client host=10.24.32.14 port=36131
> >>> babylon4-dir: job.c:1331-0 wstorage=babylon5-sd
> >>> babylon4-dir: job.c:1340-0 wstore=babylon5-sd where=Pool resource
> >>> babylon4-dir: job.c:1031-0 JobId=0 created Job=-Console-.2012-03-07_09.32.59_04
> >>> babylon4-dir: cram-md5.c:72-0 send: auth cram-md5 <1031666935.1331130779@babylon4-dir>   ssl=0
> >>> babylon4-dir: cram-md5.c:131-0 cram-get received: auth cram-md5 <41725829.1331130779@bconsole>   ssl=0
> >>> babylon4-dir: cram-md5.c:150-0 sending resp to challenge: 6Sgw8g8aLxgeAEx5CwsU1B
> >>>
> >>> This looks no different to me than the failed connection attempt.  So I
> >>> tried starting up bconsole from the Linux machine I'm running bat on.
> >>> That worked fine, so I quit it and started another.  I did this about
> >>> five times.  Then I started six at once.  No problem.
> >>>
> >>> It appears I can connect as many consoles as I want, up to the
> >>> Director's configured concurrency limit, with no problem ... until I
> >>> start scheduling jobs.
> >>>
> >>> So, then I opened a bconsole and left it open, then scheduled two jobs
> >>> from BAT successfully.  Then I tried to schedule a third.  No go.
> >>>
> >>> At this point, I tried to open an additional new bconsole.  No go, and
> >>> the trace *did not log anything* for the connection attempt.  I could
> >>> continue to schedule more manual jobs from the existing open bconsole,
> >>> but could start no new consoles, and BAT became completely unresponsive.
> >>> It appears that once two or three jobs were scheduled, the Director
> >>> *stopped listening* for new console connections, but continued to
> >>> service existing open consoles.
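
One way to separate "the Director is not accepting connections" from
"authentication failed" is a bare TCP connect to the Director port, with
no Bacula protocol involved.  A minimal sketch in Python, assuming the
Director listens on babylon4:9101 as in the bconsole output above; the
function name is made up for illustration.

```python
import socket


def port_accepts_connections(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Host and port taken from the bconsole output quoted above.
    print(port_accepts_connections("babylon4", 9101))
```

A successful connect only shows that the kernel completed the handshake
on a listening socket.  It does not prove that the Director has called
accept() on it, so a True result is weak evidence; a refusal or timeout
is the more informative outcome.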
> >>>
> >>>
> >>> All daemons are Bacula 5.2.5, all 64-bit builds.  The Director and the
> >>> disk-based SD are running on Solaris 10u9 amd64, built using Sun Studio
> >>> 12.2.  The tape SD is running on Gentoo Linux amd64, built using
> >>> gcc-4.5.3.  BAT runs on the Linux box, and I used bconsoles from both
> >>> machines with no difference in behavior.
> >>>
> >>>
> >>>
> >>
> >> ------------------------------------------------------------------------------
> >> Virtualization&  Cloud Management Using Capacity Planning
> >> Cloud computing makes use of virtualization - but cloud computing
> >> also focuses on allowing computing to be delivered as a service.
> >> http://www.accelacomm.com/jaw/sfnl/114/51521223/
> >> _______________________________________________
> >> Bacula-devel mailing list
> >> Bacula-devel AT lists.sourceforge DOT net
> >> https://lists.sourceforge.net/lists/listinfo/bacula-devel
> >>
> 

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
