[Bacula-users] feature request: improve status messages by differentiating excessive concurrent jobs from actual errors

2011-12-21 12:52:52
From: Mark Bergman <mark.bergman AT uphs.upenn DOT edu>
To: Kern Sibbald <kern AT sibbald DOT com>, bacula-devel <bacula-devel AT lists.sourceforge DOT net>, bacula-users <bacula-users AT lists.sourceforge DOT net>
Date: Wed, 21 Dec 2011 12:50:29 -0500

Item 1:   Improve status messages by differentiating excessive concurrent jobs from actual errors
  Origin: Mark Bergman <mark.bergman AT uphs.upenn DOT edu>
  Date:   Wed Dec 21 12:36:52 EST 2011
  Status:

  What:   
        Configuration limits on concurrent jobs can cause additional
        jobs to fail. This is normal and the result of a deliberate user
        choice, not an error. In this case, the job failure should not
        be reported as an error.

  Why:    
        At this point, the SD returns the message:

            Fatal error: Unable to authenticate with File daemon at "client:9102".
            Possible causes:
                Passwords or names not the same or
                Maximum Concurrent Jobs exceeded on the FD or
                FD networking messed up (restart daemon).

        That message produces a lot of anxiety and requires significant
        troubleshooting (determining how many jobs were running at a
        given moment vs. the concurrency limit--set in multiple places--is
        not trivial).

        I haven't looked at the source code, but I'm guessing that this
        change would require that the SD be allowed to connect to the FD
        regardless of the maximum concurrent jobs limit, and that the FD
        then return a status indicating that the limit has been exceeded.
        This change would also allow the director to query the FD for job
        status when the maximum number of jobs is already running (please
        see feature request "exempt administrative connections from
        concurrency limits" submitted 30 Nov 2011).

        This change would also allow the SD or director to distinguish the
        states of:
                [1] not being able to connect to the FD at all
                [2] being able to connect to the FD but not authenticating
                [3] connecting, authenticating, but exceeding the maximum
                    concurrent job limit
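
        To illustrate the three states listed above, here is a minimal
        sketch in Python. It is not based on the Bacula source, which I
        have not read; the names FdOutcome, classify and is_error are
        hypothetical and exist only for this example. It assumes the FD
        accepts the connection and authenticates before it checks the
        concurrency limit, so that each failure can be reported with its
        own status.

```python
from enum import Enum


class FdOutcome(Enum):
    """Hypothetical outcome codes; not taken from the Bacula source."""
    CONNECT_FAILED = 1   # [1] could not reach the FD at all
    AUTH_FAILED = 2      # [2] connected, but authentication was rejected
    JOB_LIMIT = 3        # [3] authenticated, but the FD is at its job limit
    ACCEPTED = 4         # the job can start


def classify(connected: bool, authenticated: bool,
             running_jobs: int, max_concurrent_jobs: int) -> FdOutcome:
    """Check the conditions in order so each failure gets its own status."""
    if not connected:
        return FdOutcome.CONNECT_FAILED
    if not authenticated:
        return FdOutcome.AUTH_FAILED
    if running_jobs >= max_concurrent_jobs:
        return FdOutcome.JOB_LIMIT
    return FdOutcome.ACCEPTED


def is_error(outcome: FdOutcome) -> bool:
    """Only states [1] and [2] are real errors; hitting the limit is expected."""
    return outcome in (FdOutcome.CONNECT_FAILED, FdOutcome.AUTH_FAILED)
```

        With a classification like this, a job that hits the limit could
        be reported as deferred or skipped rather than as a fatal error,
        while connection and authentication failures would still be
        reported as errors.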
        

----
Mark Bergman                              voice: 215-662-7310
mark.bergman AT uphs.upenn DOT edu                 fax: 215-614-0266
System Administrator     Section of Biomedical Image Analysis
Department of Radiology            University of Pennsylvania
      PGP Key: https://www.rad.upenn.edu/sbia/bergman 

