Re: [Bacula-users] Device is BLOCKED - renamed Bugged or not?

2015-04-24 16:11:12
Subject: Re: [Bacula-users] Device is BLOCKED - renamed Bugged or not?
From: Josh Fisher <jfisher AT pvct DOT com>
To: "Clark, Patricia A." <clarkpa AT ornl DOT gov>, "bacula-users AT lists.sourceforge DOT net" <bacula-users AT lists.sourceforge DOT net>
Date: Fri, 24 Apr 2015 16:07:28 -0400
I guess it is semantics, but I was just pointing out that it was not a 
coding issue, but rather a design issue/choice.

You can divide the jobs into different pools and then give the jobs 
within each pool different priorities. The pools allow multiple jobs (from 
different pools) to run concurrently, while the priorities serialize the 
jobs within each pool. Far from desirable, but it does work.

In any case, I agree that all of the ways of using multiple drives 
concurrently seem unwieldy.  It would be nice if both device and volume 
assignment were done as a single atomic operation every time that a job 
selected a volume. In other words, when the job needs a volume, it looks 
for both an AVAILABLE volume and an AVAILABLE device at the same time, 
and only one job at a time can make a volume-device selection. That is 
easier said than done, of course.
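
To make the idea concrete, here is a minimal sketch in Python. It is not 
Bacula code and does not reflect how the SD or director is actually 
implemented; the Selector class, its method names, and the volume and 
drive names are all hypothetical. It also assumes a volume is handed to 
only one job at a time, which is a departure from Bacula's current design 
of letting several jobs write to the same volume. The only point is that 
a single lock covers both the volume choice and the device choice, so two 
jobs starting together can never be handed the same volume or the same drive.

```python
import threading
from typing import Dict, List, Optional, Tuple


class Selector:
    """Hypothetical allocator: picks a free volume AND a free device in one step."""

    def __init__(self, volumes: List[str], devices: List[str]) -> None:
        # One lock guards both tables, so the pair is chosen atomically.
        self._lock = threading.Lock()
        # Maps each name to True while it is in use (assumed exclusive use).
        self._volumes: Dict[str, bool] = {name: False for name in volumes}
        self._devices: Dict[str, bool] = {name: False for name in devices}

    def select(self) -> Optional[Tuple[str, str]]:
        """Return a (volume, device) pair, or None if either kind is exhausted."""
        with self._lock:
            volume = next((v for v, busy in self._volumes.items() if not busy), None)
            device = next((d for d, busy in self._devices.items() if not busy), None)
            if volume is None or device is None:
                # Claim nothing unless both are available at the same time.
                return None
            self._volumes[volume] = True
            self._devices[device] = True
            return volume, device

    def release(self, volume: str, device: str) -> None:
        """Mark a previously selected pair as available again."""
        with self._lock:
            self._volumes[volume] = False
            self._devices[device] = False


if __name__ == "__main__":
    selector = Selector(["Vol001", "Vol002"], ["Drive-0", "Drive-1"])
    results: List[Optional[Tuple[str, str]]] = []

    def job() -> None:
        results.append(selector.select())

    # Three jobs start at once, but only two volume-device pairs exist.
    threads = [threading.Thread(target=job) for _ in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(results)
```

In this toy version a job that gets None would have to wait and retry. 
The hard part in real life is everything the sketch leaves out: the 
selection is split between the director and the SD, and the volume may 
need to be physically moved between drives.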

On 4/24/2015 1:09 PM, Clark, Patricia A. wrote:
> To avoid hijacking the question and to address whether it's a bug or not:
>
> Why it's a bug - a new backup job requesting media that is unavailable
> because it is already in use, whether for a backup or a recovery, is a
> bug when other perfectly good media is available.  One should not need to
> create separate pools; otherwise you will need a separate pool for each job
> to ensure this situation never happens.  The real issue here is how and
> when the communication happens between the director and the storage
> daemon.  If both of these jobs start within a short period of each other
> (usually on the same schedule), that's when the second job will request
> media that has already been assigned by the SD, but not communicated to
> the director prior to the second job starting.  That gap is what creates
> the contention for media.  I have also had tapes pulled out from
> underneath a job resulting in "NULL" volume name and failed jobs.  So, if
> not separate pools, then there's using separate schedules for each job,
> also not desirable.  I have used offset schedules for groups of jobs in
> order to reduce the number of contentions.  If nothing else, if media is
> not available within a reasonable period of time of the request, the
> director and/or the SD should decide to look for another.
>
> Patti Clark
> Linux System Administrator
> R&D Systems Support Oak Ridge National Laboratory
>
>
>
> On 4/24/15, 11:02 AM, "Josh Fisher" <jfisher AT pvct DOT com> wrote:
>
>> On 4/24/2015 9:14 AM, Clark, Patricia A. wrote:
>>> This is a known bug that has been reported, but still exists.  The job
>>> wants the tape in use by another job that is using it in drive 0.
>> I'm not convinced that this is a bug. By design, Bacula allows more than
>> one job to simultaneously write to the same volume. When a job looks for
>> the next volume to write on, it cannot exclude volumes that are already
>> in use by another job. Note that this is not just at job start up, but
>> any time a volume is needed. What causes the catch-22 is that each job
>> is assigned a single device (tape drive) only once at job start up. If
>> two jobs, each writing to a different device, require the same volume,
>> then one job must wait until the volume can be moved into its assigned
>> device. So it is not a bug in the implementation, but rather a design
>> choice.
>>
>> From the perspective of using a multiple drive changer it would seem
>> that it is a bug to allow multiple jobs to simultaneously write to the
>> same volume, but Bacula must work with all kinds of hardware. If the
>> implementation were changed to disallow simultaneous writes to the same
>> volume, then concurrent jobs with a single drive changer would be
>> impossible.
>>
>> Bacula does allow resolving this issue through the use of pools. By
>> segregating jobs that are to be run concurrently into different pools,
>> the situation where two jobs want the same volume at the same time is
>> avoided altogether.  So is this a bug, or is it a configuration error?
>>
>>

