Re: [Bacula-users] Device is BLOCKED - renamed Bugged or not?

2015-04-25 01:45:05
Subject: Re: [Bacula-users] Device is BLOCKED - renamed Bugged or not?
From: Kern Sibbald <kern AT sibbald DOT com>
To: "Clark, Patricia A." <clarkpa AT ornl DOT gov>, Josh Fisher <jfisher AT pvct DOT com>, "bacula-users AT lists.sourceforge DOT net" <bacula-users AT lists.sourceforge DOT net>
Date: Sat, 25 Apr 2015 07:43:07 +0200
Your analyses of the situation sound correct to me.  The big
problems for developers are:

1. Race conditions such as you mention are difficult to reproduce.  If
we have a script that will reproduce one every time or nearly every time,
it is relatively easy, though sometimes a lot of work, to fix the problem.

2. Yes, as you point out the Dir and the SD should look for another
drive/volume.  However, again, this needs a script to duplicate the
problem, and in addition this is more a development issue than a bug
(though that could be disputed), and thus it is a question of priorities
and finding someone with the desire and time to program.
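
As a rough illustration of points 1 and 2, the race and the proposed
fallback can be modelled in a few lines.  This is a toy model only, not
Bacula code: the classes, the functions and the volume names VOL-A and
VOL-B are all hypothetical, and the assumption is simply that the
Director picks a volume from its own view before the SD's reservation
has been reported back to it.

```python
# Toy model only: none of these names exist in Bacula.
from dataclasses import dataclass, field


@dataclass
class StorageDaemon:
    # volume name -> drive it is reserved in (the SD's own view)
    reserved: dict = field(default_factory=dict)

    def reserve(self, volume, drive):
        """Return True if the volume is free or already in this drive."""
        holder = self.reserved.setdefault(volume, drive)
        return holder == drive


@dataclass
class Director:
    volumes: list                                # appendable volumes in the pool
    in_use: set = field(default_factory=set)     # what the Director has been told

    def pick_volume(self, exclude=()):
        """Pick the first volume the Director does not know to be busy."""
        for vol in self.volumes:
            if vol not in self.in_use and vol not in exclude:
                return vol
        return None


def start_job(director, sd, drive, fallback=False):
    """Start one job and return (volume, state)."""
    tried = []
    while True:
        vol = director.pick_volume(exclude=tried)
        if vol is None:
            return None, "no volume"
        if sd.reserve(vol, drive):
            return vol, "running"
        if not fallback:
            return vol, "BLOCKED"    # waits for a volume loaded elsewhere
        tried.append(vol)            # point 2: look for another volume


def run_two_jobs(fallback):
    director = Director(volumes=["VOL-A", "VOL-B"])
    sd = StorageDaemon()
    first = start_job(director, sd, drive=0, fallback=fallback)
    # The race: the second job starts before the Director learns that
    # VOL-A is reserved, so director.in_use is still empty here.
    second = start_job(director, sd, drive=1, fallback=fallback)
    return first, second
```

Without the fallback, the second job in this model ends up BLOCKED on
the volume already reserved in drive 0; with it, the second job moves on
to the other volume.  A real reproduction script would of course have to
drive actual jobs rather than a model like this.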

One of the good things about Bacula Systems is that paying customers
report many if not most of these problems, and in those cases, either
the customer is willing to produce a script that reproduces the problem
or the Bacula Systems support team does so, so these problems are being
fixed over time.  In each Bacula community release (the next in
June-July) *all* the Bacula Enterprise bug/race condition fixes are
backported to the community as well as many of the new Enterprise
features.  So the situation is not as bad as it may at first appear (at
least in my opinion).

Best regards,
Kern

On 24.04.2015 19:09, Clark, Patricia A. wrote:
> To avoid hijacking the question and to address whether it's a bug or not:
>
> Why it's a bug - a new backup job requesting media that is unavailable
> because it is already in use, whether for a backup or a recovery, is a
> bug when other perfectly good media is available.  One should not need to
> create separate pools; otherwise you would need a separate pool for each
> job to ensure this situation never happens.  The real issue here is how and
> when the communication happens between the director and the storage
> daemon.  If both of these jobs start within a short period of each other
> (usually on the same schedule), that's when the second job will request
> media that has already been assigned by the SD, but not communicated to
> the director prior to the second job starting.  That gap is what creates
> the contention for media.  I have also had tapes pulled out from
> underneath a job resulting in "NULL" volume name and failed jobs.  So, if
> not separate pools, then the alternative is separate schedules for each
> job, which is also not desirable.  I have used offset schedules for groups of jobs in
> order to reduce the number of contentions.  If nothing else, if media is
> not available within a reasonable period of time of the request, the
> director and/or the SD should decide to look for another.
>
> Patti Clark
> Linux System Administrator
> R&D Systems Support Oak Ridge National Laboratory
>
>
>
> On 4/24/15, 11:02 AM, "Josh Fisher" <jfisher AT pvct DOT com> wrote:
>
>> On 4/24/2015 9:14 AM, Clark, Patricia A. wrote:
>>> This is a known bug that has been reported, but still exists.  The job
>>> wants the tape in use by another job that is using it in drive 0.
>> I'm not convinced that this is a bug. By design, Bacula allows more than
>> one job to simultaneously write to the same volume. When a job looks for
>> the next volume to write on, it cannot exclude volumes that are already
>> in use by another job. Note that this is not just at job start up, but
>> any time a volume is needed. What causes the catch-22 is that each job
>> is assigned a single device (tape drive) only once at job start up. If
>> two jobs, each writing to a different device, require the same volume,
>> then one job must wait until the volume can be moved into its assigned
>> device. So it is not a bug in the implementation, but rather a design
>> choice.
>>
>> From the perspective of using a multiple drive changer it would seem
>> that it is a bug to allow multiple jobs to simultaneously write to the
>> same volume, but Bacula must work with all kinds of hardware. If the
>> implementation were changed to disallow simultaneous writes to the same
>> volume, then concurrent jobs with a single drive changer would be
>> impossible.
>>
>> Bacula does allow resolving this issue through the use of pools. By
>> segregating jobs that are to be run concurrently into different pools,
>> the situation where two jobs want the same volume at the same time is
>> avoided altogether.  So is this a bug, or is it a configuration error?
>>
>>
>>>    Your options are:
>>>
>>>    1.  Let it wait until the job(s) using the tape in drive 0 finishes.
>>> The pitfall here is if the tape becomes full.
>>>    2.  Cancel the job(s) requesting the tape in drive 1.  Don't restart
>>> the job, but start a new job.  It may or may not decide to use a
>>> different tape.
>>>    3.  Cancel the job(s) using the tape in drive 0.  Bacula should move
>>> the tape from drive 0 to drive 1 once all of the connections to the tape
>>> and drive have been released.
>>>    4.  If, for some strange reason, there are no jobs using the tape in
>>> drive 0, try releasing drive 0 in bconsole - this will put the tape back
>>> into its slot and Bacula should mount it for you.
>>>
>>> You may need to use a combination of #4 and one of the other options.
>>> If none of the above corrects the issue, you may need to restart both
>>> your director and storage daemons and start again.
>>>
>>> Patti Clark
>>> Linux System Administrator
>>> R&D Systems Support Oak Ridge National Laboratory
>>>
>>> From: More, Ankush <ankush.more AT capgemini DOT com>
>>> Date: Friday, April 24, 2015 at 3:29 AM
>>> To: Radosław Korzeniewski <radoslaw AT korzeniewski DOT net>
>>> Cc: bacula-users <bacula-users AT lists.sourceforge DOT net>
>>> Subject: Re: [Bacula-users] Device is BLOCKED
>>>
>>> Hi,
>>>
>>> Yes, I tried to mount from bconsole (mount), but the error is still the
>>> same. I would appreciate it if someone could quickly help to resolve
>>> this issue.
>>>
>>> Device "Drive-1" (/dev/nst1) open but no Bacula volume is currently
>>> mounted.
>>>      Device is BLOCKED waiting for mount of volume "NY5039L4",
>>>         Pool:        Billable
>>>         Media type:  LTO-4
>>>      Slot 1 is loaded in drive 0.
>>>      Total Bytes Read=0 Blocks Read=0 Bytes/block=0
>>>      Positioned at File=0 Block=0
>>>
>>> Thank you,
>>> Ankush
>>> From: Radosław Korzeniewski [mailto:radoslaw AT korzeniewski DOT net]
>>> Sent: 23 April 2015 20:07
>>> To: More, Ankush
>>> Cc: bacula-users AT lists.sourceforge DOT net
>>> Subject: Re: [Bacula-users] Device is BLOCKED
>>>
>>> Hello,
>>>
>>> 2015-04-23 13:28 GMT+02:00 More, Ankush <ankush.more AT capgemini DOT com>:
>>>> Hi Team,
>>>>
>>>> We have Bacula 7.x with a tape auto-changer.
>>>> I am getting the error below in "status", and the backup stops
>>>> ("list jobs" shows it as running).
>>>> I notice that when I run "/usr/libexec/bacula/mtx-changer", tape
>>>> "NY5039L4" is mounted in the drive.
>>>
>>> From Bacula's point of view, mtx-changer can show you that a tape is
>>> loaded, not mounted.
>>>
>>>> Then why does Bacula show BLOCKED?
>>>> How do I resolve this issue?
>>>
>>> Bacula is asking you to mount a tape. Did you do this? You can mount a
>>> tape with the mount command in bconsole.
>>>
>>>> Is there any parameter?
>>>>
>>>> Device "Drive-1" (/dev/nst1) is waiting for:
>>>>      Volume:      NY5216L4
>>>>      Pool:        Billable
>>>>      Media type:  LTO-4
>>>>      Device is BLOCKED waiting for mount of volume "NY5039L4",
>>>>         Pool:        Billable
>>>>         Media type:  LTO-4
>>>>      Slot 1 is loaded in drive 1.
>>>>      Total Bytes Read=64,512 Blocks Read=1 Bytes/block=64,512
>>>>      Positioned at File=0 Block=0
>>>>
>>>> Thank you,
>>>> Ankush
>>>
>>> --
>>> Radosław Korzeniewski
>>> radoslaw AT korzeniewski DOT net


_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users