
Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1

From: Stephen Thompson <stephen AT seismo.berkeley DOT edu>
To: bacula-users AT lists.sourceforge DOT net
Date: Mon, 05 Nov 2012 14:57:33 -0800

Going to try this out.

Stephen



On 11/05/2012 02:40 PM, Josh Fisher wrote:
>
> On 11/5/2012 4:28 PM, Stephen Thompson wrote:
>> On 11/05/2012 01:17 PM, Josh Fisher wrote:
>>> On 11/5/2012 11:03 AM, Stephen Thompson wrote:
>>>> On 11/5/12 7:59 AM, John Drescher wrote:
>>>>>> I've had the following problem for ages (meaning multiple major
>>>>>> revisions of bacula) and I've seen this come up from time to time on the
>>>>>> mailing list, but I've never actually seen a resolution (please point me
>>>>>> to one if it's been found).
>>>>>>
>>>>>>
>>>>>> background:
>>>>>>
>>>>>> I run monthly Fulls and nightly Incrementals.  I have a 2 drive
>>>>>> autochanger dedicated to my Incrementals.  I launch something like ~150
>>>>>> Incremental jobs each night.  I am configured for 8 concurrent jobs on
>>>>>> the Storage Daemon.
>>>>>>
>>>>>>
>>>>>> PROBLEM:
>>>>>>
>>>>>> The first job(s) grab one of the 2 devices available in the changer
>>>>>> (which is set to AutoSelect) and either load a tape, or use a tape from
>>>>>> the previous evening.  All tapes in the changer are in the same
>>>>>> Incremental-Pool.
>>>>>>
>>>>>> The second job(s) grab the other of the 2 devices available in the
>>>>>> changer, but want to use the same tape that's just been mounted (or put
>>>>>> into use) by the jobs that were launched first.  They will often literally
>>>>>> wait the entire evening while hundreds of jobs run through on only one
>>>>>> device, until that tape is freed up, at which point it is unmounted from
>>>>>> the first device and moved to the second.
>>>>>>
>>>>>> Note, the behaviour seems to be to round-robin my 8-job concurrency limit
>>>>>> between the 2 available drives, which means 4 jobs will run, and 4 jobs
>>>>>> will block on waiting for the wanted Volume.  When the original 4 jobs
>>>>>> are completed (not at the same time) additional jobs are launched that
>>>>>> keep that wanted Volume in use.
>>>>>>
>>>>>>
>>>>>> LOG:
>>>>>>
>>>>>> 03-Nov 22:00 DIRECTOR JobId 267433: Start Backup JobId 267433, Job=JOB.2012-11-03_22.00.00_04
>>>>>> 03-Nov 22:00 DIRECTOR JobId 267433: Using Device "L100-Drive-0"
>>>>>> 03-Nov 22:00 DIRECTOR JobId 267433: Sending Accurate information.
>>>>>> 03-Nov 22:00 sd_L100_ JobId 267433: 3307 Issuing autochanger "unload
>>>>>> slot 82, drive 0" command.
>>>>>> 03-Nov 22:06 lawson-sd_L100_ JobId 267433: Warning: Volume "IM0108"
>>>>>> wanted on "L100-Drive-0" (/dev/L100-Drive-0) is in use by device
>>>>>> "L100-Drive-1" (/dev/L100-Drive-1)
>>>>>> 03-Nov 22:09 sd_L100_ JobId 267433: Warning: Volume "IM0108" wanted on
>>>>>> "L100-Drive-0" (/dev/L100-Drive-0) is in use by device "L100-Drive-1"
>>>>>> (/dev/L100-Drive-1)
>>>>>> 03-Nov 22:09 sd_L100_ JobId 267433: Warning: mount.c:217 Open device
>>>>>> "L100-Drive-0" (/dev/L100-Drive-0) Volume "IM0108" failed: ERR=dev.c:513
>>>>>> Unable to open device "L100-Drive-0" (/dev/L100-Drive-0): ERR=No medium
>>>>>> found
>>>>>> .
>>>>>> .
>>>>>> .
>>>>>>
>>>>>>
>>>>>> CONFIGS (partial and seem pretty straight-forward):
>>>>>>
>>>>>> Schedule {
>>>>>>         Name = "DefaultSchedule"
>>>>>>         Run = Level=Incremental   sat-thu at 22:00
>>>>>>         Run = Level=Differential  fri     at 22:00
>>>>>> }
>>>>>>
>>>>>> JobDefs {
>>>>>>         Name = "DefaultJob"
>>>>>>         Type = Backup
>>>>>>         Level = Full
>>>>>>         Schedule = "DefaultSchedule"
>>>>>>         Incremental Backup Pool = Incremental-Pool
>>>>>>         Differential Backup Pool = Incremental-Pool
>>>>>> }
>>>>>>
>>>>>> Pool {
>>>>>>         Name = Incremental-Pool
>>>>>>         Pool Type = Backup
>>>>>>         Storage = L100-changer
>>>>>> }
>>>>>>
>>>>>> Storage {
>>>>>>         Name = L100-changer
>>>>>>         Device = L100-changer
>>>>>>         Media Type = LTO-3
>>>>>>         Autochanger = yes
>>>>>>         Maximum Concurrent Jobs = 8
>>>>>> }
>>>>>>
>>>>>> Autochanger {
>>>>>>         Name = L100-changer
>>>>>>         Device = L100-Drive-0
>>>>>>         Device = L100-Drive-1
>>>>>>         Changer Device = /dev/L100-changer
>>>>>> }
>>>>>>
>>>>>> Device {
>>>>>>         Name = L100-Drive-0
>>>>>>         Drive Index = 0
>>>>>>         Media Type = LTO-3
>>>>>>         Archive Device = /dev/L100-Drive-0
>>>>>>         AutomaticMount = yes;
>>>>>>         AlwaysOpen = yes;
>>>>>>         RemovableMedia = yes;
>>>>>>         RandomAccess = no;
>>>>>>         AutoChanger = yes;
>>>>>>         AutoSelect = yes;
>>>>>> }
>>>>>>
>>>>>> Device {
>>>>>>         Name = L100-Drive-1
>>>>>>         Drive Index = 1
>>>>>>         Media Type = LTO-3
>>>>>>         Archive Device = /dev/L100-Drive-1
>>>>>>         AutomaticMount = yes;
>>>>>>         AlwaysOpen = yes;
>>>>>>         RemovableMedia = yes;
>>>>>>         RandomAccess = no;
>>>>>>         AutoChanger = yes;
>>>>>>         AutoSelect = yes;
>>>>>> }
>>>>>>
>>>>> I do not have a good solution but I know by default bacula does not
>>>>> want to load the same pool into more than 1 storage device at the same
>>>>> time.
>>>>>
>>>>> John
>>>>>
>>>> I think it's something in the automated logic.  Because if I launch jobs
>>>> by hand (same pool across 2 tape devices in the same autochanger)
>>>> everything works fine.  I think it has more to do with the Scheduler
>>>> assigning the same Volume to all jobs and then not wanting to
>>>> change that choice if that Volume is in use.
>>> When both jobs start at the same time and same priority, they see the
>>> same exact "next available volume" for the pool, and so both select the
>>> same volume. When they select different drives, it is a problem, since
>>> the volume can only be in one drive.
>>>
>>> When you start the jobs manually, I assume you are starting them at
>>> different times. This works, because the first job is up and running
>>> with the volume loaded before the second job begins its selection
>>> process. One way to handle this issue is to have a different Schedule
>>> for each job and start the jobs at different times with one second
>>> spacing. Jobs will still run concurrently, they just won't start up
>>> concurrently.
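>>>
>>> As a hypothetical sketch (schedule names and levels here are placeholders,
>>> not from anyone's actual config): Run times are given as hh:mm, so the
>>> smallest stagger a Schedule alone can express is one minute, e.g.:
>>>
>>> Schedule {
>>>        Name = "StaggeredSchedule-A"
>>>        Run = Level=Incremental  sat-thu at 22:00
>>> }
>>>
>>> Schedule {
>>>        Name = "StaggeredSchedule-B"
>>>        Run = Level=Incremental  sat-thu at 22:01
>>> }
>>>
>>> The first batch of jobs then has its volume loaded before the second
>>> batch begins volume selection.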
>>>
>> I suspected something like that, but I would ask out loud: "if bacula runs
>> into a contention like that and there are other available volumes in the
>> requested pool, why doesn't it decide to use another volume instead of
>> blocking?"
>
> See the Prefer Mounted Volume directive docs at
> http://www.bacula.org/5.2.x-manuals/en/main/main/Configuring_Director.html#SECTION0022150000000000000000
>
> This is a way to sort of deal with the issue, but is not ideal. I would
> prefer that the volume selection logic have an option to NEVER select a
> volume that is already loaded in another drive, or a volume that has
> been selected by another job unless the other job is assigned the same
> drive. This might indeed cause more tapes to be needed in a pool and be
> less efficient in terms of tape usage, but it would eliminate most of
> the problems encountered with concurrent jobs writing to the same pool.
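>
> For instance (an untested sketch; per those docs, Prefer Mounted Volume
> is a Job resource directive that defaults to yes, so it would go in the
> existing JobDefs, roughly):
>
> JobDefs {
>        Name = "DefaultJob"
>        ...
>        Prefer Mounted Volume = no
> }
>
> With it set to no, a job should favor an unused, AutoSelect-enabled drive
> rather than waiting on the volume mounted in the busy one.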
>
>>
>> It's also disappointing, because we've already pulled virtually all of
>> our scheduling outside of bacula into scripts because the logic seldom
>> works out for us.  This may be another case of that.  I'm surprised this
>> isn't a more common concern.  What could be more run-of-the-mill than
>> having a nightly incremental pool within an autochanger with multiple
>> drives?
>>
>> thanks!
>> Stephen
>>
>>
>>>> If I do a status on the Director for instance and see the jobs for the
>>>> next day lined up in Scheduled jobs, they all have the same Volume listed.
>>>>
>>>> thanks,
>>>> Stephen
>>>
>>> ------------------------------------------------------------------------------
>>> LogMeIn Central: Instant, anywhere, Remote PC access and management.
>>> Stay in control, update software, and manage PCs from one command center
>>> Diagnose problems and improve visibility into emerging IT issues
>>> Automate, monitor and manage. Do more in less time with Central
>>> http://p.sf.net/sfu/logmein12331_d2d
>>> _______________________________________________
>>> Bacula-users mailing list
>>> Bacula-users AT lists.sourceforge DOT net
>>> https://lists.sourceforge.net/lists/listinfo/bacula-users
>>>
>>
>
>
>


-- 
Stephen Thompson               Berkeley Seismological Laboratory
stephen AT seismo.berkeley DOT edu    215 McCone Hall # 4760
404.538.7077 (phone)           University of California, Berkeley
510.643.5811 (fax)             Berkeley, CA 94720-4760
