Re: [Bacula-users] Concurrent Backups with a Virtual Autochanger

From: "Brady, Mike" <mike.brady AT devnull.net DOT nz>
To: bacula-users AT lists.sourceforge DOT net
Date: Wed, 19 Nov 2014 08:15:30 +1300
On 2014-11-16 02:12, Josh Fisher wrote:
> On 11/14/2014 6:17 PM, Brady, Mike wrote:
>> First of all thanks to Kern and Bacula Systems for making the "Best
>> Practices for Disk Based Backup" and "Disk Back Design" documents
>> available.
>> 
>> I have been playing around with the best way for doing concurrent
>> backups for a while and these documents have helped my understanding
>> considerably.  Using a Virtual Autochanger in particular seems an
>> elegant way of doing what I would like to do.
>> 
>> However, I am seeing some behaviour in my testing that I did not 
>> expect
>> and I need some input.
>> 
>> At a high level what I am trying to do is use a Virtual Autochanger to
>> write to multiple volumes in the same pool concurrently.
> 
> This is likely why only one job at a time is finding a volume. Job 1
> finds the next available volume in the pool, or labels a new one, and
> loads it into its assigned drive and begins writing. The subsequent 
> jobs
> see the volume that job 1 is using as the next available volume.
> However, they are being assigned a different drive, so must wait for 
> the
> volume to become available before moving it out of one drive and into
> another.

Not quite.  This only happens at the start of a schedule.  Once the job 
on the second device retries, which it does after five minutes (I don't 
know which timeout controls that), it gets a second volume from the pool, 
and all subsequent jobs use both devices concurrently as the devices 
become available.

> 
> There is a natural race condition when multiple jobs run and write to
> the same pool, however it is not an error condition. This is how Bacula
> handles the race. The drive and volume selection process is serialized
> so that only one job at a time can choose, thus only one job can win 
> the
> race.
> 
> Bacula has ways to tweak the algorithm used. One of those is the
> PreferMountedVolumes directive in the Job resource. It defaults to yes,
> meaning jobs will, during their turn at the volume and drive selection
> process, prefer to select one of the pool's volumes that is already
> mounted in a drive. Setting this to no means that the job will try to
> select a volume that is NOT already mounted in a drive. This will
> prevent the second job from selecting the same volume as the first job,
> because that volume will already be loaded in a drive by the time the
> second job gets its turn.
> 
> Try either setting PreferMountedVolumes=no or else divide jobs amongst
> different pools to get concurrency.

The manual gives a rather dire warning against setting 
PreferMountedVolumes to no:

"we recommend against setting this directive to no since it tends to add 
a lot of swapping of Volumes between the different drives and can easily 
lead to deadlock situations in the Storage daemon. We will accept bug 
reports against it, but we cannot guarantee that we will be able to fix 
the problem in a reasonable time."

so I am hesitant to use it.  The "Best Practices for Disk Based Backups" 
white paper also warns against its use.
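
For completeness, the other suggestion (dividing jobs amongst pools) 
would presumably look something like the following.  The pool, job and 
client names here are made up:

Pool {
    Name = IncPoolA                # hypothetical: one pool per concurrent stream
    Pool Type = Backup
    Storage = FileStorage01
    Recycle = yes
    Maximum Volumes = 25
    Volume Retention = 2 weeks
    Label Format = "IncPoolA-"     # each pool labels its own volumes
}

Job {
    Name = "BackupClientA"         # hypothetical job name
    Type = Backup
    Client = clientA-fd            # hypothetical client
    FileSet = "Full Set"
    Schedule = "DefaultBackupCycle"
    Pool = IncPoolA                # jobs in different pools never contend
                                   # for the same volume
}

With one such pool per job stream, each device should be able to label 
and load its own volume without racing the others.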

At this point I have stopped investigating the use of a Virtual 
Autochanger.  Based on my limited testing, neither selecting volumes 
from the same pool nor auto-creating/labelling volumes in a pool appears 
to be a concurrency-safe operation.

There may well be ways of configuring around these issues, but I do not 
currently have the time for the amount of testing that would be 
required.

A single storage with a single device with multiple jobs writing 
concurrently will meet my requirements for now.
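
For anyone following along, that fallback is essentially just one Device 
resource with the per-device job limit raised, instead of the 
autochanger.  Something like this (the device name is made up):

Device {
    Name = FileDev1                  # hypothetical device name
    Media Type = File01
    Archive Device = /bacula_storage/FileDevice
    LabelMedia = yes
    Random Access = yes
    AutomaticMount = yes
    RemovableMedia = no
    AlwaysOpen = no
    Maximum Concurrent Jobs = 10     # multiple jobs share this one device
}

As I understand it, the trade-off is that concurrent jobs interleave 
their blocks on the same volume, which can make restores slower.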

Thanks for your suggestions.

> 
>> 
>> At the moment I have two devices limited to one concurrent job each.
>> Which, if I have understood things correctly, means that I should have
>> two jobs running concurrently writing to separate volumes.  The 
>> schedule
>> below kicks off eight jobs simultaneously with the number of devices
>> limiting concurrency.
>> 
>> This issue that I am having is that the first job gets FileChgr1-Dev1
>> and a volume as expected.
>> 
>> The second job gets device FileChgr1-Dev2 as expected, but always says
>> "Cannot find any appendable volumes." and issues a mount request.  
>> There
>> are multiple purged volumes with the recycle flag set available in the
>> IncPool pool. Even if there weren't, the pool has Auto Labelling
>> configured and has not reached the MaximumVolumes limit, so there 
>> should
>> "always" be a volume available.
>> 
>> Other jobs continue to use the FileChgr1-Dev1 as it becomes available
>> while FileChgr1-Dev2 is waiting for a volume.
>> 
>> The second job eventually retries on FileChgr1-Dev2, gets an available
>> volume and successfully completes without any operator intervention.
>> 
>> After this the remaining jobs utilise both FileChgr1-Dev1 and
>> FileChgr1-Dev2 as they become available as I expected.
>> 
>> Is this behaviour expected (I am assuming some sort of race condition 
>> at
>> the start of the schedule with multiple jobs trying to get a volume at
>> the same time) or am I trying to do something fundamentally wrong 
>> here?
>> 
>> My configuration is:
>> 
>> Pool {
>>     Name = IncPool
>>     Pool Type = Backup
>>     Volume Use Duration = 23 hours
>>     Recycle = yes
>>     Action On Purge = Truncate
>>     Auto Prune = yes
>>     Maximum Volumes = 50
>>     Volume Retention = 2 weeks
>>     Storage = FileStorage01
>>     Next Pool = "IncPoolCopy"
>>     Label Format = "IncPool-"
>> }
>> 
>> Storage {
>>     Name = FileStorage01
>>     Address = 192.168.42.45
>>     SDPort = 9103
>>     Password = ***************************
>>     Device = FileChgr1
>>     Media Type = File01
>>     Maximum Concurrent Jobs = 10
>>     Autochanger = yes
>> }
>> 
>> Autochanger {
>>     Name = FileChgr1
>>     Device = FileChgr1-Dev1, FileChgr1-Dev2
>>     Changer Command = /dev/null # For 7.0.0 and newer releases.
>>     # Changer Command = "" # For 5.2 and older releases.
>>     Changer Device = /dev/null
>> }
>> 
>> Device {
>>     Name = FileChgr1-Dev1
>>     Drive Index = 0
>>     Media Type = File01
>>     Archive Device = /bacula_storage/FileDevice
>>     LabelMedia = yes;
>>     Random Access = Yes;
>>     AutomaticMount = yes;
>>     RemovableMedia = no;
>>     AlwaysOpen = no;
>>     Maximum Concurrent Jobs = 1
>>     VolumePollInterval = 5s
>>     Autochanger = yes
>> }
>> 
>> Device {
>>     Name = FileChgr1-Dev2
>>     Drive Index = 1
>>     Media Type = File01
>>     Archive Device = /bacula_storage/FileDevice
>>     LabelMedia = yes;
>>     RandomAccess = Yes;
>>     AutomaticMount = yes;
>>     RemovableMedia = no;
>>     AlwaysOpen = no;
>>     Maximum Concurrent Jobs = 1
>>     VolumePollInterval = 5s
>>     Autochanger = yes
>> }
>> 
>> Schedule {
>>     Name = "DefaultBackupCycle"
>>     Run = Level=Full 1st sun at 00:10
>>     Run = Level=Differential 2nd-5th sun at 00:10
>>     Run = Level=Incremental mon-sat at 00:10
>> }
>> 
>> Thanks
>> 
>> Mike
>> 
>> _______________________________________________
>> Bacula-users mailing list
>> Bacula-users AT lists.sourceforge DOT net
>> https://lists.sourceforge.net/lists/listinfo/bacula-users
