Subject: Re: [Bacula-users] Multiple drives in changer
From: Bob Hetzel <beh AT case DOT edu>
To: bacula-users AT lists.sourceforge DOT net
Date: Wed, 07 Apr 2010 10:24:32 -0400
> Date: Tue, 6 Apr 2010 08:52:24 -0600
> From: Robert LeBlanc <robert AT leblancnet DOT us>
>
> On Tue, Apr 6, 2010 at 6:13 AM, Matija Nalis
> <mnalis+bacula AT carnet DOT hr> wrote:
>> On Fri, Apr 02, 2010 at 10:36:59AM -0600, Robert LeBlanc wrote:
>>> On Fri, Apr 2, 2010 at 2:44 AM, Matija Nalis
>>> <mnalis+bacula AT carnet DOT hr> wrote:
>>>
>>>> I think you need to set
>>>> Prefer Mounted Volumes = no
>>>
>>> I guess this is where we need clarification about what is an available
>>> drive. I took this to mean a drive that has no tape is more available, and
>>> then a drive that does already have a tape mounted would be next in
>>> availability.
>>
>> Hm, it looks to me that any drive which is not doing an R/W operation
>> (no matter if there is a tape in the drive or not) is counted as available.
>> I could be wrong on that, though.
>>
>> Anyway, the safest way to know is to test it and let the others know
>> how it goes :)
>
> From my observations of a few tests, this indeed seems to be the case. If
> the drive is not being R/W to/from, it is considered available.
>
>
>>> It seems that as long as no job is writing to that tape, then
>>> the drive is available. I do want this setting to yes and not no; however, I
>>> would like to minimize tape changes, but take advantage of the multiple
>>> drives.
>>
>> From what I see in practice, "Prefer Mounted Volumes = yes" would
>> make sure there is only one drive in each pool that does the writing.
>>
>> For example, I have a pool of 4 drives and I start 10 jobs at the same
>> time, all using the same pool. I have a concurrency of >10 and
>> spooling enabled, so all the jobs run at once and start spooling to
>> disk -- but when they need to despool, one drive will grab a free
>> tape from Scratch, and all the jobs will wait their turn to
>> write to one tape in one drive, leaving 3 drives idle all the time.
>> Only when that tape is full is another one loaded, and the process
>> repeats.
>>
>> I think the same happens when I disable spooling, but then the 4 jobs all
>> interleave writes -- but still all of them will write to one tape in
>> one drive only.
>>
>> If you set "Prefer Mounted Volumes = no", then all 4 drives get
>> loaded with 4 fresh tapes (or just use them if the right tapes are
>> already in the right drives -- I guess; I have an autochanger) and each
>> tape gets written to at the same time, maximizing drive (and thus
>> tape) usage.
>>
>> But the "no" setting can (or at least could in the past) sometimes lead
>> to deadlocks (if you have an autochanger), when no new jobs will
>> get serviced because drive A will wait for tape 2 that is currently
>> in drive B, and at the same time drive B will wait for tape 1 which
>> is currently in drive A. Then manual intervention (umount/mount)
>> is needed (which is a big problem for us as we have lots of jobs/tapes).
>>
>> The (recommended) alternative is to go the semi-manual way -- dedicate a
>> special pool to each drive, and go with "Prefer Mounted Volumes =
>> yes". Then one can (and indeed, must) specify manually which jobs will
>> go in which pools (and hence, in which drives) and can optimize it
>> for maximum parallelism without deadlocks -- but it requires more
>> planning and is problematic if your backups are more dynamic and hard
>> to predict, and you have to redesign when you add/upgrade/remove
>> drives, and your pools might become somewhat harder to manage.
>>
> This is exactly my experience, and my goal is not to use multiple drives in
> the same pool at the same time; it's to use drives for different pools at
> the same time, one drive per pool. We are looking to bring up a lot more
> storage in the future and will probably adopt the mentality of multiple
> daily, weekly, monthly pools and split them up based on the number of drives
> we want to run concurrently. I think that is the best way to go with Bacula
> for what we want to do.
>
> Thanks,
>
> Robert LeBlanc
> Life Sciences & Undergraduate Education Computer Support
> Brigham Young University
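
For reference, the spooling setup described in the quoted thread comes down to
Job-resource directives along these lines (just a sketch -- the resource names
are invented, not taken from the thread):

# bacula-dir.conf (sketch)
Job {
  Name = example-backup            # hypothetical job name
  JobDefs = DefaultJob             # hypothetical JobDefs resource
  Spool Data = yes                 # spool to disk first, then despool to tape
  Prefer Mounted Volumes = yes     # the default; despooling jobs queue on the one mounted tape
  Maximum Concurrent Jobs = 10     # let all the test jobs run at once
}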


Your other option is to try out the "Maximum Concurrent Jobs" directive in the
Device section of your storage daemon's config.  That's working well for me.
One word of caution though: since the way it allocates jobs to drives is not
the same as with prefer mounted volumes, you should carefully consider all the
different concurrency parameters and how they relate to each other.

Here's an example of how I first tried it (not very good)...

dir.conf:
Director: Maximum Concurrent Jobs = 20
JobDefs:  Maximum Concurrent Jobs = 10

sd.conf:
Storage:  Maximum Concurrent Jobs = 20
Device1:  Maximum Concurrent Jobs = 2
Device2:  Maximum Concurrent Jobs = 2
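
Spelled out in actual resource syntax, that shorthand corresponds to roughly
the following (only the relevant directives are shown; the resource names are
just examples):

# bacula-dir.conf
Director {
  Name = backup-dir                # example name
  Maximum Concurrent Jobs = 20
  # ...other required Director directives omitted
}
JobDefs {
  Name = DefaultJob                # example name
  Maximum Concurrent Jobs = 10
  # ...
}

# bacula-sd.conf
Storage {
  Name = backup-sd                 # example name
  Maximum Concurrent Jobs = 20
  # ...
}
Device {
  Name = Drive-1                   # first drive in the changer
  Maximum Concurrent Jobs = 2
  # ...
}
Device {
  Name = Drive-2                   # second drive in the changer
  Maximum Concurrent Jobs = 2
  # ...
}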

The way prefer mounted volumes worked, it round-robined the drive assignments
for each job as it started them.  This new directive does the opposite: it
assigns jobs to the first drive until it hits that drive's max, then assigns
the rest to the next drives.  With only 2 drives, though, the result I got was
2 jobs on the 1st drive and 8 on the 2nd drive.

The upshot I got from this is that you should set the JobDefs max to a
multiple of the Device max.  I've got my JobDefs max currently set to 16
and my Device maxes set to 8, and there appears to be a substantial
improvement in the overall time to complete the scheduled backups compared to
when I was running with the prefer mounted volumes = no setting.
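
With two drives, that ratio works out to something like this (again only the
changed directives; names are examples):

# bacula-dir.conf
JobDefs {
  Name = DefaultJob
  Maximum Concurrent Jobs = 16     # 2 drives x 8 jobs per device
}

# bacula-sd.conf
Device {
  Name = Drive-1
  Maximum Concurrent Jobs = 8
}
Device {
  Name = Drive-2
  Maximum Concurrent Jobs = 8
}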

This also has an advantage over defining one pool per drive: you can easily
add and remove drives from your autochanger.
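
For comparison, the one-pool-per-drive setup means giving each drive its own
director-side Storage resource and pointing each pool at one of them.  A rough
sketch (resource names, address, password, and media type are all invented,
and it assumes the Pool-level Storage override):

# bacula-dir.conf -- one pool per drive (sketch)
Storage {
  Name = Changer-Drive-1           # example name
  Address = backup-sd.example.com  # example SD address
  Password = "sd-password"         # example password
  Device = Drive-1                 # Device name as defined in bacula-sd.conf
  Media Type = LTO-4               # example media type
  Autochanger = yes
}
Storage {
  Name = Changer-Drive-2
  Address = backup-sd.example.com
  Password = "sd-password"
  Device = Drive-2
  Media Type = LTO-4
  Autochanger = yes
}
Pool {
  Name = Daily                     # example pool
  Pool Type = Backup
  Storage = Changer-Drive-1        # all Daily jobs go to drive 1
}
Pool {
  Name = Weekly                    # example pool
  Pool Type = Backup
  Storage = Changer-Drive-2        # all Weekly jobs go to drive 2
}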




