Bacula-users


2014-08-05 13:00:49
Subject: Re: [Bacula-users] Disk based backup using vchanger, volumes being marked as Error
From: Kern Sibbald <kern AT sibbald DOT com>
To: Josh Fisher <jfisher AT pvct DOT com>, bacula-users AT lists.sourceforge DOT net
Date: Tue, 05 Aug 2014 18:54:58 +0200
On 08/05/2014 02:10 PM, Josh Fisher wrote:
On 8/5/2014 1:36 AM, Kern Sibbald wrote:
Hello Josh,

Please see below ...

On 08/04/2014 06:43 PM, Josh Fisher wrote:
On 8/1/2014 12:27 PM, Joseph Dickson wrote:
Greetings :-)

I've run into this problem with Bacula in a previous installation, and I can't recall whether there was ever a resolution. I'm using Bacula for disk-based backups only, and I am using vchanger to manage my virtual library.

I've configured a vchanger library with 100 slots and 8 drives, and have set Maximum Volume Bytes = 100G on the pool I am using, to limit each slot in the library to 100G. I have also set Maximum Concurrent Jobs = 2 on each of the virtual tape drive devices in my storage daemon config, so that only two jobs can write to a device at a time, which minimizes interleaving.
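For a sense of scale, the limits in this setup can be worked through in a few lines of Python. This is a sketch only: it assumes one volume per vchanger slot and that Bacula reads the `G` suffix as `1024**3` bytes (the Bacula manual lists `G` = 1,073,741,824 and `GB` = 1,000,000,000; check the size-modifier notes for your version). The helper names are made up for illustration, and the 30 or 40 jobs are the figures from the main backup window described in the next paragraph.

```python
# Back-of-the-envelope numbers for the vchanger setup described above.
# Assumptions: one volume per slot, and "100G" means 100 * 1024**3 bytes.

SLOTS = 100                        # vchanger slots
DRIVES = 8                         # virtual drives
MAX_JOBS_PER_DRIVE = 2             # Maximum Concurrent Jobs on each SD Device
MAX_VOLUME_BYTES = 100 * 1024**3   # Maximum Volume Bytes = 100G


def library_capacity_bytes(slots: int, volume_bytes: int) -> int:
    """Upper bound on data the library holds if every volume fills up."""
    return slots * volume_bytes


def concurrent_writers(drives: int, jobs_per_drive: int) -> int:
    """How many jobs can write at the same moment across all drives."""
    return drives * jobs_per_drive


def jobs_waiting(started: int, drives: int, jobs_per_drive: int) -> int:
    """Jobs that cannot get a drive slot immediately under these limits alone."""
    return max(0, started - concurrent_writers(drives, jobs_per_drive))


if __name__ == "__main__":
    cap = library_capacity_bytes(SLOTS, MAX_VOLUME_BYTES)
    print(f"capacity: {cap} bytes ({cap / 1024**4:.2f} TiB)")
    print("concurrent writers:", concurrent_writers(DRIVES, MAX_JOBS_PER_DRIVE))
    for started in (30, 40):
        waiting = jobs_waiting(started, DRIVES, MAX_JOBS_PER_DRIVE)
        print(f"{started} jobs started -> {waiting} waiting for a drive")
```

With at most 16 simultaneous writers, a window that starts 40 jobs leaves 24 of them queued for a drive. That is presumably why the trouble only shows up when many jobs start together: each queued job goes through volume and drive selection while the other drives are busy.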

Everything works perfectly as long as I only kick off a few jobs at a time. However, when my main backup windows run and 30 or 40 backup jobs start, I often end up with jobs that output the following sequence in the logs:

Have you set PreferMountedVolumes=no in the Job resource in bacula-dir.conf? If three jobs start and want to write to volumes in the same pool, then all three can be assigned the same volume. In fact, if PreferMountedVolumes=yes (the default), then all three WILL be assigned the same volume unless the pool restricts the maximum number of jobs that the volume may contain. However, your device (drive) restricts the maximum concurrent jobs to 2. Therefore one of those three jobs will not be able to select the drive where the volume is mounted and will be forced to select another, unused drive. That third job will nevertheless select the same volume as the other two and attempt to move the volume from the drive it is in to the drive it has been assigned. The configuration has a built-in race condition.
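The race described above can be reproduced in a toy model. The sketch below is not the Storage Daemon's selection code; it only mimics the order of operations given in the message: with `PreferMountedVolumes=yes` a job picks the already-mounted volume first and a drive second, and with `no` it prefers an idle drive. `Drive`, `start_job`, `run`, and the volume and drive names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def first(items: Iterable[T]) -> Optional[T]:
    """Return the first item of an iterable, or None if it is empty."""
    return next(iter(items), None)


@dataclass
class Drive:
    name: str
    max_jobs: int                 # stands in for the SD Device "Maximum Concurrent Jobs"
    volume: Optional[str] = None  # volume currently mounted in this drive
    jobs: int = 0                 # jobs currently writing through this drive


def holder(volume: str, drives: List[Drive]) -> Optional[Drive]:
    """Drive that currently has `volume` mounted, if any."""
    return first(d for d in drives if d.volume == volume)


def start_job(pool: List[str], drives: List[Drive],
              prefer_mounted: bool) -> Tuple[Optional[str], Optional[str], bool]:
    """Choose a volume and a drive for one new job (toy model).

    Returns (volume, drive_name, conflict). conflict=True means the job was
    given a drive other than the one where its volume is mounted and in use,
    so the volume would have to be moved out from under running jobs.
    drive_name=None means nothing is free and the job must wait.
    """
    if prefer_mounted:
        # Volume first: reuse a pool volume that is already mounted somewhere.
        volume = first(d.volume for d in drives if d.volume in pool)
        if volume is None:
            volume = first(v for v in pool if holder(v, drives) is None)
        home = holder(volume, drives) if volume else None
        if home is not None and home.jobs < home.max_jobs:
            drive = home
        else:
            # The volume's drive is full (or nothing is mounted): take an idle drive.
            drive = first(d for d in drives if d.jobs == 0)
    else:
        # Drive first: take an idle drive, then a volume no drive has mounted.
        drive = first(d for d in drives if d.jobs == 0)
        volume = first(v for v in pool if holder(v, drives) is None)
        home = None

    if drive is None or volume is None:
        return volume, None, False        # nothing free: the job waits
    conflict = home is not None and home is not drive and home.jobs > 0
    if not conflict:
        drive.volume = volume             # mount, replacing any idle volume
        drive.jobs += 1
    return volume, drive.name, conflict


def run(prefer_mounted: bool) -> List[Tuple[Optional[str], Optional[str], bool]]:
    """Start three jobs in one pool against three drives limited to 2 jobs each."""
    pool = ["Vol001", "Vol002", "Vol003"]
    drives = [Drive(f"drive{i}", max_jobs=2) for i in range(3)]
    return [start_job(pool, drives, prefer_mounted) for _ in range(3)]


if __name__ == "__main__":
    for prefer in (True, False):
        print(f"PreferMountedVolumes={'yes' if prefer else 'no'}")
        for n, (vol, drv, conflict) in enumerate(run(prefer), start=1):
            note = "  <-- volume busy in another drive" if conflict else ""
            print(f"  job{n}: {vol} on {drv}{note}")
```

Under `yes`, the third job is handed `drive1` while `Vol001` is still busy in `drive0`, which is the conflicting move; under `no`, the three jobs spread across three drives and three volumes. Real Bacula also consults the catalog, pool limits such as the maximum number of jobs per volume, and the Director's scheduling, none of which is modeled here.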

I have recently done quite a bit of work to try to avoid race conditions such as the one you describe above. Does this still happen on version 7.0.x? I ask because there is now code that *should* detect this and explicitly make the third job (as you describe above) wait. It is still possible that there is some code path in the SD where the new code does not apply, so I cannot rule out problems, but if any exist in 7.0.x I would like to know so I can work on it some more. With the new code, the Volume will be moved around, but at least it should be done correctly, without a deadlock or failure.
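The message does not say how the 7.0.x code makes the third job wait, so the following is only a generic illustration of the idea, built on Python's `threading.Condition`: a job that finds the volume's drive at its concurrency limit blocks until another job finishes, instead of trying to move the volume. `MountedVolume`, `backup_job`, and `simulate` are hypothetical names, not Bacula internals.

```python
import threading
import time
from typing import List


class MountedVolume:
    """Toy stand-in for a volume mounted in one drive.

    At most `max_jobs` jobs may write at once; extra jobs block until a
    writer finishes rather than trying to move the volume elsewhere.
    """

    def __init__(self, name: str, max_jobs: int) -> None:
        self.name = name
        self.max_jobs = max_jobs
        self.writers = 0   # jobs currently writing
        self.peak = 0      # highest number of simultaneous writers observed
        self._cond = threading.Condition()

    def acquire(self) -> None:
        with self._cond:
            while self.writers >= self.max_jobs:
                self._cond.wait()          # the "third job" waits here
            self.writers += 1
            self.peak = max(self.peak, self.writers)

    def release(self) -> None:
        with self._cond:
            self.writers -= 1
            self._cond.notify()            # wake one waiting job, if any


def backup_job(vol: MountedVolume, seconds: float) -> None:
    vol.acquire()
    try:
        time.sleep(seconds)                # pretend to write data
    finally:
        vol.release()


def simulate(njobs: int = 3, max_jobs: int = 2, seconds: float = 0.05) -> MountedVolume:
    """Start `njobs` threads that all want the same volume; return it afterwards."""
    vol = MountedVolume("Vol001", max_jobs)
    threads: List[threading.Thread] = [
        threading.Thread(target=backup_job, args=(vol, seconds)) for _ in range(njobs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return vol


if __name__ == "__main__":
    vol = simulate()
    print(f"{vol.name}: peak writers = {vol.peak} (limit {vol.max_jobs})")
```

Waiting trades throughput for safety: the volume stays in one drive, at the cost of idle time for the blocked job.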


I haven't had a chance to update to 7.0.x yet, so I can't say. My thought is that the volume itself should have a "Maximum Concurrent Jobs" setting, in addition to the one on the SD Device. Better still, it could be automated by setting the volume's maximum concurrency to that of the SD device at mount time. That should eliminate the need for "Prefer Mounted Volumes" altogether: once that many jobs have selected the volume, subsequent jobs would reject it as unavailable, and so would also see the drive it is mounted in as unavailable at drive-selection time. Once a drive is selected, that volume would be viewed as unavailable and rejected during volume selection, at least until one of the jobs using the volume ends. So by setting "Max Concurrent Jobs" to 1, one could guarantee that a volume would never be selected by more than one job at a time.
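The proposal above can also be sketched as a toy model. Here each `Volume` copies the device's job limit when it is mounted, and volume selection skips any volume already at that limit, so its drive is never offered to another job. This is a hypothetical illustration of the idea in the message, not an existing Bacula directive or API; `Volume`, `mount`, `start_job`, `end_job`, and `run` are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Volume:
    name: str
    drive: Optional[str] = None  # drive the volume is mounted in, if any
    max_jobs: int = 0            # proposed per-volume limit, set at mount time
    jobs: int = 0                # jobs currently writing to this volume

    def available(self) -> bool:
        """Unmounted volumes are selectable; mounted ones only below their limit."""
        return self.drive is None or self.jobs < self.max_jobs


def mount(pool: List[Volume], vol: Volume, drive: str, device_max_jobs: int) -> None:
    """Mount `vol` in `drive`, unloading any idle volume that was there."""
    for other in pool:
        if other.drive == drive:
            other.drive, other.max_jobs = None, 0
    vol.drive = drive
    vol.max_jobs = device_max_jobs   # the proposal: inherit the device's limit


def start_job(pool: List[Volume], drives: List[str],
              device_max_jobs: int) -> Optional[Tuple[str, str]]:
    """Select a volume, then a drive; return (volume, drive) or None to wait."""
    # Volumes at their limit are rejected here, so their drive is never offered.
    candidates = sorted((v for v in pool if v.available()),
                        key=lambda v: v.drive is None)  # mounted volumes first
    if not candidates:
        return None                  # no selectable volume: wait
    vol = candidates[0]
    if vol.drive is None:
        busy = {v.drive for v in pool if v.drive is not None and v.jobs > 0}
        drive = next((d for d in drives if d not in busy), None)
        if drive is None:
            return None              # every drive is in use: wait
        mount(pool, vol, drive, device_max_jobs)
    vol.jobs += 1
    return vol.name, vol.drive


def end_job(pool: List[Volume], name: str) -> None:
    """A finishing job frees one slot on its volume."""
    for vol in pool:
        if vol.name == name:
            vol.jobs -= 1


def run(device_max_jobs: int, njobs: int = 3) -> List[Optional[Tuple[str, str]]]:
    """Start `njobs` jobs against three volumes and three drives."""
    pool = [Volume(f"Vol00{i}") for i in range(1, 4)]
    drives = [f"drive{i}" for i in range(3)]
    return [start_job(pool, drives, device_max_jobs) for _ in range(njobs)]


if __name__ == "__main__":
    for limit in (2, 1):
        print(f"limit {limit}:", run(limit))
```

With a limit of 2, the third job lands on a second volume in a second drive instead of competing for `Vol001`; with a limit of 1, each job gets its own volume, as suggested above. What the model leaves out is that the real decision is shared by the SD, the Director, and the catalog.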

Yes, I would like to do something like what you suggest, but I am concerned that it opens up other possibilities for race conditions, because there are three components dealing with the data (SD, DIR, and the catalog).

I have some ideas, and I hope to implement them in the next major Bacula version.

Best regards,
Kern
...
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users