Re: [Bacula-users] Disk based backup using vchanger, volumes being marked as Error
2014-08-05 01:42:43
Hello Josh,
Please see below ...
On 08/04/2014 06:43 PM, Josh Fisher wrote:
On 8/1/2014 12:27 PM, Joseph Dickson wrote:
Greetings :-)
I've run into this problem with Bacula in a previous
installation, and I can't seem to recall if there was ever a
resolution. I'm using Bacula for disk-based backups only,
and I am using vchanger to manage my virtual library.
I've configured a vchanger library with 100 slots and 8
drives, and have set a Maximum Volume Bytes of 100G on the
pool definition that I am using, to limit each slot in the
library to 100G. I have also set a Maximum Concurrent Jobs
= 2 setting on each of the virtual tape drive devices in my
storage daemon config, so that only two jobs can write to
a device at a time to minimize interleaving.
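For readers following along, the setup described above might look roughly like the fragments below. This is only a sketch: the pool name is a placeholder that does not appear in the post, the device name and path are taken from the job log, and all other directives are omitted.

```conf
# bacula-dir.conf
Pool {
  Name = chg1-pool                  # hypothetical name, not from the post
  Maximum Volume Bytes = 100G       # caps each volume (slot) at 100G
  # remaining directives omitted
}

# bacula-sd.conf (one of the eight virtual drives)
Device {
  Name = chg1-drive-1               # name as it appears in the job log
  Archive Device = /var/lib/bacula/chg1/1/drive1
  Maximum Concurrent Jobs = 2       # at most two jobs write to this drive at once
  # remaining directives omitted
}
```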
Everything works perfectly as long as I only kick a few
jobs off at a time. However, when my main backup windows
run and 30 or 40 backup jobs kick off, I often end up with
jobs that output the following sequence in the logs:
Have you set PreferMountedVolumes=no in the Job resource in
bacula-dir.conf? If 3 jobs start and want to write to volumes in
the same pool, then all three can be assigned the same volume. In
fact, if PreferMountedVolumes=yes (the default), then all three
WILL be assigned the same volume unless the pool restricts the max
number of jobs that the volume may contain. However, your device
(drive) restricts the max concurrent jobs to 2. Therefore one of
those three jobs will not be able to select the drive where the
volume is mounted and will be forced to select another unused
drive. That third job will nevertheless select the same volume as
the other two and attempt to move the volume from the drive it is
in into the drive that it has been assigned to. The configuration
has a built-in race condition.
I have recently done quite a bit of work to try to avoid race
conditions such as the one you describe above. Does this still
happen on version 7.0.x? I ask because there is now code that
*should* detect this and explicitly make the third job (as you
describe above) wait. Now it is possible that there is some code
path in the SD where the new code does not apply, so I cannot
exclude problems, but if any exist in 7.0.x I would like to know so
I can work on it some more. With the new code, the Volume will be
moved around, but at least it should be done correctly without some
deadlock or failure.
Best regards,
Kern
Setting PreferMountedVolumes=no causes the three jobs to select a
drive that is NOT already mounted with a volume from the pool.
This allows jobs writing to the same pool to select different
volumes from the pool, rather than all selecting the same next
available volume. This has its own caveats. It doesn't necessarily
prevent two jobs from selecting the same volume in some cases,
meaning that they will want to swap the volume back and forth
between drives, which is another type of race condition. I have
used this method successfully for a pool containing full backups
only by setting PreferMountedVolumes=no in the job resource and
setting MaximumVolumeJobs=1 in the pool resource. Since Bacula
selects the volume for a job in an atomic manner, this forces an
exclusive set of volumes for each job, thus preventing the race
condition. This means that concurrency is limited only by the
number of drives, but at the "expense" of creating a greater
number of smaller volume files. I quote "expense" because on a
disk vchanger it isn't usually a big issue to have more volume
files. Doing this with a tape autochanger would use a lot more
tapes and be truly more expensive. Of course unlimited concurrency
is theoretical, since the hardware limits the USEFUL concurrency.
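As a rough sketch of the workaround described above, the two directives would sit in bacula-dir.conf along these lines. The resource names are placeholders and every other directive is omitted. Bacula ignores case and spaces in directive keywords, so PreferMountedVolumes and Prefer Mounted Volumes refer to the same directive.

```conf
# bacula-dir.conf
Job {
  Name = full-backup-job            # hypothetical name
  Pool = full-pool                  # hypothetical name
  Prefer Mounted Volumes = no       # pick a drive that has no pool volume mounted
  # remaining directives omitted
}

Pool {
  Name = full-pool                  # hypothetical name
  Maximum Volume Jobs = 1           # one job per volume, so no two jobs share a volume
  # remaining directives omitted
}
```

With one job per volume, expect many more (and smaller) volume files in the vchanger magazine, as noted above.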
31-Jul 21:00 bacula1-dir JobId 692: Start Backup JobId 692, Job=job-evolvereports-main.2014-07-31_21.00.00_48
31-Jul 21:00 bacula1-dir JobId 692: Using Device "chg1-drive-1" to write.
31-Jul 21:00 evolvereports-fd JobId 692: DIR and FD clocks differ by 50 seconds, FD automatically compensating.
31-Jul 21:05 bacula1-sd JobId 692: 3307 Issuing autochanger "unload slot 74, drive 1" command.
31-Jul 21:06 bacula1-sd JobId 692: Warning: Volume "chg1_0001_0066" wanted on "chg1-drive-1" (/var/lib/bacula/chg1/1/drive1) is in use by device "chg1-drive-3" (/var/lib/bacula/chg1/3/drive3)
31-Jul 21:06 bacula1-sd JobId 692: Warning: Volume "chg1_0001_0066" not on file device "chg1-drive-1" (/var/lib/bacula/chg1/1/drive1).
31-Jul 21:06 bacula1-sd JobId 692: Marking Volume "chg1_0001_0066" in Error in Catalog.
31-Jul 21:06 bacula1-sd JobId 692: Warning: Volume "chg1_0001_0066" not on file device "chg1-drive-1" (/var/lib/bacula/chg1/1/drive1).
31-Jul 21:06 bacula1-sd JobId 692: Marking Volume "chg1_0001_0066" in Error in Catalog.
31-Jul 21:06 bacula1-sd JobId 692: Warning: mount.c:212 Open of file device "chg1-drive-1" (/var/lib/bacula/chg1/1/drive1) Volume "chg1_0001_0066" failed: ERR=file_dev.c:172 Could not open(/var/lib/bacula/chg1/1/drive1,OPEN_READ_WRITE,0640): ERR=No such file or directory
31-Jul 21:06 bacula1-sd JobId 692: 3307 Issuing autochanger "unload slot 71, drive 2" command.
31-Jul 21:06 bacula1-sd JobId 692: 3304 Issuing autochanger "load slot 71, drive 1" command.
31-Jul 21:06 bacula1-sd JobId 692: 3305 Autochanger "load slot 71, drive 1", status is OK.
31-Jul 21:06 bacula1-sd JobId 692: Volume "chg1_0001_0071" previously written, moving to end of data.
31-Jul 21:06 bacula1-sd JobId 692: Ready to append to end of Volume "chg1_0001_0071" size=8,003,988,010
This ends up marking my perfectly usable volume as Error in the
catalog. Is this something that everyone runs into? Is there any
fix? As I recall when I looked into it a few years back, the issue
was the order and timing of volume and device selection, but it's
definitely been a while.
My bacula-sd.conf file is here:
Any guidance would be appreciated!
Thanks,
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users