Hi folks,
I have a very slow, long-running backup job that starts at 17:00 on
Friday and usually takes about 30 hours to complete. Later that same
Friday, more full backups are scheduled for 21:00. These start fine
at first, so at 21:00 I have six jobs running in parallel onto the
same online disk volume (six is the maximum concurrent jobs setting
for the online full backup device).
However, it seems that once these parallel jobs finish, the rest just
sit there waiting for the long-running job to finish, so I have only
one job running with the rest waiting on "max Storage jobs", like so:
28187 Full slow_server.2012-06-08_17.00.00_52 is running
28250 Full server20.2012-06-08_21.00.00_56 is waiting on max Storage jobs
28251 Full server21.2012-06-08_21.00.00_57 is waiting on max Storage jobs
28252 Full server29.2012-06-08_21.00.00_58 is waiting on max Storage jobs
28253 Full server149.2012-06-08_21.00.00_59 is waiting on max Storage jobs
28254 Full server151.2012-06-08_21.00.01_00 is waiting on max Storage jobs
28255 Full server153.2012-06-08_21.00.02_01 is waiting on max Storage jobs
28256 Full server140.2012-06-08_21.00.02_02 is waiting on max Storage jobs
28257 Full server155.2012-06-08_21.00.02_03 is waiting on max Storage jobs
28258 Full server166.2012-06-08_21.00.02_04 is waiting on max Storage jobs
28259 Full server2032.2012-06-08_21.00.02_05 is waiting on max Storage jobs
28260 Full server2035.2012-06-08_21.00.02_06 is waiting on max Storage jobs
Please note that the scheduled start time of these jobs was yesterday
at 21:00, but it seems Bacula never realizes there's only one running
job left and fails to kick off the waiting jobs until "slow_server" is
finished.
Here's the list of finished jobs:
Terminated Jobs:
JobId  Level      Files    Bytes  Status  Finished         Name
====================================================================
28202  Incr       9,160  10.88 G  OK      08-Jun-12 20:18  server2061
28237  Incr         445  51.02 G  OK      08-Jun-12 21:02  server2095
28243  Full     145,512  7.480 G  OK      08-Jun-12 21:09  server186
28246  Full     143,452  48.74 G  OK      08-Jun-12 21:32  server150
28245  Full   3,437,684  103.1 G  OK      08-Jun-12 23:49  server48
28249  Full      25,253  89.03 G  OK      09-Jun-12 01:48  server157
28244  Full   7,338,130  59.49 G  OK      09-Jun-12 02:01  server188
28247  Full   2,386,472  169.5 G  OK      09-Jun-12 02:29  server2057
28248  Full   2,495,859  204.7 G  OK      09-Jun-12 09:03  server2047
28186  Full   2,310,264  109.0 G  OK      09-Jun-12 09:37  server16
And here's the "status storage" output (trimmed to include only the
relevant devices):
Running Jobs:
Writing: Full Backup job slow_server JobId=28187 Volume="full-0064"
    pool="Online_full" device="FileStorage_full" (/mnt/msa/online_backup)
    spooling=0 despooling=0 despool_wait=0
    Files=8,237,188 Bytes=95,530,014,225 Bytes/sec=1,568,405
    FDReadSeqNo=74,357,836 in_msg=49704054 out_msg=5 fd=9
====
Jobs waiting to reserve a drive:
====
Used Volume status:
full-0064 on device "FileStorage_full" (/mnt/msa/online_backup)
Reader=0 writers=1 devres=0 volinuse=1
Is this the expected behaviour, a bug in Bacula 5.2.6, or am I
misunderstanding something about Bacula's configuration? Once I cancel
the "slow_server" job, things take off again as expected, with six jobs
running concurrently.
All the best & thanks in advance for your thoughts,
Uwe
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users