Bacula-users

Re: [Bacula-users] Tape changer with simultaneous jobs, using (but not actually using) a tape

2010-03-30 18:32:32
Subject: Re: [Bacula-users] Tape changer with simultaneous jobs, using (but not actually using) a tape
From: Craig Miskell <craig.miskell AT opus.co DOT nz>
To: bacula-users <bacula-users AT lists.sourceforge DOT net>
Date: Wed, 31 Mar 2010 11:29:21 +1300
Martin Simmons wrote:
>>>>>> On Tue, 30 Mar 2010 12:09:47 +1300, Craig Miskell said:
>> Hi,
>>      I have a TL-2000 tape changer with two drives.  I have two differential 
>> jobs which start at the same time, on different 
>> filesystems but the same client  (also happens to be the same server as the 
>> director/storage daemon).  This is bacula 5.0.0
>>
>> Both jobs interleave to the same tape (this is good IMO, although I don't 
>> know which knob I tweaked to make that happen 
>> :)), but one of the jobs first seems to claim a second tape, purge/recycle 
>> it, log the following rows in jobmedia:
>>   jobmediaid | jobid | mediaid | firstindex | lastindex | startfile | 
>> endfile | startblock | endblock | volindex
>>  
>> ------------+-------+---------+------------+-----------+-----------+---------+------------+----------+----------
>>        12953 |    38 |       7 |          0 |         0 |         0 |       
>> 0 |          0 |        0 |        1
>>        12956 |    38 |       8 |          1 |        37 |         0 |       
>> 0 |          1 |    14325 |        2
>>        12957 |    38 |       8 |         37 |        50 |         1 |       
>> 1 |          1 |    15499 |        3
>> ....
>>
>> where mediaid 7 is the tape it just recycled and didn't use, and mediaid 8 
>> is the tape it's sharing with the other job.
>>
>> I'm happy for it to interleave, but I'm curious as to why it touches that 
>> first tape (7) but then decides not to use it. 
>>    It's effectively recycling/nuking a backup that doesn't need to be, 
>> requiring two tapes where one will do.   Anyone 
>> got any idea what's going on?  Happy to file a bug if it is one, but don't 
>> want to false report.
> 
> I suggest trying it in bacula 5.0.1 first.
Is that because you have some concrete knowledge of a fix that is in 5.0.1, or 
just a blind "upgrade to the latest" 
suggestion?  It's a non-trivial amount of work for something that might just be 
speculation.

> 
> If that still fails, what does it report in the log say when it decides not to
> use the first tape?
Well, still on 5.0.0 for now, in case this sheds any light on the matter, 
here's the logs (latest at the top) from the 
start of the jobs (at 21:00:00 up to where it starts actually reading/writing 
data.  000006L4 is the volume that gets 
recycled but not used, 000013L4 is the one that is used.   I can't quite figure 
out why it doesn't write anything to 
6L4; it's loaded and ready nearly 2 minutes before 13L4 is loaded, so it should 
have had something written to 5L4 by then.

    948 |    41 | 2010-03-30 21:06:06 | ausv01-fd JobId 41:      /tmp is a 
different filesystem. Will not descend from / 
into /tmp
                                      :
    947 |    41 | 2010-03-30 21:06:06 | ausv01-dir JobId 41: Max Volume jobs 
exceeded. Marking Volume "000013L4" as Used.
                                      :
    946 |    41 | 2010-03-30 21:06:06 | ausv01-sd JobId 41: Recycled volume 
"000013L4" on device "Drive-1" (/dev/nst0), 
all previous data lost.
                                      :
    945 |    41 | 2010-03-30 21:06:03 | ausv01-sd JobId 41: 3305 Autochanger 
"load slot 12, drive 0", status is OK.
                                      :
    944 |    41 | 2010-03-30 21:05:27 | ausv01-sd JobId 41: 3304 Issuing 
autochanger "load slot 12, drive 0" command.
                                      :
    943 |    41 | 2010-03-30 21:04:42 | ausv01-sd JobId 41: 3307 Issuing 
autochanger "unload slot 5, drive 0" command.
                                      :
    942 |    41 | 2010-03-30 21:04:42 | ausv01-dir JobId 41: Recycled volume 
"000013L4"
                                      :
    941 |    41 | 2010-03-30 21:04:42 | ausv01-dir JobId 41: All records pruned 
from Volume "000013L4"; marking it "Purged"
                                      :
    940 |    41 | 2010-03-30 21:04:42 | ausv01-dir JobId 41: There are no more 
Jobs associated with Volume "000013L4". 
Marking it purged.
                                      :
    939 |    40 | 2010-03-30 21:04:09 | ausv01-dir JobId 40: Max Volume jobs 
exceeded. Marking Volume "000006L4" as Used.
                                      :
    938 |    40 | 2010-03-30 21:04:09 | ausv01-sd JobId 40: Recycled volume 
"000006L4" on device "Drive-1" (/dev/nst0), 
all previous data lost.
                                      :
    937 |    40 | 2010-03-30 21:04:07 | ausv01-sd JobId 40: 3305 Autochanger 
"load slot 5, drive 0", status is OK.
                                      :
    936 |    40 | 2010-03-30 21:03:33 | ausv01-sd JobId 40: 3304 Issuing 
autochanger "load slot 5, drive 0" command.
                                      :
    935 |    40 | 2010-03-30 21:01:53 | ausv01-sd JobId 40: 3307 Issuing 
autochanger "unload slot 1, drive 0" command.
                                      :
    934 |    41 | 2010-03-30 21:00:14 | ausv01-dir JobId 41: Sending Accurate 
information.
                                      :
    933 |    41 | 2010-03-30 21:00:13 | ausv01-dir JobId 41: Using Device 
"Drive-1"
                                      :
    932 |    41 | 2010-03-30 21:00:08 | ausv01-dir JobId 41: Start Backup JobId 
41, Job=ausv01-other.2010-03-30_21.00.00_29
                                      :
    931 |    40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: Sending Accurate 
information.
                                      :
    930 |    40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: Using Device 
"Drive-1"
                                      :
    929 |    40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: Recycled volume 
"000006L4"
                                      :
    928 |    40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: All records pruned 
from Volume "000006L4"; marking it "Purged"
                                      :
    927 |    40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: There are no more 
Jobs associated with Volume "000006L4". 
Marking it purged.
                                      :
    926 |    40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: Start Backup JobId 
40, Job=ausv01-gd01.2010-03-30_21.00.00_28

I'm going to try to work around this by setting Maximum Concurrent Jobs = 1 on 
each tape drive in the changer, and see 
what works.  If I avoid the interleaving, I should avoid the unnecessary 
recycling.   It's also possible that I'd get 
some results by starting one of the jobs 15 minutes later than the other, by 
which time it'll already have a tape ready 
and writing for the first job, so it'll use that one.

I'd still be keen on some sort of understanding of what's going on in the first 
place though; my lack of understanding 
disturbs me.

Thanks,

-- 
Craig Miskell
Senior Systems Administrator
Opus International Consultants
Phone: +64 4 471 7209
A linguistics professor was lecturing to his class one day. "In
English," he said, "a double negative forms a positive. In some
languages though, such as Russian, a double negative is still a
negative. However," he pointed out, "there is no language wherein
a double positive can form a negative."

A voice from the back of the room piped up, "Yeah, right."

------------------------------------------------------------------------------
Download Intel&#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users