Bacula-users

Re: [Bacula-users] Tape changer with simultaneous jobs, using (but not actually using) a tape

2010-03-31 14:46:36
Subject: Re: [Bacula-users] Tape changer with simultaneous jobs, using (but not actually using) a tape
From: Martin Simmons <martin AT lispworks DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Wed, 31 Mar 2010 19:43:37 +0100
>>>>> On Wed, 31 Mar 2010 11:29:21 +1300, Craig Miskell said:
> 
> Martin Simmons wrote:
> >>>>>> On Tue, 30 Mar 2010 12:09:47 +1300, Craig Miskell said:
> >> Hi,
> >>    I have a TL-2000 tape changer with two drives.  I have two differential 
> >> jobs which start at the same time, on different 
> >> filesystems but the same client  (also happens to be the same server as 
> >> the director/storage daemon).  This is bacula 5.0.0
> >>
> >> Both jobs interleave to the same tape (this is good IMO, although I don't 
> >> know which knob I tweaked to make that happen 
> >> :)), but one of the jobs first seems to claim a second tape, purge/recycle 
> >> it, log the following rows in jobmedia:
> >>   jobmediaid | jobid | mediaid | firstindex | lastindex | startfile | 
> >> endfile | startblock | endblock | volindex
> >>  
> >> ------------+-------+---------+------------+-----------+-----------+---------+------------+----------+----------
> >>        12953 |    38 |       7 |          0 |         0 |         0 |      
> >>  0 |          0 |        0 |        1
> >>        12956 |    38 |       8 |          1 |        37 |         0 |      
> >>  0 |          1 |    14325 |        2
> >>        12957 |    38 |       8 |         37 |        50 |         1 |      
> >>  1 |          1 |    15499 |        3
> >> ....
> >>
> >> where mediaid 7 is the tape it just recycled and didn't use, and mediaid 8 
> >> is the tape it's sharing with the other job.
> >>
> >> I'm happy for it to interleave, but I'm curious as to why it touches that 
> >> first tape (7) but then decides not to use it. 
> >>    It's effectively recycling/nuking a backup that doesn't need to be, 
> >> requiring two tapes where one will do.   Anyone 
> >> got any idea what's going on?  Happy to file a bug if it is one, but don't 
> >> want to false report.
> > 
> > I suggest trying it in bacula 5.0.1 first.
> Is that because you have some concrete knowledge of a fix that is in 5.0.1, 
> or just a blind "upgrade to the latest" 
> suggestion?  It's a non-trivial amount of work for something that might just 
> be speculation.

A bit of both.

There was some concurrency issue like this, but I can't remember which version
it was.

In general, it is best to upgrade before reporting a bug.


> > If that still fails, what does it report in the log say when it decides not 
> > to
> > use the first tape?
> Well, still on 5.0.0 for now, in case this sheds any light on the matter, 
> here's the logs (latest at the top) from the 
> start of the jobs (at 21:00:00 up to where it starts actually reading/writing 
> data.  000006L4 is the volume that gets 
> recycled but not used, 000013L4 is the one that is used.   I can't quite 
> figure out why it doesn't write anything to 
> 6L4; it's loaded and ready nearly 2 minutes before 13L4 is loaded, so it 
> should have had something written to 5L4 by then.
> 
>     948 |    41 | 2010-03-30 21:06:06 | ausv01-fd JobId 41:      /tmp is a 
> different filesystem. Will not descend from / into /tmp
>     947 |    41 | 2010-03-30 21:06:06 | ausv01-dir JobId 41: Max Volume jobs 
> exceeded. Marking Volume "000013L4" as Used.
>     946 |    41 | 2010-03-30 21:06:06 | ausv01-sd JobId 41: Recycled volume 
> "000013L4" on device "Drive-1" (/dev/nst0), all previous data lost.
>     945 |    41 | 2010-03-30 21:06:03 | ausv01-sd JobId 41: 3305 Autochanger 
> "load slot 12, drive 0", status is OK.
>     944 |    41 | 2010-03-30 21:05:27 | ausv01-sd JobId 41: 3304 Issuing 
> autochanger "load slot 12, drive 0" command.
>     943 |    41 | 2010-03-30 21:04:42 | ausv01-sd JobId 41: 3307 Issuing 
> autochanger "unload slot 5, drive 0" command.
>     942 |    41 | 2010-03-30 21:04:42 | ausv01-dir JobId 41: Recycled volume 
> "000013L4"
>     941 |    41 | 2010-03-30 21:04:42 | ausv01-dir JobId 41: All records 
> pruned from Volume "000013L4"; marking it "Purged"
>     940 |    41 | 2010-03-30 21:04:42 | ausv01-dir JobId 41: There are no 
> more Jobs associated with Volume "000013L4". Marking it purged.
>     939 |    40 | 2010-03-30 21:04:09 | ausv01-dir JobId 40: Max Volume jobs 
> exceeded. Marking Volume "000006L4" as Used.
>     938 |    40 | 2010-03-30 21:04:09 | ausv01-sd JobId 40: Recycled volume 
> "000006L4" on device "Drive-1" (/dev/nst0), all previous data lost.
>     937 |    40 | 2010-03-30 21:04:07 | ausv01-sd JobId 40: 3305 Autochanger 
> "load slot 5, drive 0", status is OK.
>     936 |    40 | 2010-03-30 21:03:33 | ausv01-sd JobId 40: 3304 Issuing 
> autochanger "load slot 5, drive 0" command.
>     935 |    40 | 2010-03-30 21:01:53 | ausv01-sd JobId 40: 3307 Issuing 
> autochanger "unload slot 1, drive 0" command.
>     934 |    41 | 2010-03-30 21:00:14 | ausv01-dir JobId 41: Sending Accurate 
> information.
>     933 |    41 | 2010-03-30 21:00:13 | ausv01-dir JobId 41: Using Device 
> "Drive-1"
>     932 |    41 | 2010-03-30 21:00:08 | ausv01-dir JobId 41: Start Backup 
> JobId 41, Job=ausv01-other.2010-03-30_21.00.00_29
>     931 |    40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: Sending Accurate 
> information.
>     930 |    40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: Using Device 
> "Drive-1"
>     929 |    40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: Recycled volume 
> "000006L4"
>     928 |    40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: All records 
> pruned from Volume "000006L4"; marking it "Purged"
>     927 |    40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: There are no 
> more Jobs associated with Volume "000006L4". Marking it purged.
>     926 |    40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: Start Backup 
> JobId 40, Job=ausv01-gd01.2010-03-30_21.00.00_28
> 
> I'm going to try to work around this by setting Maximum Concurrent Jobs = 1 
> on each tape drive in the changer, and see 
> what works.  If I avoid the interleaving, I should avoid the unnecessary 
> recycling.   It's also possible that I'd get 
> some results by starting one of the jobs 15 minutes later than the other, by 
> which time it'll already have a tape ready 
> and writing for the first job, so it'll use that one.
> 
> I'd still be keen on some sort of understanding of what's going on in the 
> first place though; my lack of understanding 
> disturbs me.

What "Max Volume jobs" is set for 000006L4?  It seems to think this has been
exceeded immediately after it was recycled.

__Martin

------------------------------------------------------------------------------
Download Intel&#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users