>>>>> On Wed, 31 Mar 2010 11:29:21 +1300, Craig Miskell said:
>
> Martin Simmons wrote:
> >>>>>> On Tue, 30 Mar 2010 12:09:47 +1300, Craig Miskell said:
> >> Hi,
> >> I have a TL-2000 tape changer with two drives. I have two differential
> >> jobs which start at the same time, on different
> >> filesystems but the same client (also happens to be the same server as
> >> the director/storage daemon). This is bacula 5.0.0
> >>
> >> Both jobs interleave to the same tape (this is good IMO, although I don't
> >> know which knob I tweaked to make that happen
> >> :)), but one of the jobs first seems to claim a second tape, purge/recycle
> >> it, log the following rows in jobmedia:
> >> jobmediaid | jobid | mediaid | firstindex | lastindex | startfile |
> >> endfile | startblock | endblock | volindex
> >>
> >> ------------+-------+---------+------------+-----------+-----------+---------+------------+----------+----------
> >> 12953 | 38 | 7 | 0 | 0 | 0 |
> >> 0 | 0 | 0 | 1
> >> 12956 | 38 | 8 | 1 | 37 | 0 |
> >> 0 | 1 | 14325 | 2
> >> 12957 | 38 | 8 | 37 | 50 | 1 |
> >> 1 | 1 | 15499 | 3
> >> ....
> >>
> >> where mediaid 7 is the tape it just recycled and didn't use, and mediaid 8
> >> is the tape it's sharing with the other job.
> >>
> >> I'm happy for it to interleave, but I'm curious as to why it touches that
> >> first tape (7) but then decides not to use it.
> >> It's effectively recycling/nuking a backup that doesn't need to be,
> >> requiring two tapes where one will do. Anyone
> >> got any idea what's going on? Happy to file a bug if it is one, but don't
> >> want to false report.
> >
> > I suggest trying it in bacula 5.0.1 first.
> Is that because you have some concrete knowledge of a fix that is in 5.0.1,
> or just a blind "upgrade to the latest"
> suggestion? It's a non-trivial amount of work for something that might just
> be speculation.
A bit of both.
There was some concurrency issue like this, but I can't remember which version
it was.
In general, it is best to upgrade before reporting a bug.
> > If that still fails, what does it report in the log say when it decides not
> > to
> > use the first tape?
> Well, still on 5.0.0 for now, in case this sheds any light on the matter,
> here's the logs (latest at the top) from the
> start of the jobs (at 21:00:00 up to where it starts actually reading/writing
> data. 000006L4 is the volume that gets
> recycled but not used, 000013L4 is the one that is used. I can't quite
> figure out why it doesn't write anything to
> 6L4; it's loaded and ready nearly 2 minutes before 13L4 is loaded, so it
> should have had something written to 5L4 by then.
>
> 948 | 41 | 2010-03-30 21:06:06 | ausv01-fd JobId 41: /tmp is a
> different filesystem. Will not descend from / into /tmp
> 947 | 41 | 2010-03-30 21:06:06 | ausv01-dir JobId 41: Max Volume jobs
> exceeded. Marking Volume "000013L4" as Used.
> 946 | 41 | 2010-03-30 21:06:06 | ausv01-sd JobId 41: Recycled volume
> "000013L4" on device "Drive-1" (/dev/nst0), all previous data lost.
> 945 | 41 | 2010-03-30 21:06:03 | ausv01-sd JobId 41: 3305 Autochanger
> "load slot 12, drive 0", status is OK.
> 944 | 41 | 2010-03-30 21:05:27 | ausv01-sd JobId 41: 3304 Issuing
> autochanger "load slot 12, drive 0" command.
> 943 | 41 | 2010-03-30 21:04:42 | ausv01-sd JobId 41: 3307 Issuing
> autochanger "unload slot 5, drive 0" command.
> 942 | 41 | 2010-03-30 21:04:42 | ausv01-dir JobId 41: Recycled volume
> "000013L4"
> 941 | 41 | 2010-03-30 21:04:42 | ausv01-dir JobId 41: All records
> pruned from Volume "000013L4"; marking it "Purged"
> 940 | 41 | 2010-03-30 21:04:42 | ausv01-dir JobId 41: There are no
> more Jobs associated with Volume "000013L4". Marking it purged.
> 939 | 40 | 2010-03-30 21:04:09 | ausv01-dir JobId 40: Max Volume jobs
> exceeded. Marking Volume "000006L4" as Used.
> 938 | 40 | 2010-03-30 21:04:09 | ausv01-sd JobId 40: Recycled volume
> "000006L4" on device "Drive-1" (/dev/nst0), all previous data lost.
> 937 | 40 | 2010-03-30 21:04:07 | ausv01-sd JobId 40: 3305 Autochanger
> "load slot 5, drive 0", status is OK.
> 936 | 40 | 2010-03-30 21:03:33 | ausv01-sd JobId 40: 3304 Issuing
> autochanger "load slot 5, drive 0" command.
> 935 | 40 | 2010-03-30 21:01:53 | ausv01-sd JobId 40: 3307 Issuing
> autochanger "unload slot 1, drive 0" command.
> 934 | 41 | 2010-03-30 21:00:14 | ausv01-dir JobId 41: Sending Accurate
> information.
> 933 | 41 | 2010-03-30 21:00:13 | ausv01-dir JobId 41: Using Device
> "Drive-1"
> 932 | 41 | 2010-03-30 21:00:08 | ausv01-dir JobId 41: Start Backup
> JobId 41, Job=ausv01-other.2010-03-30_21.00.00_29
> 931 | 40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: Sending Accurate
> information.
> 930 | 40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: Using Device
> "Drive-1"
> 929 | 40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: Recycled volume
> "000006L4"
> 928 | 40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: All records
> pruned from Volume "000006L4"; marking it "Purged"
> 927 | 40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: There are no
> more Jobs associated with Volume "000006L4". Marking it purged.
> 926 | 40 | 2010-03-30 21:00:00 | ausv01-dir JobId 40: Start Backup
> JobId 40, Job=ausv01-gd01.2010-03-30_21.00.00_28
>
> I'm going to try to work around this by setting Maximum Concurrent Jobs = 1
> on each tape drive in the changer, and see
> what works. If I avoid the interleaving, I should avoid the unnecessary
> recycling. It's also possible that I'd get
> some results by starting one of the jobs 15 minutes later than the other, by
> which time it'll already have a tape ready
> and writing for the first job, so it'll use that one.
>
> I'd still be keen on some sort of understanding of what's going on in the
> first place though; my lack of understanding
> disturbs me.
What "Max Volume jobs" is set for 000006L4? It seems to think this has been
exceeded immediately after it was recycled.
__Martin
------------------------------------------------------------------------------
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
|