Bacula-users

Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time

2010-04-08 12:56:27
Subject: Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
From: Graham Keeling <graham AT equiinet DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Thu, 8 Apr 2010 17:59:43 +0100
On Thu, Apr 08, 2010 at 07:44:14AM -0700, ebollengier wrote:
> 
> Hello,
> 
> 
> Graham Keeling wrote:
> > 
> > Hello,
> > 
> > I'm still waiting for my test database to fill up with Eric's data
> > (actually,
> > it's full now, but generating the right indexes is taking lots of time).
> > 
> > 
> > But, I have another proposed solution, better than the last one I made.
> > 
> > My previous solution was still taking a very very long time for a backup
> > of a
> > particular client that I had. Removing mention of BaseFiles did not help
> > for
> > this client.
> > 
> > However, the following did, and it doesn't break Base jobs.
> > 
> > Eric, I would appreciate it if you could give this a go on your test
> > machine.
> > 
> > It removes nasty the join on JobTDate by replacing it with a join on JobId
> > (which makes more sense, and is also an index on Job and File).
> > 
> 
> It's not possible, for example when you bscan a volume, or you run a copy,
> old
> records with old JobTDate have new FileIds or JobIds. You can't trust this
> field. It
> will work in many situations, but also will fail in an horrible way on many
> others...
>
> The first version (3.0.3) was using FileId, and it was quite good, but
> it could result wrong results on some cases, and for restore, you don't have
> choice, you
> need the exact one.

I don't understand this at all.
If you cannot trust the JobIds or FileIds in the File table, then the postgres
query is also broken. The postgres query doesn't even mention JobTDate.
In fact, the postgres query is using StartTime to do the ordering.

> > This also means it can get rid of the outer WHERE 'BaseJobId' OR 'JobId'
> > that I
> > was complaining about before.
> > The correct JobId is chosen with MAX(JobTDate) by ordering by JobTDate
> > DESC on
> > the innermost select.
> > 
> > 
> 
> I would love that, it's what DISTINCT ON() on PostgreSQL does. But,
> unfortunately
> in SQL, you can only get fields that have a group function (like MAX, MIN,
> AVG) and
> fields present in GROUP BY. (we can also take FileIndex, MD5 and LStat at
> the same time)
> 
> from http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-columns.html
> > When using this feature, all rows in each group should have the same
> > values for the columns that are
> > ommitted from the GROUP BY part. The server is free to return any value
> > from the group, so the results 
> > are indeterminate unless all values are the same.
> 
> This is not only a MySQL gotcha, it's an SQL property. Unless i'm mistaken
> this paragraph, I'm sure
> that we can't use it :-(

OK, I understand what you are saying here.

> I appreciate your help, and I hope that we will find a solution for those
> that have
> this performance problem (that I can't reproduce myself).

I am probably out of ideas now, other than:
a) reverting to 3.0.3,
b) reverting to the 3.0.3 queries, or
c) switching to postgres (with all the horrible migration problems that will
cause).


------------------------------------------------------------------------------
Download Intel&#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

<Prev in Thread] Current Thread [Next in Thread>