Re: [Bacula-users] Performance with MySQL queries since 3.0.0 (Dir inser

> 
> James Harper wrote:
> > The subquery returns a very small result set (0 or 1, assuming you
use
> > DISTINCT) and so isn't too inefficient. It's when you say 'WHERE NOT
> > EXISTS (SOME QUERY WITH LOTS OF RESULTS)' that you start to really
bog
> > down
> 
> True, but if the outer query contains a very large number of records,
the
> inner query is executed over and over and over and ...
> 
> This multiplies the small execution time by a large number.
> 
> Think of it like a WHILE loop that executes a block of code each time
> round. The inner block of code might be quick, but if the loop goes
round
> several million times, the cumulative time becomes significant.
> 
> One general rule of thumb of SQL execution is that bulk processing is
> almost always faster than processing records individually in a loop.

I would have thought that the query planner would have reduced both
solutions to the same thing (eg for every record in the outer or left
part, match it up with the inner or left part). I agree that the JOIN
lends itself better to optimization though, and that's certainly the way
I would have written it as it is far clearer to read and scales to far
more situations than the 'NOT EXISTS' solution does.

I constructed a simple test scenario - created a table 'james' with a
few hundred thousand rows in it of both real filenames and corrupted
versions of real filenames, and ran the following tests:

RESET QUERY CACHE; (can't remember if that was the exact syntax)
SELECT COUNT(DISTINCT Name)
FROM james AS a
WHERE NOT EXISTS
(
    SELECT Name
    FROM Filename AS f
    WHERE f.Name = a.Name
);

And

RESET QUERY CACHE;
SELECT COUNT(DISTINCT Name)
FROM james AS a
LEFT JOIN filename AS f ON a.name = f.name
WHERE f.name IS NULL;

And compared the times over a half a dozen runs. There wasn't a
measurable difference between the results, but the server was busy
enough at the time that a significant amount of 'noise' was added in -
both queries ran for between 2 and 6 seconds. The simple timing test
also completely fails to measure memory and io usage.

I'm sure someone could construct some more precise tests that would
eliminate any speculation, but on face value you aren't going to get an
order-of-magnitude improvement or anything close just by tinkering with
this query, unless you can identify some indexes that could be added
onto the batch table (which is temporary isn't it?). Although given that
Filename already has an index on the only field being used, I suspect
that even that won't do much for you.

Does MySQL have a 'profiler' tool like MSSQL does? I spend most of my
time in MSSQL and if some operation is running slow I just attach the
profiler to it and capture the queries and focus on the ones that are
taking most of the time.

James


------------------------------------------------------------------------------
Crystal Reports - New Free Runtime and 30 Day Trial
Check out the new simplified licensing option that enables unlimited
royalty-free distribution of the report engine for externally facing 
server and web deployment.
http://p.sf.net/sfu/businessobjects
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Performance with MySQL queries since 3.0.0 (Dir inserting attributes hang)