Bacula-users

Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time

2010-04-09 06:16:29
Subject: Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
From: Graham Keeling <graham AT equiinet DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Fri, 9 Apr 2010 11:19:47 +0100
On Thu, Apr 08, 2010 at 12:29:05PM -0700, ebollengier wrote:
> Graham Keeling wrote:
> > 
> > On Thu, Apr 08, 2010 at 07:44:14AM -0700, ebollengier wrote:
> >> 
> >> Hello,
> >> 
> >> 
> >> Graham Keeling wrote:
> >> > 
> >> > Hello,
> >> > 
> >> > I'm still waiting for my test database to fill up with Eric's data
> >> > (actually,
> >> > it's full now, but generating the right indexes is taking lots of
> >> time).
> >> > 
> >> > 
> >> > But, I have another proposed solution, better than the last one I made.
> >> > 
> >> > My previous solution was still taking a very very long time for a
> >> backup
> >> > of a
> >> > particular client that I had. Removing mention of BaseFiles did not
> >> help
> >> > for
> >> > this client.
> >> > 
> >> > However, the following did, and it doesn't break Base jobs.
> >> > 
> >> > Eric, I would appreciate it if you could give this a go on your test
> >> > machine.
> >> > 
> >> > It removes nasty the join on JobTDate by replacing it with a join on
> >> JobId
> >> > (which makes more sense, and is also an index on Job and File).
> >> > 
> >> 
> >> It's not possible, for example when you bscan a volume, or you run a
> >> copy,
> >> old
> >> records with old JobTDate have new FileIds or JobIds. You can't trust
> >> this
> >> field. It
> >> will work in many situations, but also will fail in an horrible way on
> >> many
> >> others...
> >>
> >> The first version (3.0.3) was using FileId, and it was quite good, but
> >> it could result wrong results on some cases, and for restore, you don't
> >> have
> >> choice, you
> >> need the exact one.
> > 
> > I don't understand this at all.
> > If you cannot trust the JobIds or FileIds in the File table, then the
> > postgres
> > query is also broken. The postgres query doesn't even mention JobTDate.
> > In fact, the postgres query is using StartTime to do the ordering.
> > 
> 
> And JobTDate is equivalent to StartTime (can be changed in PostgreSQL or in
> MySQL)

I still don't understand your original point.
If you cannot trust the JobIds or FileIds in the File table, then the postgres
query is also broken.

To clarify, can you tell me which fields in the File table you can trust?

> >> > This also means it can get rid of the outer WHERE 'BaseJobId' OR
> >> 'JobId'
> >> > that I
> >> > was complaining about before.
> >> > The correct JobId is chosen with MAX(JobTDate) by ordering by JobTDate
> >> > DESC on
> >> > the innermost select.
> >> > 
> >> > 
> >> 
> >> I would love that, it's what DISTINCT ON() on PostgreSQL does. But,
> >> unfortunately
> >> in SQL, you can only get fields that have a group function (like MAX,
> >> MIN,
> >> AVG) and
> >> fields present in GROUP BY. (we can also take FileIndex, MD5 and LStat at
> >> the same time)
> >> 
> >> from http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-columns.html
> >> > When using this feature, all rows in each group should have the same
> >> > values for the columns that are
> >> > ommitted from the GROUP BY part. The server is free to return any value
> >> > from the group, so the results 
> >> > are indeterminate unless all values are the same.
> >> 
> >> This is not only a MySQL gotcha, it's an SQL property. Unless i'm
> >> mistaken
> >> this paragraph, I'm sure
> >> that we can't use it :-(
> > 
> > OK, I understand what you are saying here.
> > 
> >> I appreciate your help, and I hope that we will find a solution for those
> >> that have
> >> this performance problem (that I can't reproduce myself).
> > 
> > I am probably out of ideas now, other than:
> > a) reverting to 3.0.3,
> > b) reverting to the 3.0.3 queries, or
> > c) switching to postgres (with all the horrible migration problems that
> > will
> > cause).
> > 
> 
> Even if we found a workaround for MySQL, Postgres will stay far more faster
> (For the 2M file query, postgres was about 12s, MySQL 3.0.3 at 60s and MySQL
> 5.0.1 at 90s)
> 
> Can you confirm that the BaseJob doesn't change your timing ?

I think that I must have made some sort of mistake when I was testing
that. I do not believe that it changes my timing enough to help with the
problem that I have.

I also think that there must be some kind of mistake with your test program
because the slowness of the query between the 3.0.3 query and the 5.0.1 query
is very, very massive on my real database.

I will attempt to prove this soon (I still haven't set up your test database
properly).


------------------------------------------------------------------------------
Download Intel&#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

<Prev in Thread] Current Thread [Next in Thread>