Bacula-users

Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time

2010-04-07 11:24:34
Subject: Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
From: ebollengier <eric AT eb.homelinux DOT org>
To: bacula-users AT lists.sourceforge DOT net
Date: Wed, 7 Apr 2010 08:22:09 -0700 (PDT)

ebollengier wrote:
> 
> 
> 
> Graham Keeling wrote:
>> 
>> On Tue, Apr 06, 2010 at 09:01:13AM -0700, ebollengier wrote:
>>> Hello Graham,
>> 
>> Hello, thanks for your reply.
>> 
>>> Graham Keeling wrote:
>>> > 
>>> > Is there anything I can do that would speed it up?
>>> > Perhaps even more importantly, what was it doing for the 33 hour
>>> period
>>> > between Apr 4 18:56:02 and Apr 6 03:56:12? Was it just writing the
>>> > bootstrap
>>> > records?
>>> > 
>>> > On the Path table, I have these:
>>> >    PRIMARY KEY(PathId),
>>> >    INDEX (Path(255))
>>> > 
>>> > On the File table, I have these:
>>> >    PRIMARY KEY(FileId),
>>> >    INDEX (JobId, PathId, FilenameId),
>>> >    INDEX (PathId),
>>> >    INDEX (FilenameId)
>>> > 
>>> > On the Job table, I have these:
>>> >    PRIMARY KEY(JobId),
>>> >    INDEX (Name(128)),
>>> >    INDEX (ClientId)
>>> > 
>>> 
>>> Did you run tests with the original schema? It looks like you removed
>>> important index such as the one
>>> on JobId and you added others, and i'm not sure that MySQL can play
>>> properly
>>> with composed indexes (it doesn't looks like) in this treatment.
>> 
>> In future, to avoid generating this sort of complaint, I will not fiddle
>> with
>> the default bacula indexes.
>> However, I think that the problem is not to do with this.
>> I shall explain below...
>> 
>>> I've made tests with 400,000,000 files and 40,000,000 filenames over 600
>>> jobs, and it doesn't run fast as PostgreSQL (factor 6 between both
>>> engines),
>>> but it should handle the job rather quickly. (on my pc, for 2M files,
>>> it's
>>> between 1min and 1min30 (15seconds with PostgreSQL))
>> 
>> 
>> Here is the command that mysql takes hours over, and hence the one that I
>> am
>> using to test. In all of my tests, the result is the same whether I am
>> using
>> the default bacula indexes or my additional ones.
>> 
>> 
>> SELECT Path.Path, Filename.Name, Temp.FileIndex, Temp.JobId, LStat, MD5
>> FROM
>>    ( SELECT FileId, Job.JobId AS JobId, FileIndex, File.PathId AS PathId,
>>         File.FilenameId AS FilenameId, LStat, MD5
>>      FROM Job, File,
>>             ( SELECT MAX(JobTDate) AS JobTDate, PathId, FilenameId
>>               FROM
>>                   ( SELECT JobTDate, PathId, FilenameId
>>                     FROM File
>>                     JOIN Job USING (JobId)
>>                     WHERE File.JobId IN (22,23,31,34,42,48,52)
>>                     UNION ALL
>>                     SELECT JobTDate, PathId, FilenameId
>>                     FROM BaseFiles
>>                     JOIN File USING (FileId)
>>                     JOIN Job  ON    (BaseJobId = Job.JobId)
>>                     WHERE BaseFiles.JobId IN (22,23,31,34,42,48,52)
>>                   )
>>               AS tmp GROUP BY PathId, FilenameId
>>             ) AS T1
>>      WHERE (Job.JobId IN
>>                 ( SELECT DISTINCT BaseJobId
>>                     FROM BaseFiles
>>                     WHERE JobId IN (22,23,31,34,42,48,52)
>>                 )
>>            )
>>      OR Job.JobId IN (22,23,31,34,42,48,52)
>>      AND T1.JobTDate = Job.JobTDate
>>      AND Job.JobId = File.JobId
>>      AND T1.PathId = File.PathId
>>      AND T1.FilenameId = File.FilenameId
>>    ) AS Temp
>> JOIN Filename ON (Filename.FilenameId = Temp.FilenameId)
>> JOIN Path ON (Path.PathId = Temp.PathId)
>> WHERE FileIndex > 0
>> ORDER BY Temp.JobId, FileIndex ASC;
>> 
>> 
>> 
>> Since I am not using base jobs, I did the experiment of removing the
>> lines
>> from the query to do with base jobs. When I did that, the query changed
>> from
>> taking hours, to taking under a second.
>> 
>> I then tracked it down to this specific part (i.e, remove this, and get a
>> truly magnificent speed-up):
>> 
>>            (Job.JobId IN
>>                 ( SELECT DISTINCT BaseJobId
>>                     FROM BaseFiles
>>                     WHERE JobId IN (22,23,31,34,42,48,52)
>>                 )
>>            )
>>      OR
>> 
>> I think that mysql is repeatedly calling this section for each row that
>> the SELECT returns.
>> 
>> Now the question is what to do about it?
>> 
> 
> It's very interesting, in my tests (with and without basejobs), it changes
> nothing... But i will
> check it again. (MySQL have a query cache, and if you run two times the
> same thing, the result
> is very fast)
> 
> As we are using JobTDate to find the last record of each file, we use it
> to get JobId back from the
> Job table (yes, ugly, but MySQL doesn't have DISTINCT ON()), and if two
> jobs have the same JobTDate, we are in trouble. So, i limit the selection
> to this JobId list.
> 
> Hope that you found the problem, it's rather easy to compute this list
> once, and
> use it after.
> 
> Bye
> 

I tweaked my test to compare both queries, and it shows no difference with
and without base job part... If you want to test queries on your side with
your data, you can download my tool (accurate-test.pl) on
http://bacula.git.sourceforge.net/git/gitweb.cgi?p=bacula/docs;a=tree;f=docs/techlogs;hb=HEAD

If you can tweak my script to reproduce your problem, i would be able to fix
things.

http://old.nabble.com/file/p28166612/diff-with-without-basejob.png 

Bye

-- 
View this message in context: 
http://old.nabble.com/VirtualFull-mysql-query-blocks-other-jobs-for-a-long-time-tp28149748p28166612.html
Sent from the Bacula - Users mailing list archive at Nabble.com.


------------------------------------------------------------------------------
Download Intel&#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

<Prev in Thread] Current Thread [Next in Thread>