Bacula-users

Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time

2010-04-07 08:03:17
Subject: Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
From: ebollengier <eric AT eb.homelinux DOT org>
To: bacula-users AT lists.sourceforge DOT net
Date: Wed, 7 Apr 2010 04:59:51 -0700 (PDT)


Graham Keeling wrote:
> 
> On Tue, Apr 06, 2010 at 09:01:13AM -0700, ebollengier wrote:
>> Hello Graham,
> 
> Hello, thanks for your reply.
> 
>> Graham Keeling wrote:
>> > 
>> > Hello,
>> > I'm using bacula-5.0.1.
>> > I have a 2.33GHz CPU with 2G of RAM.
>> > I am using MySQL.
>> > I had a VirtualFull scheduled for my client.
>> > 
>> > My log says the following:
>> > 
>> > Apr  4 18:56:02  Start Virtual Backup JobId 56,
>> > Job=Linux:cvs.2010-04-04_18.56.00_03
>> > Apr  4 18:56:02  This Job is not an Accurate backup so is not
>> equivalent
>> > to a Full backup.
>> > Apr  6 03:56:12  Bootstrap records written to
>> > /var/lib/bacula/backup.dev.equiinet.com-.restore.1.bsr
>> > Apr  6 10:23:20  Ready to read from volume "backup-0002" on device
>> "Tower
>> > 2.0" (/write/mnt/Tower 2).
>> > Apr  6 10:23:20  Labeled new Volume "backup-0046" on device "Tower 2.1"
>> > (/write/mnt/Tower 2).
>> > ...and now, backup-0046 is being written to.
>> > 
>> > At about Apr 6 10:00, I was logged on to the director, and attempting
>> to
>> > run
>> > a 'status, director' command on bconsole.
>> > bconsole printed a few bits of information, then paused. Presumably
>> > because JobId 56 had locked the database. After 10:23:20, the command
>> > printed
>> > the rest of its information and gave me the prompt back.
>> > 
>> > At about Apr 6 10:00, I logged into mysql, and ran 'show full
>> > processlist;',
>> > which gave the following.
>> > 
>> > | Id   | User | Host      | db     | Command | Time  | State     | Info
>> > | 3032 | root | localhost | bacula | Query   | 22464 | Locked    |
>> UPDATE
>> > Job SE
>> > T JobStatus='R',Level='F',StartTime='2010-04-06
>> > 03:56:14',ClientId=2,JobTDate=12
>> > 70522574,PoolId=7,FileSetId=2 WHERE JobId=56                               
>> >      
>> > | 3033 | root | localhost | bacula | Query   | 22464 | executing |
>> SELECT
>> > Path.P
>> > ath, Filename.Name, Temp.FileIndex, Temp.JobId, LStat, MD5 FROM (
>> SELECT
>> > FileId,
>> >  Job.JobId AS JobId, FileIndex, File.PathId AS PathId, File.FilenameId
>> AS
>> > Filena
>> > meId, LStat, MD5 FROM Job, File, ( SELECT MAX(JobTDate) AS JobTDate,
>> > PathId, Fil
>> > enameId FROM ( SELECT JobTDate, PathId, FilenameId FROM File JOIN Job
>> > USING (Job
>> > Id) WHERE File.JobId IN (22,23,31,34,42,48,52) UNION ALL SELECT
>> JobTDate,
>> > PathId
>> > , FilenameId FROM BaseFiles JOIN File USING (FileId) JOIN Job  ON   
>> > (BaseJobId 
>> > = Job.JobId) WHERE BaseFiles.JobId IN (22,23,31,34,42,48,52) ) AS tmp
>> > GROUP BY P
>> > athId, FilenameId ) AS T1 WHERE (Job.JobId IN ( SELECT DISTINCT
>> BaseJobId
>> > FROM B
>> > aseFiles WHERE JobId IN (22,23,31,34,42,48,52)) OR Job.JobId IN
>> > (22,23,31,34,42,
>> > 48,52)) AND T1.JobTDate = Job.JobTDate AND Job.JobId = File.JobId AND
>> > T1.PathId 
>> > = File.PathId AND T1.FilenameId = File.FilenameId ) AS Temp JOIN
>> Filename
>> > ON (Fi
>> > lename.FilenameId = Temp.FilenameId) JOIN Path ON (Path.PathId =
>> > Temp.PathId) WH
>> > ERE FileIndex > 0 ORDER BY Temp.JobId, FileIndex ASC | 
>> > 
>> > 'Time 22464' is equivalent to about 6 hours, so I presume that after
>> > 'Bootstrap
>> > records written', bacula has been waiting for this sql command, during
>> > which
>> > time it could not do anything else. Backups of other clients were
>> failing
>> > because I had Max Wait Time set to something less than 6 hours.
>> > 
>> > The following is a listing of the number of files that each previous
>> > backup
>> > said it had backed up, which doesn't seem extraordinarily large to my
>> > mind.
>> > 
>> > Full: 810,400
>> > Incr:  52,487
>> > Incr:  52,485
>> > Incr:  52,485
>> > Incr: 332,649
>> > Incr:  52,492
>> > Incr:  52,593
>> > Incr: 332,677
>> > Incr:  52,497
>> > Incr:  52,497
>> > Incr:  52,497
>> > 
>> > 
>> > So, my questions are:
>> > 
>> > Does it really need to take so long?
>> > Is it expected that this Job blocks everything else for 6 hours?
>> > 
>> 
>> I think that this part of the code doesn't use a dedicated connection, it
>> would be nice to
>> improve this area. (other part of the code are using a specific
>> connection
>> to do this work)
>> 
>> 
>> Graham Keeling wrote:
>> > 
>> > Is there anything I can do that would speed it up?
>> > Perhaps even more importantly, what was it doing for the 33 hour period
>> > between Apr 4 18:56:02 and Apr 6 03:56:12? Was it just writing the
>> > bootstrap
>> > records?
>> > 
>> > On the Path table, I have these:
>> >    PRIMARY KEY(PathId),
>> >    INDEX (Path(255))
>> > 
>> > On the File table, I have these:
>> >    PRIMARY KEY(FileId),
>> >    INDEX (JobId, PathId, FilenameId),
>> >    INDEX (PathId),
>> >    INDEX (FilenameId)
>> > 
>> > On the Job table, I have these:
>> >    PRIMARY KEY(JobId),
>> >    INDEX (Name(128)),
>> >    INDEX (ClientId)
>> > 
>> 
>> Did you run tests with the original schema? It looks like you removed
>> important index such as the one
>> on JobId and you added others, and i'm not sure that MySQL can play
>> properly
>> with composed indexes (it doesn't looks like) in this treatment.
> 
> In future, to avoid generating this sort of complaint, I will not fiddle
> with
> the default bacula indexes.
> However, I think that the problem is not to do with this.
> I shall explain below...
> 
>> I've made tests with 400,000,000 files and 40,000,000 filenames over 600
>> jobs, and it doesn't run fast as PostgreSQL (factor 6 between both
>> engines),
>> but it should handle the job rather quickly. (on my pc, for 2M files,
>> it's
>> between 1min and 1min30 (15seconds with PostgreSQL))
> 
> 
> 
> 
> Here is the command that mysql takes hours over, and hence the one that I
> am
> using to test. In all of my tests, the result is the same whether I am
> using
> the default bacula indexes or my additional ones.
> 
> 
> SELECT Path.Path, Filename.Name, Temp.FileIndex, Temp.JobId, LStat, MD5
> FROM
>    ( SELECT FileId, Job.JobId AS JobId, FileIndex, File.PathId AS PathId,
>         File.FilenameId AS FilenameId, LStat, MD5
>      FROM Job, File,
>             ( SELECT MAX(JobTDate) AS JobTDate, PathId, FilenameId
>               FROM
>                   ( SELECT JobTDate, PathId, FilenameId
>                     FROM File
>                     JOIN Job USING (JobId)
>                     WHERE File.JobId IN (22,23,31,34,42,48,52)
>                     UNION ALL
>                     SELECT JobTDate, PathId, FilenameId
>                     FROM BaseFiles
>                     JOIN File USING (FileId)
>                     JOIN Job  ON    (BaseJobId = Job.JobId)
>                     WHERE BaseFiles.JobId IN (22,23,31,34,42,48,52)
>                   )
>               AS tmp GROUP BY PathId, FilenameId
>             ) AS T1
>      WHERE (Job.JobId IN
>                 ( SELECT DISTINCT BaseJobId
>                     FROM BaseFiles
>                     WHERE JobId IN (22,23,31,34,42,48,52)
>                 )
>            )
>      OR Job.JobId IN (22,23,31,34,42,48,52)
>      AND T1.JobTDate = Job.JobTDate
>      AND Job.JobId = File.JobId
>      AND T1.PathId = File.PathId
>      AND T1.FilenameId = File.FilenameId
>    ) AS Temp
> JOIN Filename ON (Filename.FilenameId = Temp.FilenameId)
> JOIN Path ON (Path.PathId = Temp.PathId)
> WHERE FileIndex > 0
> ORDER BY Temp.JobId, FileIndex ASC;
> 
> 
> 
> Since I am not using base jobs, I did the experiment of removing the lines
> from the query to do with base jobs. When I did that, the query changed
> from
> taking hours, to taking under a second.
> 
> I then tracked it down to this specific part (i.e, remove this, and get a
> truly magnificent speed-up):
> 
>            (Job.JobId IN
>                 ( SELECT DISTINCT BaseJobId
>                     FROM BaseFiles
>                     WHERE JobId IN (22,23,31,34,42,48,52)
>                 )
>            )
>      OR
> 
> I think that mysql is repeatedly calling this section for each row that
> the SELECT returns.
> 
> Now the question is what to do about it?
> 

It's very interesting, in my tests (with and without basejobs), it changes
nothing... But i will
check it again. (MySQL have a query cache, and if you run two times the same
thing, the result
is very fast)

As we are using JobTDate to find the last record of each file, we use it to
get JobId back from the
Job table (yes, ugly, but MySQL doesn't have DISTINCT ON()), and if two jobs
have the same JobTDate, we are in trouble. So, i limit the selection to this
JobId list.

Hope that you found the problem, it's rather easy to compute this list once,
and
use it after.

Bye
-- 
View this message in context: 
http://old.nabble.com/VirtualFull-mysql-query-blocks-other-jobs-for-a-long-time-tp28149748p28163907.html
Sent from the Bacula - Users mailing list archive at Nabble.com.


------------------------------------------------------------------------------
Download Intel&#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

<Prev in Thread] Current Thread [Next in Thread>