Bacula-users

Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time

2010-04-07 07:38:44
Subject: Re: [Bacula-users] VirtualFull mysql query blocks other jobs for a long time
From: Graham Keeling <graham AT equiinet DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Wed, 7 Apr 2010 12:40:24 +0100
On Tue, Apr 06, 2010 at 09:01:13AM -0700, ebollengier wrote:
> Hello Graham,

Hello, thanks for your reply.

> Graham Keeling wrote:
> > 
> > Hello,
> > I'm using bacula-5.0.1.
> > I have a 2.33GHz CPU with 2G of RAM.
> > I am using MySQL.
> > I had a VirtualFull scheduled for my client.
> > 
> > My log says the following:
> > 
> > Apr  4 18:56:02  Start Virtual Backup JobId 56,
> > Job=Linux:cvs.2010-04-04_18.56.00_03
> > Apr  4 18:56:02  This Job is not an Accurate backup so is not equivalent
> > to a Full backup.
> > Apr  6 03:56:12  Bootstrap records written to
> > /var/lib/bacula/backup.dev.equiinet.com-.restore.1.bsr
> > Apr  6 10:23:20  Ready to read from volume "backup-0002" on device "Tower
> > 2.0" (/write/mnt/Tower 2).
> > Apr  6 10:23:20  Labeled new Volume "backup-0046" on device "Tower 2.1"
> > (/write/mnt/Tower 2).
> > ...and now, backup-0046 is being written to.
> > 
> > At about Apr 6 10:00, I was logged on to the director, and attempting to
> > run
> > a 'status, director' command on bconsole.
> > bconsole printed a few bits of information, then paused. Presumably
> > because JobId 56 had locked the database. After 10:23:20, the command
> > printed
> > the rest of its information and gave me the prompt back.
> > 
> > At about Apr 6 10:00, I logged into mysql, and ran 'show full
> > processlist;',
> > which gave the following.
> > 
> > | Id   | User | Host      | db     | Command | Time  | State     | Info
> > | 3032 | root | localhost | bacula | Query   | 22464 | Locked    | UPDATE
> > Job SE
> > T JobStatus='R',Level='F',StartTime='2010-04-06
> > 03:56:14',ClientId=2,JobTDate=12
> > 70522574,PoolId=7,FileSetId=2 WHERE JobId=56                                
> >     
> > | 3033 | root | localhost | bacula | Query   | 22464 | executing | SELECT
> > Path.P
> > ath, Filename.Name, Temp.FileIndex, Temp.JobId, LStat, MD5 FROM ( SELECT
> > FileId,
> >  Job.JobId AS JobId, FileIndex, File.PathId AS PathId, File.FilenameId AS
> > Filena
> > meId, LStat, MD5 FROM Job, File, ( SELECT MAX(JobTDate) AS JobTDate,
> > PathId, Fil
> > enameId FROM ( SELECT JobTDate, PathId, FilenameId FROM File JOIN Job
> > USING (Job
> > Id) WHERE File.JobId IN (22,23,31,34,42,48,52) UNION ALL SELECT JobTDate,
> > PathId
> > , FilenameId FROM BaseFiles JOIN File USING (FileId) JOIN Job  ON   
> > (BaseJobId 
> > = Job.JobId) WHERE BaseFiles.JobId IN (22,23,31,34,42,48,52) ) AS tmp
> > GROUP BY P
> > athId, FilenameId ) AS T1 WHERE (Job.JobId IN ( SELECT DISTINCT BaseJobId
> > FROM B
> > aseFiles WHERE JobId IN (22,23,31,34,42,48,52)) OR Job.JobId IN
> > (22,23,31,34,42,
> > 48,52)) AND T1.JobTDate = Job.JobTDate AND Job.JobId = File.JobId AND
> > T1.PathId 
> > = File.PathId AND T1.FilenameId = File.FilenameId ) AS Temp JOIN Filename
> > ON (Fi
> > lename.FilenameId = Temp.FilenameId) JOIN Path ON (Path.PathId =
> > Temp.PathId) WH
> > ERE FileIndex > 0 ORDER BY Temp.JobId, FileIndex ASC | 
> > 
> > 'Time 22464' is equivalent to about 6 hours, so I presume that after
> > 'Bootstrap
> > records written', bacula has been waiting for this sql command, during
> > which
> > time it could not do anything else. Backups of other clients were failing
> > because I had Max Wait Time set to something less than 6 hours.
> > 
> > The following is a listing of the number of files that each previous
> > backup
> > said it had backed up, which doesn't seem extraordinarily large to my
> > mind.
> > 
> > Full: 810,400
> > Incr:  52,487
> > Incr:  52,485
> > Incr:  52,485
> > Incr: 332,649
> > Incr:  52,492
> > Incr:  52,593
> > Incr: 332,677
> > Incr:  52,497
> > Incr:  52,497
> > Incr:  52,497
> > 
> > 
> > So, my questions are:
> > 
> > Does it really need to take so long?
> > Is it expected that this Job blocks everything else for 6 hours?
> > 
> 
> I think that this part of the code doesn't use a dedicated connection, it
> would be nice to
> improve this area. (other part of the code are using a specific connection
> to do this work)
> 
> 
> Graham Keeling wrote:
> > 
> > Is there anything I can do that would speed it up?
> > Perhaps even more importantly, what was it doing for the 33 hour period
> > between Apr 4 18:56:02 and Apr 6 03:56:12? Was it just writing the
> > bootstrap
> > records?
> > 
> > On the Path table, I have these:
> >    PRIMARY KEY(PathId),
> >    INDEX (Path(255))
> > 
> > On the File table, I have these:
> >    PRIMARY KEY(FileId),
> >    INDEX (JobId, PathId, FilenameId),
> >    INDEX (PathId),
> >    INDEX (FilenameId)
> > 
> > On the Job table, I have these:
> >    PRIMARY KEY(JobId),
> >    INDEX (Name(128)),
> >    INDEX (ClientId)
> > 
> 
> Did you run tests with the original schema? It looks like you removed
> important index such as the one
> on JobId and you added others, and i'm not sure that MySQL can play properly
> with composed indexes (it doesn't looks like) in this treatment.

In future, to avoid generating this sort of complaint, I will not fiddle with
the default bacula indexes.
However, I think that the problem is not to do with this.
I shall explain below...

> I've made tests with 400,000,000 files and 40,000,000 filenames over 600
> jobs, and it doesn't run fast as PostgreSQL (factor 6 between both engines),
> but it should handle the job rather quickly. (on my pc, for 2M files, it's
> between 1min and 1min30 (15seconds with PostgreSQL))




Here is the command that mysql takes hours over, and hence the one that I am
using to test. In all of my tests, the result is the same whether I am using
the default bacula indexes or my additional ones.


SELECT Path.Path, Filename.Name, Temp.FileIndex, Temp.JobId, LStat, MD5
FROM
   ( SELECT FileId, Job.JobId AS JobId, FileIndex, File.PathId AS PathId,
        File.FilenameId AS FilenameId, LStat, MD5
     FROM Job, File,
            ( SELECT MAX(JobTDate) AS JobTDate, PathId, FilenameId
              FROM
                  ( SELECT JobTDate, PathId, FilenameId
                    FROM File
                    JOIN Job USING (JobId)
                    WHERE File.JobId IN (22,23,31,34,42,48,52)
                    UNION ALL
                    SELECT JobTDate, PathId, FilenameId
                    FROM BaseFiles
                    JOIN File USING (FileId)
                    JOIN Job  ON    (BaseJobId = Job.JobId)
                    WHERE BaseFiles.JobId IN (22,23,31,34,42,48,52)
                  )
              AS tmp GROUP BY PathId, FilenameId
            ) AS T1
     WHERE (Job.JobId IN
                ( SELECT DISTINCT BaseJobId
                    FROM BaseFiles
                    WHERE JobId IN (22,23,31,34,42,48,52)
                )
           )
     OR Job.JobId IN (22,23,31,34,42,48,52)
     AND T1.JobTDate = Job.JobTDate
     AND Job.JobId = File.JobId
     AND T1.PathId = File.PathId
     AND T1.FilenameId = File.FilenameId
   ) AS Temp
JOIN Filename ON (Filename.FilenameId = Temp.FilenameId)
JOIN Path ON (Path.PathId = Temp.PathId)
WHERE FileIndex > 0
ORDER BY Temp.JobId, FileIndex ASC;



Since I am not using base jobs, I did the experiment of removing the lines
from the query to do with base jobs. When I did that, the query changed from
taking hours, to taking under a second.

I then tracked it down to this specific part (i.e, remove this, and get a
truly magnificent speed-up):

           (Job.JobId IN
                ( SELECT DISTINCT BaseJobId
                    FROM BaseFiles
                    WHERE JobId IN (22,23,31,34,42,48,52)
                )
           )
     OR

I think that mysql is repeatedly calling this section for each row that
the SELECT returns.

Now the question is what to do about it?


------------------------------------------------------------------------------
Download Intel&#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users