Bacula-users

Re: [Bacula-users] Tuning for large (millions of files) backups?

2010-11-12 06:11:01
Subject: Re: [Bacula-users] Tuning for large (millions of files) backups?
From: Gavin McCullagh <gavin.mccullagh AT gcd DOT ie>
To: Mikael Fridh <frimik AT gmail DOT com>
Date: Fri, 12 Nov 2010 11:08:14 +0000
Hi,

On Fri, 12 Nov 2010, Mikael Fridh wrote:

> On Thu, Nov 11, 2010 at 3:47 PM, Gavin McCullagh <gavin.mccullagh AT gcd DOT 
> ie> wrote:

> > # Time: 101111 14:24:49
> > # User@Host: bacula[bacula] @ localhost []
> > # Query_time: 1139.657646  Lock_time: 0.000471 Rows_sent: 4263403 
> >  Rows_examined: 50351037
> > SET timestamp=1289485489;
> > SELECT Path.Path, Filename.Name, Temp.FileIndex, Temp.JobId, LStat, MD5 
> > FROM ( SELECT FileId, Job.JobId AS JobId, FileIndex, File.PathId AS PathId, 
> > File.FilenameId AS FilenameId, LStat, MD5 FROM Job, File, ( SELECT 
> > MAX(JobTDate) AS JobTDate, PathId, FilenameId FROM ( SELECT JobTDate, 
> > PathId, FilenameId FROM File JOIN Job USING (JobId) WHERE File.JobId IN 
> > (9944,9950,9973,9996) UNION ALL SELECT JobTDate, PathId, FilenameId FROM 
> > BaseFiles JOIN File USING (FileId) JOIN Job  ON    (BaseJobId = Job.JobId) 
> > WHERE BaseFiles.JobId IN (9944,9950,9973,9996) ) AS tmp GROUP BY PathId, 
> > FilenameId ) AS T1 WHERE (Job.JobId IN ( SELECT DISTINCT BaseJobId FROM 
> > BaseFiles WHERE JobId IN (9944,9950,9973,9996)) OR Job.JobId IN 
> > (9944,9950,9973,9996)) AND T1.JobTDate = Job.JobTDate AND Job.JobId = 
> > File.JobId AND T1.PathId = File.PathId AND T1.FilenameId = File.FilenameId 
> > ) AS Temp JOIN Filename ON (Filename.FilenameId = Temp.FilenameId) JOIN 
> > Path ON (Path.PathId = Temp.PathId) WHERE FileIndex > 0 ORDER BY 
> > Temp.JobId, FileIndex 
> ASC;
> 
> Could you please do an EXPLAIN on this query?

I prefixed the query by the word EXPLAIN and ran it:

mysql> source bacularestorequery.sql
+----+--------------------+------------+--------+-------------------------------------+------------+---------+-------------------------+---------+---------------------------------+
| id | select_type        | table      | type   | possible_keys                 
      | key        | key_len | ref                     | rows    | Extra        
                   |
+----+--------------------+------------+--------+-------------------------------------+------------+---------+-------------------------+---------+---------------------------------+
|  1 | PRIMARY            | <derived2> | ALL    | NULL                          
      | NULL       | NULL    | NULL                    | 4277605 | Using where; 
Using filesort     |
|  1 | PRIMARY            | Filename   | eq_ref | PRIMARY                       
      | PRIMARY    | 4       | Temp.FilenameId         |       1 |              
                   |
|  1 | PRIMARY            | Path       | eq_ref | PRIMARY                       
      | PRIMARY    | 4       | Temp.PathId             |       1 |              
                   |
|  2 | DERIVED            | <derived3> | ALL    | NULL                          
      | NULL       | NULL    | NULL                    | 4277605 |              
                   |
|  2 | DERIVED            | File       | ref    | 
PathId,FilenameId,JobId,jobid_index | FilenameId | 8       | 
T1.FilenameId,T1.PathId |       4 | Using where                     |
|  2 | DERIVED            | Job        | eq_ref | PRIMARY                       
      | PRIMARY    | 4       | bacula.File.JobId       |       1 | Using where  
                   |
|  6 | DEPENDENT SUBQUERY | NULL       | NULL   | NULL                          
      | NULL       | NULL    | NULL                    |    NULL | no matching 
row in const table  |
|  3 | DERIVED            | <derived4> | ALL    | NULL                          
      | NULL       | NULL    | NULL                    | 4302683 | Using 
temporary; Using filesort |
|  4 | DERIVED            | Job        | range  | PRIMARY                       
      | PRIMARY    | 4       | NULL                    |       4 | Using where  
                   |
|  4 | DERIVED            | File       | ref    | JobId,jobid_index             
      | JobId      | 4       | bacula.Job.JobId        |   41816 | Using index  
                   |
|  5 | UNION              | NULL       | NULL   | NULL                          
      | NULL       | NULL    | NULL                    |    NULL | no matching 
row in const table  |
| NULL | UNION RESULT       | <union4,5> | ALL    | NULL                        
        | NULL       | NULL    | NULL                    |    NULL |            
                     |
+----+--------------------+------------+--------+-------------------------------------+------------+---------+-------------------------+---------+---------------------------------+
12 rows in set (16 min 15.79 sec)

I presume that's what you're looking for?

> Tuning's not going to get any of those 50 million traversed rows
> disappear. Only a differently optimized query plan will.

Well, if the above helps and/or if you'd like me to run an alternative proposed
query I'm happy to.  I must confess it would take me quite a few hours to
actually understand that query.

Gavin



------------------------------------------------------------------------------
Centralized Desktop Delivery: Dell and VMware Reference Architecture
Simplifying enterprise desktop deployment and management using
Dell EqualLogic storage and VMware View: A highly scalable, end-to-end
client virtualization framework. Read more!
http://p.sf.net/sfu/dell-eql-dev2dev
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

<Prev in Thread] Current Thread [Next in Thread>