Bacula-users

Re: [Bacula-users] Tuning for large (millions of files) backups?

2010-11-12 10:24:05
Subject: Re: [Bacula-users] Tuning for large (millions of files) backups?
From: Bob Hetzel <beh AT case DOT edu>
To: bacula-users AT lists.sourceforge DOT net
Date: Fri, 12 Nov 2010 10:20:23 -0500

> From: Gavin McCullagh <gavin.mccullagh AT gcd DOT ie>
> Subject: Re: [Bacula-users] Tuning for large (millions of files)
>       backups?
> To: bacula-users AT lists.sourceforge DOT net
> Message-ID: <20101111144733.GZ20359 AT gcd DOT ie>
> Content-Type: text/plain; charset=us-ascii
>
> On Mon, 08 Nov 2010, Gavin McCullagh wrote:
>
>> > We seem to have the correct indexes on the file table.  I've run optimize 
>> > table
>> > and it still takes 14 minutes to build the tree on one of our bigger 
>> > clients.
>> > We have 51 million entries in the file table.
> I thought I should give some mroe concrete information:
>
> I don't suppose this is news to anyone but here's the mysql slow query log to
> correspond:
>
> # Time: 101111 14:24:49
> # User@Host: bacula[bacula] @ localhost []
> # Query_time: 1139.657646  Lock_time: 0.000471 Rows_sent: 4263403  
> Rows_examined: 50351037
> SET timestamp=1289485489;
> SELECT Path.Path, Filename.Name, Temp.FileIndex, Temp.JobId, LStat, MD5 FROM 
> ( SELECT FileId, Job.JobId AS JobId, FileIndex, File.PathId AS PathId, 
> File.FilenameId AS FilenameId, LStat, MD5 FROM Job, File, ( SELECT 
> MAX(JobTDate) AS JobTDate, PathId, FilenameId FROM ( SELECT JobTDate, PathId, 
> FilenameId FROM File JOIN Job USING (JobId) WHERE File.JobId IN 
> (9944,9950,9973,9996) UNION ALL SELECT JobTDate, PathId, FilenameId FROM 
> BaseFiles JOIN File USING (FileId) JOIN Job  ON    (BaseJobId = Job.JobId) 
> WHERE BaseFiles.JobId IN (9944,9950,9973,9996) ) AS tmp GROUP BY PathId, 
> FilenameId ) AS T1 WHERE (Job.JobId IN ( SELECT DISTINCT BaseJobId FROM 
> BaseFiles WHERE JobId IN (9944,9950,9973,9996)) OR Job.JobId IN 
> (9944,9950,9973,9996)) AND T1.JobTDate = Job.JobTDate AND Job.JobId = 
> File.JobId AND T1.PathId = File.PathId AND T1.FilenameId = File.FilenameId ) 
> AS Temp JOIN Filename ON (Filename.FilenameId = Temp.FilenameId) JOIN Path ON 
> (Path.PathId = Temp.PathId) WHERE FileIndex > 0 
ORDE!
> R BY Temp.JobId, FileIndex ASC;
>
>
> I've spent some time with the mysqltuner.pl script but to no avail thus far.
> There's 6GB RAM so it suggests a key buffer size of >4GB which I've set at
> 4.1GB.
>
> This is an Ubuntu Linux server running MySQL v5.1.41.  The mysql data is on an
> MD software RAID 1 array on 7200rpm SATA disks.  The tables are MyISAM (which 
> I
> had understood to be quicker than innodb in low concurrency situations?).  The
> tuner script is suggesting I should disable innodb as we're not using it which
> I will do though I wouldn't guess that will make a massive difference.
>
> There are no fragmented tables currently.
>
> Gavin
>

I'm starting to think the issue might be linked to some kernels or linux 
distros.  I have two bacula servers here.  One system is a year and a half 
old (12 GB RAM), has with a File table having approx 40 million File 
records.  That system has had the slowness issue (building the directory 
tree on restores took about an hour) running first Ubuntu 9.04 or 9.10 and 
now RedHat 6 beta.  The kernel currently is at 2.6.32-44.1.el6.x86_64.  I 
haven't tried downgrading, instead I tweaked the source code to use the old 
3.0.3 query and recompiled--I don't use Base jobs or Accurate backups so 
that's safe for me.

The other system is 4 yrs or so old, with less memory (8GB), slower cpus, 
slower hard drives, etc., and in fairness only 35 million File records. 
This one builds the directory tree in approx 10 seconds, but is running 
Centos 5.5.  The kernel currently is at 2.6.18-194.11.3.el5.

I'm still convinced that this one slow MySQL query could be changed to 
allow MySQL to better optimize it.  I started with the same my.cnf file 
settings and then tried tweaking them because the newer computer has more 
ram but that didn't help.

Is anybody up to the task of rewriting that query?

------------------------------------------------------------------------------
Centralized Desktop Delivery: Dell and VMware Reference Architecture
Simplifying enterprise desktop deployment and management using
Dell EqualLogic storage and VMware View: A highly scalable, end-to-end
client virtualization framework. Read more!
http://p.sf.net/sfu/dell-eql-dev2dev
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users