Subject: Re: [Bacula-users] Performance with MySQL queries since 3.0.0 (Dir inserting attributes hang)
From: Kern Sibbald <kern AT sibbald DOT com>
To: Tom Sommer <mail AT tomsommer DOT dk>
Date: Sat, 20 Jun 2009 11:14:59 +0200
On Saturday 20 June 2009 08:51:53 Tom Sommer wrote:
> Tom Sommer wrote:
> > Mike Holden wrote:
> >> Jari Fredriksson wrote:
> >>>> INSERT INTO Filename(Name)
> >>>> SELECT a.Name
> >>>> FROM (SELECT DISTINCT Name FROM batch) AS a
> >>>> WHERE NOT EXISTS (
> >>>>     SELECT Name
> >>>>     FROM Filename AS f
> >>>>     WHERE f.Name = a.Name
> >>>> )
> >>>
> >>> That looks silly.
> >>>
> >>> I would write it shorter as
> >>>
> >>> INSERT INTO Filename(Name)
> >>> SELECT DISTINCT Name
> >>> FROM batch AS a
> >>> WHERE NOT EXISTS
> >>> (
> >>>     SELECT Name
> >>>     FROM Filename AS f
> >>>     WHERE f.Name = a.Name
> >>> )
> >>
> >> You may also want to consider using a JOIN rather than a subquery with a
> >> NOT EXISTS, something like (untested and unvalidated!):
> >>
> >> INSERT INTO filename(name)
> >> SELECT DISTINCT name
> >> FROM batch AS a
> >> LEFT JOIN filename AS f USING (name)
> >> WHERE f.name IS NULL
> >>
> >> I know from experience as an Oracle DBA (my day job) that this can often
> >> produce far more efficient results.
> >>
> >> Of course, all options need testing for both speed of execution and
> >> resource usage, bearing in mind that data varies from one installation
> >> to another, and one size may not fit all!
> >
> > Good suggestions, sounds like there might be an overall performance
> > problem with the current query for batch-inserts with lots of data. I'm
> > a bit unsure if I dare test these queries on my current installation.
> >
> > I'll CC Kern on the thread, perhaps he has some insights.
>
> So this morning I had to kill the above query because it's been running
> for 24+ hours, preventing the new daily jobs from running.
> I think I'm going to try disabling batch inserts; the current
> situation is simply not good enough, and it's suddenly become a major
> headache to run backups. When I hit the end of this month and ALL
> servers have to run FULL backups, I'm gonna be in a world of trouble
> I think - I just don't understand what has changed, because it's all
> been running great up until now.
>
> // Tom

Hello,

We can certainly use help optimizing SQL, since we are not DBAs, and we will 
look into the optimization suggestions given above -- keeping in mind that the 
timing of a particular query can differ rather radically depending on the 
database engine used.

To the best of my knowledge, nothing has changed in the batch-insert queries 
since they were implemented, and it is *very* unlikely (though I haven't 
checked the code) that anything changed from 2.4.4 to 3.0.x.

More likely, your workload or MySQL has changed in some way -- e.g. more Jobs, 
more machines backed up, Director machine with less memory or other jobs that 
use memory, a new version of MySQL, ...

- I would suggest that you ensure that your database has all the recommended 
indexes (see the make_mysql_tables file), and that you are running with the 
large memory /etc/my.cnf file.
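
For example (a sketch -- take the exact index definitions from the 
make_mysql_tables file that shipped with your version rather than from here), 
you can check what MySQL currently has and recreate the Name index that the 
batch inserts depend on:

```sql
-- List the indexes MySQL currently has on the Filename table
SHOW INDEX FROM Filename;

-- If the index on Name is missing, recreate it.  The index name and
-- prefix length here are illustrative; check make_mysql_tables for
-- the definition matching your Bacula version.
CREATE INDEX Name ON Filename (Name(255));
```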

- Another thing to do is to compact your database.  One way is to dump it to 
an ASCII file and then reload it.
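
With stock MySQL tools that can be done roughly as follows (a sketch; the 
database name "bacula", the user, and the file name are placeholders for your 
own):

```shell
# Dump the whole catalog to a flat SQL file
mysqldump -u bacula -p bacula > bacula-catalog.sql

# Recreate the database and reload the dump; the tables come back
# compact and the indexes are rebuilt from scratch
mysql -u bacula -p -e "DROP DATABASE bacula; CREATE DATABASE bacula;"
mysql -u bacula -p bacula < bacula-catalog.sql
```

Needless to say, do this only while no jobs are running, and keep the dump 
file until you have verified the reload.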

- If you run programs that create and delete lots of temporary files with 
different names, your Filename table may need cleaning.
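
Cleaning here means removing Filename rows that no File record references any 
more.  The dbcheck utility does this for you; the underlying query is roughly 
the following (untested here, and potentially very slow on a big catalog, so 
run it only in a maintenance window with a catalog backup on hand):

```sql
-- Delete Filename entries no longer referenced by any File row
DELETE FROM Filename
WHERE NOT EXISTS (
    SELECT 1
    FROM File
    WHERE File.FilenameId = Filename.FilenameId
);
```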

- I would strongly recommend not starting *lots* of Full backups at the same 
time or on the same day.  By lots, I mean more than say 10 or 20 (depending on 
the size of your system).  It is generally far better to stagger the Full 
backups so that roughly a quarter of them run each week, spreading the peak 
workload over the month.
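
In bacula-dir.conf that staggering can be expressed with several Schedule 
resources, assigning roughly a quarter of your clients to each.  A sketch (the 
resource names are made up; the Run lines use the standard Schedule syntax):

```
Schedule {
  Name = "Full-Week1"
  Run = Level=Full 1st sun at 23:05
  Run = Level=Incremental mon-sat at 23:05
}

Schedule {
  Name = "Full-Week2"
  Run = Level=Full 2nd sun at 23:05
  Run = Level=Incremental mon-sat at 23:05
}
```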

- If the bottleneck is in MySQL, you might consider moving it to another 
machine that has more memory and faster disks.

- If you really have a huge number of backups (say 50 or more) that all run at 
the same time, you might consider switching to PostgreSQL, but in that case 
you will probably need an onsite DBA to properly tune and maintain it.

- Regardless of what hardware you have, there are limits on how many 
simultaneous jobs you can run (on many systems this kicks in around 50).  
Once that number is exceeded, total throughput can decrease rather radically, 
so careful monitoring is necessary.  Bweb can help a lot in these situations.

- If you are really "in a world of trouble" and it is a performance issue, 
there is not much we (the Bacula project) can do for you other than the above 
tips.  However, ($$) Bacula Systems has tools that can help more precisely 
identify bottlenecks and help balance loads.

Regards,

Kern

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
