Bacula-users

Re: [Bacula-users] Tune MySQL, or switch to pgsql?

2013-02-02 06:17:07
Subject: Re: [Bacula-users] Tune MySQL, or switch to pgsql?
From: Adrian Reyer <bacula-lists AT lihas DOT de>
To: Alan McKay <alan.mckay+bacula AT gmail DOT com>
Date: Sat, 2 Feb 2013 12:13:25 +0100
Hi Alan,

On Fri, Feb 01, 2013 at 02:48:29PM -0500, Alan McKay wrote:
> I've been googling to find MySQL tuning instructions for bacula but
> not much is coming up.   Though I did find a 2 or 3 year old
> discussion from this list which suggested I'd get better performance
> from pgsql anyway.    I certainly don't know pgsql as well as MySQL,
> but I did used to know a fair bit about it and switching would not be
> that big a deal to me.

I am used to do a certain tuning on databases, MySQL mostly and to a way
smaller extend PostgreSQL. Because I have way more to do with MySQL and
like the easy database replication there I started with MySQL in my
biggest Bacula installation. Back then with a 4GB RAM machine running sd
sd, dir and db. Soon my backups completed very very slow. The database
had been adjusted to the host and the task. So I migrated the db to some
other box with 16GB ram, at first MySQL got 8GB assigned, then 1 month
later 12GB. With both values the backups became slower and slower again,
despite attribute spooling.
As I personally have a tendency to like postgres more than mysql as I
have the feeling current postgres always does what I need right now,
while only the next version mysql will do so as well, I decided to
switch to postgres.
postgres is set up to use 8GB of 16GB and it is now running for 15
months without issues, I have a retention time of 13 months and among
the backups had been an imap server with 1 file/mail, ~350GB mails.

I think the reason postgres performcs so much better in this setup is
the way it handles indices. If you have colums a,b,c and do selects
based on a, a,b and a,b,c with mysql there are 3 indices. With postgres
you only need 1 on a,b,c as it can ignore the unneded parts. So when
storing a single file info in the database, you basically need 4 writes
with mysql (data, a, a,b, a,b,c) and only 2 with pgsql (data, a,b,c).
With both databases I turned sync writes of, btw.
My tests had been with mysql 5.0 and postgres 9.0, the database server
is an old leftover machine with 16GB RAM and an unfortunate mix of
storage located on an old FC-SAN as well as a local software raid-5,
most of the san-storage in use by a file storage. In other words, it is
not exactly IO-optimized hardware I have available.

Current stats with postgres:
bacula=# select count(*) from file ;
   count
-----------
 221128055
cula=# select count(*) from path ;
 count  
--------
 818016
bacula=# select count(*) from job ;
 count 
-------
 22623
# du -s /var/lib/postgresql/9.0/main/base/
64749160        /var/lib/postgresql/9.0/main/base/

Retention time is 13 months on tape, Full 1/month, Diff 1/week,
Inc daily except an oracle DB server, there I use daily Diff on the
backup areas. Daily backup is to File storage, then immidiatly copied
two 2 pools of tapes, one to be kept offsite. File storage retention is
40 days for easy access to the most recent backups.

> Anyone have any specific links for MySQL + Bacula tuning?

No specific links, I use tuning-primer.sh and mysqltuner. With the
output keep in mind you are optimizing for writes during your backup
runs.

> Anyone have any datapoints on MySQL vs pgsql for Bacula?

In the initial setup, the 4GB machine had been to small to back up with
mysql after 2 full runs, it ran each another full with 8GB and 16GB
allocated for mysql. On the very same machine it runs fine with postgres
and 8GB ram + iSCSI-exported disk for the single SD I use on the same
server with the DIR for file/tape storage.
'To small' translates into 'all backup data is on storage, but 2 days
later the attribute spooling is still running'. Yes, I use data and
attribute spooling.

> Will pgsql still have to be tuned?

Every database has to be tuned.
A database is a general purpose application, suitable for many different
tasks on many different types of hardware. You always have to adjust it
to your tasks. It might suffice to do very basic adjustments, e.g.
open_tables matching your project and the buffer sizes matching your RAM
in mysql. With postgres you can start right there with the memory
adjustments, but if you really want to tune it, you can even go as far
as telling postgres how much disk io costs compared to memory/cpu
cycles. I have not yet done that in my postgres setup as I didn't feel the
need.

For my setup, here are the changed settings, postgres is 9.0 from
'backports' on debian squeeze, as said, software raid involved, 16GB
host memory, 2*Dual Core Xeon 5160:

 data_directory = '/var/lib/postgresql/9.0/main'         # use data in another 
directory
 hba_file = '/etc/postgresql/9.0/main/pg_hba.conf'       # host-based 
authentication file
 ident_file = '/etc/postgresql/9.0/main/pg_ident.conf'   # ident configuration 
file
 external_pid_file = '/var/run/postgresql/9.0-main.pid'          # write an 
extra PID file
 listen_addresses = '*'          # what IP address(es) to listen on;
 port = 5432                             # (change requires restart)
 max_connections = 60                    # (change requires restart)
 superuser_reserved_connections = 3      # (change requires restart)
 unix_socket_directory = '/var/run/postgresql'           # (change requires 
restart)
shared_buffers = 8192MB                 # min 128kB
temp_buffers = 512MB                    # min 800kB
work_mem = 32MB                         # min 64kB
max_stack_depth = 7680kB                # min 100kB
effective_io_concurrency = 10           # 1-1000. 0 disables prefetching
wal_level = minimal                     # minimal, archive, or hot_standby
fsync = off                             # turns forced synchronization on or off
synchronous_commit = off                # immediate fsync at commit
wal_writer_delay = 5000ms               # 1-10000 milliseconds
commit_delay = 10000                    # range 0-100000, in microseconds
commit_siblings = 5                     # range 1-1000
wal_sender_delay = 5000ms       # walsender cycle time, 1-10000 milliseconds
 log_line_prefix = '%t '                 # special values:
 datestyle = 'iso, dmy'
 lc_messages = 'de_DE.UTF-8'                     # locale for system error 
message
 lc_monetary = 'de_DE.UTF-8'                     # locale for monetary 
formatting
 lc_numeric = 'de_DE.UTF-8'                      # locale for number formatting
 lc_time = 'de_DE.UTF-8'                         # locale for time formatting
 default_text_search_config = 'pg_catalog.german'

max_connections limits the amount of concurrent jobs you can run, if I
recall correctly. I idented the not-that-performance-relevant settings
by 1, so you won't get lost as easily by irrelevant things.
As well I adjusted a few sysctl settings, relevant for postgres are
shm*, the net* are more relevant for iSCSI and I like to avoid swapping.
As well I changed the disk scheduler to 'deadline', if you have a
hardware raid controller with memory cache, noop might be even better.
In no case the default cfq is useful unless it is a desktop computer. I
just assume you are on linux here, just because I am myself. ;)
/etc/sysctl.conf:
kernel.shmmax = 17179869184
kernel.shmall = 4194304
vm.swappiness = 0
net.ipv4.tcp_low_latency = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_syncookies = 1
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 1
net.core.netdev_max_backlog = 2500

Keep in mind, these settings are for the very box I am running, you need
to adjust them, especially if you have a different amount of RAM.
I am personally intrested in other people ways to adjust the database
servers for bacula.

I use bacula 5.0.2 and 5.0.3 here, the packages debian provides.

Regards,
        Adrian
-- 
LiHAS - Adrian Reyer - Hessenwiesenstraße 10 - D-70565 Stuttgart
Fon: +49 (7 11) 78 28 50 90 - Fax:  +49 (7 11) 78 28 50 91
Mail: lihas AT lihas DOT de - Web: http://lihas.de
Linux, Netzwerke, Consulting & Support - USt-ID: DE 227 816 626 Stuttgart

------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_jan
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users