In this email, I describe how backup times grew over a few months, and how I
figured out why backups had become so slow.
Conclusion: give your database server as much RAM as you can. Inserting into
the File table requires updating 5 indexes. If those indexes can be held entirely
in RAM, those updates can occur without constant swapping to disk. The amount
of RAM you need to give it varies according to your database size. Too much or
too little can increase the time required.
Ref: Bacula 5.2.12 on FreeBSD 9.2, backing up to disk first, then copying to
tape. Disk storage is raidz2 (more later in post).
The problem: slow backups. Not slow as in the time to back up data, but slow in
putting the attributes into the database.
In this post, when I speak about time, I am referring to the time it takes to
spool the data attributes. Take this sample job output:
###
23-Mar 05:02 crey-sd JobId 167020: Sending spooled attrs to the Director.
Despooling 115,264,052 bytes ...
23-Mar 05:09 bacula-dir JobId 167020: Bacula bacula-dir 5.2.12 (12Sep12):
###
In that example, spooling time is 7 minutes (roughly speaking).
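The 7 minutes can be read off the two timestamps. Here is a minimal Python sketch of that arithmetic, assuming the sample lines above with the wrapped first line rejoined. The function name despool_stats is made up for this example. The rate it prints is based only on the despooled attribute bytes in this sample; the post does not say which byte count its own MB/s figures use.

```python
import re

# Sample lines from the post, with the wrapped first line rejoined.
LOG = (
    "23-Mar 05:02 crey-sd JobId 167020: Sending spooled attrs to the Director. "
    "Despooling 115,264,052 bytes ...\n"
    "23-Mar 05:09 bacula-dir JobId 167020: Bacula bacula-dir 5.2.12 (12Sep12):\n"
)

STAMP = re.compile(r"^\d{2}-[A-Za-z]{3} (\d{2}):(\d{2}) ")
DESPOOL = re.compile(r"Despooling ([\d,]+) bytes")


def despool_stats(log):
    """Return (elapsed_minutes, bytes_despooled, mb_per_second)."""
    # Minute-of-day for every line that starts with a Bacula timestamp.
    stamps = [
        int(m.group(1)) * 60 + int(m.group(2))
        for m in map(STAMP.match, log.splitlines())
        if m
    ]
    # Assumption: despooling finishes within 24 hours, so wrap around midnight.
    elapsed_min = (stamps[-1] - stamps[0]) % (24 * 60)
    nbytes = int(DESPOOL.search(log).group(1).replace(",", ""))
    # Timestamps have one-minute resolution, so the rate is only approximate.
    rate = nbytes / (elapsed_min * 60) / 1e6 if elapsed_min else float("nan")
    return elapsed_min, nbytes, rate


elapsed, nbytes, rate = despool_stats(LOG)
print(f"{elapsed} min, {nbytes:,} bytes, {rate:.2f} MB/s")
```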
Because backups of different sizes spool different amounts of data, I took to
measuring the spooling process in MB/s. From a high of 129 MB/s in early
January, it dropped to 73 MB/s by the end of January, and by mid-February it was
5 MB/s.
I suspected the file system, among other things, but I was proven wrong. It
turned out to be a database issue.
First, some facts:
* The File table contains about 172 million records. Its size ballooned over
this period because of increased backups.
* Logging was not being monitored on the database server.
* Localhost connections were blocked by the firewall, which prevented the
autovacuum process from being initiated.
The first problem to solve was dead tuples in the File table. Firewall rules
were altered to allow autovacuum to run.
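For a sense of scale: PostgreSQL documents that autovacuum vacuums a table once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * (number of tuples). The Python sketch below applies that formula to the File table, assuming the documented defaults of 50 and 0.2. Those defaults are an assumption here; the actual values on this server are not stated in this post.

```python
# PostgreSQL's documented defaults. Assumption: this server had not changed them.
AUTOVACUUM_VACUUM_THRESHOLD = 50
AUTOVACUUM_VACUUM_SCALE_FACTOR = 0.2


def vacuum_trigger(rows,
                   base=AUTOVACUUM_VACUUM_THRESHOLD,
                   scale=AUTOVACUUM_VACUUM_SCALE_FACTOR):
    """Dead tuples needed before autovacuum vacuums a table of `rows` tuples."""
    return base + round(scale * rows)


FILE_ROWS = 172_000_000  # "about 172 million records" in the File table
trigger = vacuum_trigger(FILE_ROWS)
print(f"autovacuum triggers after about {trigger:,} dead tuples")
```

On a table this size, a lower per-table scale factor may be worth considering, but that is a tuning choice I have not tested here.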
Various database tuning parameters were changed to get an initial vacuum to run
in decent time:
* RAM on this PostgreSQL 9.2.4 server is 16GB
* work_mem = 1GB
* maintenance_work_mem = 1GB
* checkpoint_segments = 512
* checkpoint_completion_target = 0.7
Once an autovacuum was done, things improved. Spooling attributes now took
about 45 minutes, giving us 40MB/s. I figured we must be able to do better.
I started playing with SQL by creating my own database table, my_batch, to
mirror the temporary table, ‘batch’. Then I ran the insert query against it to
see what optimizations I could make. For example, I ran this query manually:
INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
SELECT B.FileIndex, 1, P.PathId, FN.FilenameId, B.LStat, B.MD5,
B.DeltaSeq
FROM my_batch B
JOIN Path P ON (B.Path = P.Path)
JOIN Filename FN ON (B.Name = FN.Name);
I always inserted with JobId = 1, which I knew was not a job still in history.
More details here:
https://docs.google.com/document/d/1AVAIi6PmJZZE11N3PLLNtbuxuOES4vCNtXiqoxoP2Xk/edit
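To show the shape of this test without a PostgreSQL server, here is a small Python sketch that runs the same kind of INSERT ... SELECT against an in-memory SQLite database and times it. SQLite is only a stand-in, the table definitions are simplified guesses rather than the real Bacula schema, and the sample rows are made up, so the timing says nothing about PostgreSQL performance.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")

# Simplified, hypothetical versions of the catalog tables (not the real schema).
conn.executescript("""
CREATE TABLE Path (PathId INTEGER PRIMARY KEY, Path TEXT UNIQUE);
CREATE TABLE Filename (FilenameId INTEGER PRIMARY KEY, Name TEXT UNIQUE);
CREATE TABLE File (
    FileId INTEGER PRIMARY KEY, FileIndex INTEGER, JobId INTEGER,
    PathId INTEGER, FilenameId INTEGER, LStat TEXT, MD5 TEXT,
    DeltaSeq INTEGER);
CREATE TABLE my_batch (
    FileIndex INTEGER, JobId INTEGER, Path TEXT, Name TEXT,
    LStat TEXT, MD5 TEXT, DeltaSeq INTEGER);
""")

# Made-up sample data standing in for a copy of the 'batch' table.
conn.executemany("INSERT INTO Path (Path) VALUES (?)",
                 [("/etc/",), ("/usr/local/etc/",)])
conn.executemany("INSERT INTO Filename (Name) VALUES (?)",
                 [("rc.conf",), ("hosts",), ("bacula-dir.conf",)])
conn.executemany("INSERT INTO my_batch VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, 167020, "/etc/", "rc.conf", "lstat-a", "md5-a", 0),
    (2, 167020, "/etc/", "hosts", "lstat-b", "md5-b", 0),
    (3, 167020, "/usr/local/etc/", "bacula-dir.conf", "lstat-c", "md5-c", 0),
])

# Same shape as the manual query: every row goes in under JobId 1.
INSERT_SQL = """
INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
SELECT B.FileIndex, 1, P.PathId, FN.FilenameId, B.LStat, B.MD5, B.DeltaSeq
FROM my_batch B
JOIN Path P ON (B.Path = P.Path)
JOIN Filename FN ON (B.Name = FN.Name)
"""

start = time.perf_counter()
conn.execute(INSERT_SQL)
conn.commit()
elapsed = time.perf_counter() - start

inserted = conn.execute(
    "SELECT COUNT(*) FROM File WHERE JobId = 1").fetchone()[0]
print(f"inserted {inserted} rows in {elapsed:.6f} s")

# Clean up so the test can be repeated, as with the throwaway JobId.
conn.execute("DELETE FROM File WHERE JobId = 1")
conn.commit()
```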
I found that these settings helped. They are standard PostgreSQL settings to
optimize queries.
shared_buffers = 3GB (postgresql.conf setting)
kern.ipc.shmmax=4294967296 (/etc/sysctl.conf)
kern.ipc.shmall=4294967296
This dropped the insert time to about 6 minutes. About half of this time is
spent constructing the query.
NOTE: Setting shared_buffers to 2.5GB or 3.5GB instead decreased the throughput.
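As a quick sanity check on the sizes involved, this Python sketch converts the shared_buffers values mentioned above to bytes and compares them with kern.ipc.shmmax and the 16GB of RAM. The to_bytes helper is written for this example and is not PostgreSQL's own parser. PostgreSQL's shared memory segment is somewhat larger than shared_buffers alone, so treat the comparison as rough. The unit of kern.ipc.shmall differs between platforms, so it is left out.

```python
UNITS = {"kB": 1024, "MB": 1024**2, "GB": 1024**3}


def to_bytes(setting):
    """Convert a size such as '3GB' to bytes (hypothetical helper)."""
    for suffix, factor in UNITS.items():
        if setting.endswith(suffix):
            return int(float(setting[: -len(suffix)]) * factor)
    return int(setting)


RAM = to_bytes("16GB")   # RAM on the PostgreSQL server
SHMMAX = 4294967296      # kern.ipc.shmmax from /etc/sysctl.conf

fits = {}
for candidate in ("2.5GB", "3GB", "3.5GB"):
    size = to_bytes(candidate)
    fits[candidate] = size < SHMMAX
    print(f"shared_buffers={candidate}: {size:,} bytes, "
          f"{size / RAM:.1%} of RAM, under shmmax: {fits[candidate]}")
```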
Filesystem background:
This is where the backups are stored on disk (i.e. bacula-sd on server B):
$ zfs get all system/usr/local/bacula
NAME PROPERTY VALUE SOURCE
system/usr/local/bacula type filesystem -
system/usr/local/bacula creation Mon Jul 22 10:25 2013 -
system/usr/local/bacula used 12.9T -
system/usr/local/bacula available 4.32T -
system/usr/local/bacula referenced 8.96T -
system/usr/local/bacula compressratio 1.25x -
system/usr/local/bacula mounted yes -
system/usr/local/bacula quota none default
system/usr/local/bacula reservation none default
system/usr/local/bacula recordsize 128K default
system/usr/local/bacula mountpoint /usr/jails/crey.example.org/usr/local/bacula local
system/usr/local/bacula sharenfs off default
system/usr/local/bacula checksum fletcher4 inherited from system
system/usr/local/bacula compression lz4 local
system/usr/local/bacula atime off inherited from system
system/usr/local/bacula devices on default
system/usr/local/bacula exec on default
system/usr/local/bacula setuid on inherited from system/usr/local
system/usr/local/bacula readonly off local
system/usr/local/bacula jailed off default
system/usr/local/bacula snapdir hidden default
system/usr/local/bacula aclmode discard default
system/usr/local/bacula aclinherit restricted default
system/usr/local/bacula canmount on default
system/usr/local/bacula xattr off temporary
system/usr/local/bacula copies 1 default
system/usr/local/bacula version 5 -
system/usr/local/bacula utf8only off -
system/usr/local/bacula normalization none -
system/usr/local/bacula casesensitivity sensitive -
system/usr/local/bacula vscan off default
system/usr/local/bacula nbmand off default
system/usr/local/bacula sharesmb off default
system/usr/local/bacula refquota none default
system/usr/local/bacula refreservation none default
system/usr/local/bacula primarycache all default
system/usr/local/bacula secondarycache all default
system/usr/local/bacula usedbysnapshots 3.97T -
system/usr/local/bacula usedbydataset 8.96T -
system/usr/local/bacula usedbychildren 0 -
system/usr/local/bacula usedbyrefreservation 0 -
system/usr/local/bacula logbias latency default
system/usr/local/bacula dedup off default
system/usr/local/bacula mlslabel -
system/usr/local/bacula sync standard default
system/usr/local/bacula refcompressratio 1.30x -
system/usr/local/bacula written 19.1G -
system/usr/local/bacula logicalused 15.9T -
system/usr/local/bacula logicalreferenced 11.4T -
The database is stored here (on server B):
$ zfs get all system/usr/local/pgsql
NAME PROPERTY VALUE SOURCE
system/usr/local/pgsql type filesystem -
system/usr/local/pgsql creation Fri May 3 9:38 2013 -
system/usr/local/pgsql used 193G -
system/usr/local/pgsql available 9.75T -
system/usr/local/pgsql referenced 193G -
system/usr/local/pgsql compressratio 2.10x -
system/usr/local/pgsql mounted yes -
system/usr/local/pgsql quota none default
system/usr/local/pgsql reservation none default
system/usr/local/pgsql recordsize 8K local
system/usr/local/pgsql mountpoint /usr/local/pgsql inherited from system
system/usr/local/pgsql sharenfs off default
system/usr/local/pgsql checksum fletcher4 inherited from system
system/usr/local/pgsql compression lz4 local
system/usr/local/pgsql atime off inherited from system
system/usr/local/pgsql devices on default
system/usr/local/pgsql exec on default
system/usr/local/pgsql setuid on inherited from system/usr/local
system/usr/local/pgsql readonly off default
system/usr/local/pgsql jailed off default
system/usr/local/pgsql snapdir hidden default
system/usr/local/pgsql aclmode discard default
system/usr/local/pgsql aclinherit restricted default
system/usr/local/pgsql canmount on local
system/usr/local/pgsql xattr off temporary
system/usr/local/pgsql copies 1 default
system/usr/local/pgsql version 5 -
system/usr/local/pgsql utf8only off -
system/usr/local/pgsql normalization none -
system/usr/local/pgsql casesensitivity sensitive -
system/usr/local/pgsql vscan off default
system/usr/local/pgsql nbmand off default
system/usr/local/pgsql sharesmb off default
system/usr/local/pgsql refquota none default
system/usr/local/pgsql refreservation none default
system/usr/local/pgsql primarycache metadata local
system/usr/local/pgsql secondarycache all default
system/usr/local/pgsql usedbysnapshots 0 -
system/usr/local/pgsql usedbydataset 193G -
system/usr/local/pgsql usedbychildren 0 -
system/usr/local/pgsql usedbyrefreservation 0 -
system/usr/local/pgsql logbias latency default
system/usr/local/pgsql dedup off default
system/usr/local/pgsql mlslabel -
system/usr/local/pgsql sync standard default
system/usr/local/pgsql refcompressratio 2.10x -
system/usr/local/pgsql written 193G -
system/usr/local/pgsql logicalused 137G -
system/usr/local/pgsql logicalreferenced 137G -
--
Dan Langille - http://langille.org