[Bacula-users] Improving the speed of spooling attributes

2014-03-23 15:38:44
Subject: [Bacula-users] Improving the speed of spooling attributes
From: Dan Langille <dan AT langille DOT org>
To: bacula-users <bacula-users AT lists.sourceforge DOT net>
Date: Sun, 23 Mar 2014 15:33:28 -0400
In this email, I write about attribute spooling times that grew over a few 
months, and about trying to figure out why they were so slow.

Conclusion: give your database server as much RAM as you can.  Inserting into 
the File table requires updating 5 indexes.  If those indexes can be held 
entirely in RAM, the updates can occur without constant swapping to disk.  The 
amount of RAM you need to give it varies according to your database size.  Too 
much or too little can increase the time required.

Ref: Bacula 5.2.12 on FreeBSD 9.2, backing up to disk first, then copying to 
tape. Disk storage is raidz2 (more later in post).

The problem: slow backups.  Not slow as in the time to back up data, but slow 
in putting the attributes into the database.

In this post, when I speak about time, I am referring to the time it takes to 
spool the data attributes.  Taking this sample job output:

###
23-Mar 05:02 crey-sd JobId 167020: Sending spooled attrs to the Director. 
Despooling 115,264,052 bytes ...
23-Mar 05:09 bacula-dir JobId 167020: Bacula bacula-dir 5.2.12 (12Sep12):
###

In that example, spooling time is 7 minutes (roughly speaking).
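
The 7 minute figure is just the difference between the two timestamps in 
that excerpt.  A minimal sketch of that calculation, assuming every job log 
line starts with a 'DD-Mon HH:MM' stamp as in the sample (the function names 
are mine, not part of Bacula):

```python
from datetime import datetime, timedelta

# The job log stamps carry no year, e.g. "23-Mar 05:02".  strptime fills in
# a default year, so this only works when both lines fall in the same year.
LOG_TIME_FORMAT = "%d-%b %H:%M"


def log_timestamp(line: str) -> datetime:
    """Parse the leading 'DD-Mon HH:MM' stamp of a job log line."""
    day_month, clock = line.split()[:2]
    return datetime.strptime(f"{day_month} {clock}", LOG_TIME_FORMAT)


def despool_duration(start_line: str, end_line: str) -> timedelta:
    """Time between the 'Sending spooled attrs' line and the job summary."""
    return log_timestamp(end_line) - log_timestamp(start_line)


start = ("23-Mar 05:02 crey-sd JobId 167020: Sending spooled attrs to the "
         "Director. Despooling 115,264,052 bytes ...")
end = "23-Mar 05:09 bacula-dir JobId 167020: Bacula bacula-dir 5.2.12 (12Sep12):"

elapsed = despool_duration(start, end)
print(elapsed)  # 0:07:00
```

The log only has minute resolution, so short despool runs cannot be measured 
accurately this way.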

Given that backups of different sizes result in different amounts of data 
spooling, I took to measuring the spooling process in MB/s.  From a high of 
129 MB/s in early January, it dropped to 73 MB/s by the end of January, and by 
mid-February it was 5 MB/s.

I suspected the file system, etc, but I was proven wrong.  It turned out to be 
a database issue.

First, some facts:

* The File table contains about 172 million records. This size ballooned over 
this period because of increased backups.
* Logging was not being monitored on the database server
* Localhost connections were blocked by the firewall, thus preventing the 
autovacuum process from being initiated

The first problem to solve was dead tuples in the File table.  Firewall rules 
were altered to allow autovacuum to run.
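
One way to see how bad the dead tuple situation is: ask PostgreSQL's 
statistics views.  The sketch below is not what I ran at the time.  It 
assumes a DB-API cursor from a driver such as psycopg2 (hence the %s 
placeholder) and that the table shows up under the lower-case name 'file'.  
The helper names are hypothetical.

```python
# pg_stat_user_tables is a standard PostgreSQL statistics view; n_live_tup,
# n_dead_tup and last_autovacuum are columns of that view.
DEAD_TUPLE_QUERY = """
    SELECT n_live_tup, n_dead_tup, last_autovacuum
      FROM pg_stat_user_tables
     WHERE relname = %s
"""


def dead_tuple_fraction(n_live: int, n_dead: int) -> float:
    """Share of the table's tuples that are dead (0.0 for an empty table)."""
    total = n_live + n_dead
    return n_dead / total if total else 0.0


def check_table(cursor, table_name: str = "file"):
    """Return (dead fraction, last autovacuum time), or None if not found.

    Assumption: 'cursor' is a DB-API cursor using the pyformat paramstyle,
    and the table name is stored lower-case in the catalog.
    """
    cursor.execute(DEAD_TUPLE_QUERY, (table_name,))
    row = cursor.fetchone()
    if row is None:
        return None
    n_live, n_dead, last_autovacuum = row
    return dead_tuple_fraction(n_live, n_dead), last_autovacuum
```

A last_autovacuum value of NULL on a busy table is a strong hint that 
autovacuum has never managed to run against it.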

Various database tuning parameters were changed to get an initial vacuum to run 
in decent time:

* RAM on this PostgreSQL 9.2.4 server is 16GB
* work_mem = 1GB
* maintenance_work_mem = 1GB
* checkpoint_segments = 512
* checkpoint_completion_target = 0.7

Once an autovacuum was done, things improved.  Spooling attributes now took 
about 45 minutes, giving us 40 MB/s.  I figured we must be able to do better.

I started playing with SQL by creating my own database table to mirror the 
temporary table, ‘batch’.  Then I started running the insert query to see what 
optimizations I could make.  For example, I ran this query manually:

INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) 
    SELECT B.FileIndex, 1, P.PathId, FN.FilenameId, B.LStat, B.MD5, B.DeltaSeq 
      FROM my_batch B
      JOIN Path     P  ON (B.Path = P.Path) 
      JOIN Filename FN ON (B.Name = FN.Name);

I always inserted into JobId = 1, which I knew was not a job still in history.
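
For anyone who wants to try the shape of that test without touching a real 
catalog, here is a small self-contained sketch.  It uses SQLite in memory 
rather than PostgreSQL, and the column types and sample rows are made up.  
Only the table and column names come from the query, with the constant 1 in 
the JobId position.  It says nothing about PostgreSQL timings; it only shows 
the join-and-insert pattern and how to time it.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified schema: the real Bacula tables have more columns
# and several indexes on File.
conn.executescript("""
    CREATE TABLE Path     (PathId INTEGER PRIMARY KEY, Path TEXT UNIQUE);
    CREATE TABLE Filename (FilenameId INTEGER PRIMARY KEY, Name TEXT UNIQUE);
    CREATE TABLE File     (FileIndex INTEGER, JobId INTEGER, PathId INTEGER,
                           FilenameId INTEGER, LStat TEXT, MD5 TEXT,
                           DeltaSeq INTEGER);
    CREATE TABLE my_batch (FileIndex INTEGER, JobId INTEGER, Path TEXT,
                           Name TEXT, LStat TEXT, MD5 TEXT, DeltaSeq INTEGER);
""")

# Made-up sample data standing in for a despooled attribute batch.
conn.executemany("INSERT INTO Path (Path) VALUES (?)",
                 [("/etc/",), ("/usr/local/etc/",)])
conn.executemany("INSERT INTO Filename (Name) VALUES (?)",
                 [("hosts",), ("rc.conf",)])
conn.executemany(
    "INSERT INTO my_batch VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 167020, "/etc/", "hosts", "lstat-a", "md5-a", 0),
        (2, 167020, "/etc/", "rc.conf", "lstat-b", "md5-b", 0),
        (3, 167020, "/usr/local/etc/", "hosts", "lstat-c", "md5-c", 0),
    ],
)

# Time only the INSERT ... SELECT, always writing JobId = 1.
started = time.perf_counter()
conn.execute("""
    INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
        SELECT B.FileIndex, 1, P.PathId, FN.FilenameId, B.LStat, B.MD5, B.DeltaSeq
          FROM my_batch B
          JOIN Path     P  ON (B.Path = P.Path)
          JOIN Filename FN ON (B.Name = FN.Name)
""")
elapsed = time.perf_counter() - started

rows = conn.execute(
    "SELECT FileIndex, JobId, PathId, FilenameId FROM File ORDER BY FileIndex"
).fetchall()
print(rows)
print(f"insert took {elapsed:.6f}s")
```

Writing to a JobId that is not in use keeps the test rows easy to find and 
delete before the next run.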

More details here: 
https://docs.google.com/document/d/1AVAIi6PmJZZE11N3PLLNtbuxuOES4vCNtXiqoxoP2Xk/edit

I found that these settings helped.  They are standard PostgreSQL settings to 
optimize queries.

shared_buffers = 3GB (postgresql.conf setting)
kern.ipc.shmmax=4294967296 (/etc/sysctl.conf)
kern.ipc.shmall=4294967296

This dropped the insert time to about 6 minutes.  About half of this time is 
spent constructing the query.

NOTE: Using 2.5GB or 3.5GB instead of 3GB decreased the throughput.
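
A quick arithmetic check on those numbers.  PostgreSQL reads 3GB as 
3 * 1024^3 bytes, and as far as I know 9.2 still takes shared_buffers from 
System V shared memory, so the value has to fit under kern.ipc.shmmax with 
some room to spare.  This sketch only compares the two figures; the variable 
names are mine.

```python
GIB = 1024 ** 3

shared_buffers_bytes = 3 * GIB   # shared_buffers = 3GB
shmmax_bytes = 4294967296        # kern.ipc.shmmax from /etc/sysctl.conf

# PostgreSQL needs somewhat more shared memory than shared_buffers alone,
# so the headroom should be comfortably above zero.
headroom_bytes = shmmax_bytes - shared_buffers_bytes

print(shmmax_bytes // GIB)     # 4
print(headroom_bytes // GIB)   # 1
```

With 3.5GB there would still be about 0.5GB of headroom under this shmmax, 
so the slowdown I saw at that setting was not simply a matter of running 
out of shared memory.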

Filesystem background:

This is where the backups are stored on disk (i.e. bacula-sd on server B):

$ zfs get all system/usr/local/bacula
NAME                     PROPERTY              VALUE                  SOURCE
system/usr/local/bacula  type                  filesystem             -
system/usr/local/bacula  creation              Mon Jul 22 10:25 2013  -
system/usr/local/bacula  used                  12.9T                  -
system/usr/local/bacula  available             4.32T                  -
system/usr/local/bacula  referenced            8.96T                  -
system/usr/local/bacula  compressratio         1.25x                  -
system/usr/local/bacula  mounted               yes                    -
system/usr/local/bacula  quota                 none                   default
system/usr/local/bacula  reservation           none                   default
system/usr/local/bacula  recordsize            128K                   default
system/usr/local/bacula  mountpoint            /usr/jails/crey.example.org/usr/local/bacula  local
system/usr/local/bacula  sharenfs              off                    default
system/usr/local/bacula  checksum              fletcher4              inherited from system
system/usr/local/bacula  compression           lz4                    local
system/usr/local/bacula  atime                 off                    inherited from system
system/usr/local/bacula  devices               on                     default
system/usr/local/bacula  exec                  on                     default
system/usr/local/bacula  setuid                on                     inherited from system/usr/local
system/usr/local/bacula  readonly              off                    local
system/usr/local/bacula  jailed                off                    default
system/usr/local/bacula  snapdir               hidden                 default
system/usr/local/bacula  aclmode               discard                default
system/usr/local/bacula  aclinherit            restricted             default
system/usr/local/bacula  canmount              on                     default
system/usr/local/bacula  xattr                 off                    temporary
system/usr/local/bacula  copies                1                      default
system/usr/local/bacula  version               5                      -
system/usr/local/bacula  utf8only              off                    -
system/usr/local/bacula  normalization         none                   -
system/usr/local/bacula  casesensitivity       sensitive              -
system/usr/local/bacula  vscan                 off                    default
system/usr/local/bacula  nbmand                off                    default
system/usr/local/bacula  sharesmb              off                    default
system/usr/local/bacula  refquota              none                   default
system/usr/local/bacula  refreservation        none                   default
system/usr/local/bacula  primarycache          all                    default
system/usr/local/bacula  secondarycache        all                    default
system/usr/local/bacula  usedbysnapshots       3.97T                  -
system/usr/local/bacula  usedbydataset         8.96T                  -
system/usr/local/bacula  usedbychildren        0                      -
system/usr/local/bacula  usedbyrefreservation  0                      -
system/usr/local/bacula  logbias               latency                default
system/usr/local/bacula  dedup                 off                    default
system/usr/local/bacula  mlslabel                                     -
system/usr/local/bacula  sync                  standard               default
system/usr/local/bacula  refcompressratio      1.30x                  -
system/usr/local/bacula  written               19.1G                  -
system/usr/local/bacula  logicalused           15.9T                  -
system/usr/local/bacula  logicalreferenced     11.4T                  -

The database is stored here (on server B):

$ zfs get all system/usr/local/pgsql
NAME                    PROPERTY              VALUE                  SOURCE
system/usr/local/pgsql  type                  filesystem             -
system/usr/local/pgsql  creation              Fri May  3  9:38 2013  -
system/usr/local/pgsql  used                  193G                   -
system/usr/local/pgsql  available             9.75T                  -
system/usr/local/pgsql  referenced            193G                   -
system/usr/local/pgsql  compressratio         2.10x                  -
system/usr/local/pgsql  mounted               yes                    -
system/usr/local/pgsql  quota                 none                   default
system/usr/local/pgsql  reservation           none                   default
system/usr/local/pgsql  recordsize            8K                     local
system/usr/local/pgsql  mountpoint            /usr/local/pgsql       inherited from system
system/usr/local/pgsql  sharenfs              off                    default
system/usr/local/pgsql  checksum              fletcher4              inherited from system
system/usr/local/pgsql  compression           lz4                    local
system/usr/local/pgsql  atime                 off                    inherited from system
system/usr/local/pgsql  devices               on                     default
system/usr/local/pgsql  exec                  on                     default
system/usr/local/pgsql  setuid                on                     inherited from system/usr/local
system/usr/local/pgsql  readonly              off                    default
system/usr/local/pgsql  jailed                off                    default
system/usr/local/pgsql  snapdir               hidden                 default
system/usr/local/pgsql  aclmode               discard                default
system/usr/local/pgsql  aclinherit            restricted             default
system/usr/local/pgsql  canmount              on                     local
system/usr/local/pgsql  xattr                 off                    temporary
system/usr/local/pgsql  copies                1                      default
system/usr/local/pgsql  version               5                      -
system/usr/local/pgsql  utf8only              off                    -
system/usr/local/pgsql  normalization         none                   -
system/usr/local/pgsql  casesensitivity       sensitive              -
system/usr/local/pgsql  vscan                 off                    default
system/usr/local/pgsql  nbmand                off                    default
system/usr/local/pgsql  sharesmb              off                    default
system/usr/local/pgsql  refquota              none                   default
system/usr/local/pgsql  refreservation        none                   default
system/usr/local/pgsql  primarycache          metadata               local
system/usr/local/pgsql  secondarycache        all                    default
system/usr/local/pgsql  usedbysnapshots       0                      -
system/usr/local/pgsql  usedbydataset         193G                   -
system/usr/local/pgsql  usedbychildren        0                      -
system/usr/local/pgsql  usedbyrefreservation  0                      -
system/usr/local/pgsql  logbias               latency                default
system/usr/local/pgsql  dedup                 off                    default
system/usr/local/pgsql  mlslabel                                     -
system/usr/local/pgsql  sync                  standard               default
system/usr/local/pgsql  refcompressratio      2.10x                  -
system/usr/local/pgsql  written               193G                   -
system/usr/local/pgsql  logicalused           137G                   -
system/usr/local/pgsql  logicalreferenced     137G                   -



-- 
Dan Langille - http://langille.org
