Bacula-users

[Bacula-users] Slow Backups with MySQL

2012-05-29 18:04:13
Subject: [Bacula-users] Slow Backups with MySQL
From: Greg Neujahr <keetfox AT gmail DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Tue, 29 May 2012 18:01:35 -0400
Hey Folks,

I run a few slackware boxes and recently (read: about a month ago)
upgraded the mysql server and client libraries on my installs.
Sometime after that point, the full backups I was performing on these
three boxes started to take 5-6 hours a piece, as opposed to the 30-40
minutes it was previously taking. At first I thought it might be a
file IO issue, but I've since traced it back to some interaction with
MySQL. I've pasted the snippets of info I believe are pertinent to the
problem, but am willing to provide more information as needed.

OS: slackware linux - 3.2.13-smp
CPU: Core i3 @ 3.07
Memory: 4GB
MySQL: /usr/libexec/mysqld  Ver 5.5.24 for Linux on i686 (MySQL
Community Server (GPL))
Bacula: watchfox-dir Version: 5.2.6 (21 February 2012)
i686-pc-linux-gnu slackware Slackware 13.37.0

All three machines are running the same install of the software, but
rebuilt on each machine. MySQL was installed via the package manager
from binaries. Bacula was configured on the director with the
following options:

--prefix=/usr --sysconfdir=/etc/bacula --with-openssl --with-mysql
--enable-batch-insert

The other two machines (clients) are built with the following options:

--prefix=/usr --sysconfdir=/etc/bacula --with-openssl --with-mysql
--enable-client-only

The compressed backup of the three machines is around 4GB, give or
take, covering about 400k files. All in all, a relatively small
backup. I've enabled compression as GZIP and signature as MD5. Both of
these options were enabled prior to the issues happening. The director
connects to a MySQL database on one of the other machines for its
need. The connection is a 100Mb connection, which has always been
sufficient. The file daemon writes to a /mnt/backup location, which is
an NFS mount to an array of large drives on the network.

Originally, I did NOT have 'Spool Attributes' set to 'yes', but
enabled it to separate out the SQL from the file IO. The file backup
completes in about 21 minutes for one of the machines. But the
'despooling' of about 125MB worth of data takes 3 and a half hours. I
checked the 'mysqladmin' command to get an idea of the query rate
during this despooling time and find that it fluctuates around 20 qps.
If I take the resulting bacula database, dump it using mysqldump, drop
the tables, and insert the same set of data back into the database
from the same remote machine using the mysql 'source' command, it
takes less than 10 seconds. With 'Spool Attributes' set to 'No', the
entire backup just appears to take 5 hours as the SQL is done in-line
with the backup.

I also did not originally have '--enable-batch-insert' in the
configure options, as it supposedly defaulted to 'yes'. I recompiled
to enable it, just for testing.

I also did an strace of the bacula file daemon and the mysql daemon to
see if there was any system IO that may have been holding it up, but
after 10 minutes, bacula spent a total of 1 second in system calls,
and mysql 40 seconds. Alas, I did not get an strace of the director
daemon. Neither machine is showing any amount of heavy load during
this time, so I'm not sure where to begin looking.

Below is the output from a bconsole session where I ran the backup of
a mail server manually. As you can see, the actual file writing takes
place fairly quickly, but the despooling takes forever. I would
suspect mysql as the culprit here, but, as I stated previously, I can
restore a mysldump of the database over the same connection much more
quickly than bacula does.

I am willing to provide further information to you folks as needed, or
recompile bacula and test. Any suggestions on where to look would be
welcome. Or, if a bug report would be a more appropriate avenue, I can
file that as well.

Thanks,

- Greg

----- bconsole output -----
Run Backup job
JobName:  Vixen Backup Job
Level:    Incremental
Client:   vixen-fd
FileSet:  Vixen Backup Set
Pool:     Vixen Incremental Pool (From Job resource)
Storage:  File (From Job resource)
When:     2012-05-29 12:06:43
Priority: 10
OK to run? (yes/mod/no): yes
Job queued. JobId=1
You have messages.
*messages
29-May 12:06 watchfox-dir JobId 1: No prior Full backup Job record found.
29-May 12:06 watchfox-dir JobId 1: No prior or suitable Full backup
found in catalog. Doing FULL backup.
29-May 12:06 watchfox-dir JobId 1: Start Backup JobId 1,
Job=Vixen_Backup_Job.2012-05-29_12.06.46_03
29-May 12:06 watchfox-dir JobId 1: Created new Volume "Vixen_Full_0" in catalog.
29-May 12:06 watchfox-dir JobId 1: Using Device "FileStorage"
29-May 12:06 watchfox-sd JobId 1: Labeled new Volume "Vixen_Full_0" on
device "FileStorage" (/mnt/backup).
29-May 12:06 watchfox-sd JobId 1: Wrote label to prelabeled Volume
"Vixen_Full_0" on device "FileStorage" (/mnt/backup)
29-May 12:06 watchfox-dir JobId 1: Max Volume jobs=1 exceeded. Marking
Volume "Vixen_Full_0" as Used.
29-May 12:28 watchfox-sd JobId 1: Job write elapsed time = 00:21:32,
Transfer rate = 3.127 M Bytes/second
29-May 12:28 watchfox-sd JobId 1: Sending spooled attrs to the
Director. Despooling 125,083,020 bytes ...
29-May 17:05 watchfox-dir JobId 1: Bacula watchfox-dir 5.2.6 (21Feb12):
  Build OS:               i686-pc-linux-gnu slackware Slackware 13.37.0
  JobId:                  1
  Job:                    Vixen_Backup_Job.2012-05-29_12.06.46_03
  Backup Level:           Full (upgraded from Incremental)
  Client:                 "vixen-fd" 5.2.6 (21Feb12)
i686-pc-linux-gnu,slackware,Slackware 13.37.0
  FileSet:                "Vixen Backup Set" 2012-05-29 12:06:46
  Pool:                   "Vixen Full Pool" (From Job FullPool override)
  Catalog:                "Catalog" (From Client resource)
  Storage:                "File" (From Job resource)
  Scheduled time:         29-May-2012 12:06:43
  Start time:             29-May-2012 12:06:49
  End time:               29-May-2012 17:05:17
  Elapsed time:           4 hours 58 mins 28 secs
  Priority:               10
  FD Files Written:       404,494
  SD Files Written:       404,494
  FD Bytes Written:       3,984,337,612 (3.984 GB)
  SD Bytes Written:       4,040,084,452 (4.040 GB)
  Rate:                   222.5 KB/s
  Software Compression:   54.6 %
  VSS:                    no
  Encryption:             no
  Accurate:               yes
  Volume name(s):         Vixen_Full_0
  Volume Session Id:      1
  Volume Session Time:    1338307461
  Last Volume Bytes:      4,056,969,627 (4.056 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  OK
  SD termination status:  OK
  Termination:            Backup OK

29-May 17:05 watchfox-dir JobId 1: Begin pruning Jobs older than 6 months .
29-May 17:05 watchfox-dir JobId 1: No Jobs found to prune.
29-May 17:05 watchfox-dir JobId 1: Begin pruning Files.
29-May 17:05 watchfox-dir JobId 1: No Files found to prune.
29-May 17:05 watchfox-dir JobId 1: End auto prune.

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

<Prev in Thread] Current Thread [Next in Thread>