[Bacula-users] Severe Performance Issues with high volume of files

2009-04-01 09:28:29
From: Daniel Holtkamp <holtkamp AT riege DOT com>
To: Bacula-users AT lists.sourceforge DOT net
Date: Wed, 01 Apr 2009 14:57:50 +0200
Hi!

TL;DR
When backing up or migrating clients with a very large number of files,
performance drops to a very low level. This can be traced back to the
database inserts.

Our server:
Dell PE2900
4-Core Xeon 2GHz, 4GB RAM
Red Hat Enterprise Linux 5.3 64-Bit
2.6.18-128.1.1.el5 #1 SMP Mon Jan 26 13:58:24 EST 2009 x86_64 x86_64
x86_64 GNU/Linux
PERC 5/i   - 200GB SAS Raid-5 (System + Database)
PERC 4e/DC - 2x 350GB Raid-5 Disk Enclosure (MySQL-Temp & customer Data)
LSI1020    - 4x LTO-1 Streamer
QLA200     - 5 TB Fibre-Channel SAN with SATA Drives (Bacula Storage)

Bacula 2.4.4 - batch insert enabled
MySQL 5.0.45

We have been using Bacula for about the past 5 years, I think. My
company actually financed the implementation of the migration feature.

We recently upgraded our server to RHEL 5.3 64-Bit in the hope that this
might solve our problems, but I'm afraid that is not the case.

What the server CAN do - backing up 20 concurrent clients:
Average to-disk backup speed of 30MB/s
1500 - 2000 Database Inserts per second

Of course this may vary depending on the client data that is being
backed up and on client performance.

For example, this is from our file server's full backup:
  Elapsed time:           1 day 30 mins 43 secs
  FD Files Written:       1,664,826
  SD Files Written:       1,664,826
  FD Bytes Written:       200,165,438,355 (200.1 GB)
  SD Bytes Written:       200,453,874,910 (200.4 GB)
  Rate:                   2268.3 KB/s
  Software Compression:   35.8 %

It's not exactly quick, but acceptable for a full backup with
compression, checksums, ACLs etc. This data is on our central SAN.

Now let's take a look at another server. Its data is on the same SAN as
our file server's.
  Elapsed time:           14 hours 7 mins 45 secs
  FD Files Written:       6,383,475
  SD Files Written:       6,383,475
  FD Bytes Written:       7,504,478,849 (7.504 GB)
  SD Bytes Written:       8,613,076,970 (8.613 GB)
  Rate:                   147.5 KB/s
  Software Compression:   46.5 %

You can see there is quite a big difference between those two. Even
though the size is a LOT smaller (less than 5% of the file server's),
it has nearly 4x the number of files. You can see the pitiful rate at
which this backup runs. This isn't nice, but acceptable for a full
backup that runs on the weekend. Bacula will get slow when lots of files
are in one directory, but I think that is mainly a result of filesystem
performance (ever tried `ls` in a directory with 1+ million files?).

Now let's see how those two fare during migration to tape.

First the Fileserver:
  Elapsed time:           5 hours 55 mins 28 secs
  SD Files Written:       1,635,930
  SD Bytes Written:       197,822,546,953 (197.8 GB)
  Rate:                   9275.3 KB/s

6 hours at an average rate of 9 MB/s is very nice.

And the other server:
  Elapsed time:           8 hours 4 mins
  Priority:               10
  SD Files Written:       6,258,643
  SD Bytes Written:       8,497,756,378 (8.497 GB)
  Rate:                   292.6 KB/s

It took 2 hours MORE for 8.5 GB than for 200 GB, with a VERY bad rate
for our tape drive. Quite obviously the problem is the high volume of
files.

Attribute spooling helps with tape wear, but makes the whole situation
worse. This is from an earlier migration run with attribute spooling
enabled, before we cleaned up the number of files a bit.
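
To make the comparison easier to see, the four job reports above can be
restated in files per second instead of KB/s. This is only a small
sketch: the figures are copied from the reports, and the helper names
are made up for illustration.

```python
from datetime import timedelta

# (files written, SD bytes written, elapsed time), copied from the job reports above.
jobs = {
    "fileserver backup":      (1_664_826, 200_453_874_910, timedelta(days=1, minutes=30, seconds=43)),
    "other server backup":    (6_383_475, 8_613_076_970, timedelta(hours=14, minutes=7, seconds=45)),
    "fileserver migration":   (1_635_930, 197_822_546_953, timedelta(hours=5, minutes=55, seconds=28)),
    "other server migration": (6_258_643, 8_497_756_378, timedelta(hours=8, minutes=4)),
}


def files_per_second(files, elapsed):
    """Average number of files handled per second of elapsed job time."""
    return files / elapsed.total_seconds()


def avg_file_size_kib(files, nbytes):
    """Average size of one file in KiB."""
    return nbytes / files / 1024


for name, (files, nbytes, elapsed) in jobs.items():
    print(f"{name:24s} {files_per_second(files, elapsed):7.1f} files/s "
          f"{avg_file_size_kib(files, nbytes):8.1f} KiB/file")
```

Seen this way, the second server's migration handles about 215 files
per second against about 77 for the file server, even though its KB/s
rate is far lower. That fits the idea that the per-file work, not the
data volume, is what limits these jobs.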

  Elapsed time:           20 hours 41 mins 24 secs
  SD Files Written:       11,636,812
  SD Bytes Written:       9,112,130,555 (9.112 GB)
  Rate:                   122.3 KB/s

If you take a closer look at that migration you can see that the
tape-write process was done quite quickly (10 minutes in fact):

12-Mar 12:48 backup-sd JobId 30505: Ready to read from volume
12-Mar 12:58 backup-sd JobId 30505: End of all volumes.

But after that follows this:
12-Mar 12:58 backup-sd JobId 30505: Sending spooled attrs to the
Director. Despooling 4,111,322,637 bytes ...
13-Mar 09:36 backup-dir JobId 30505: Bacula backup-dir 2.4.4 (28Dec08):

It takes about 20 hours and 40 minutes (12:58 to 09:36 the next day) to
despool the attributes into the database. About 2/3 of that time is
spent writing to the batch table and 1/3 committing the batch table to
the database. This in itself wouldn't be such a big problem if the job
did not block the tape devices and keep other migration jobs from
running. Bear in mind that we have 12 clients with 5 - 12 million files
each. Lately we have run into the problem that one week is not enough
time to migrate all data to tape, so in the end we never finish
migrating (and can never restart the SD because it is in use 100% of
the time).

Using mytop I observed the database during the migration. It doesn't
matter whether spooling is on or off: it writes to the database at a
steady 240 queries per second. And with that many files it takes a
while.

Is there anything that can be done about this? Is this a MySQL
limitation or a Bacula limitation? What are your experiences with a
high volume of files?
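
As a rough back-of-the-envelope check, the sketch below estimates how
long the catalog work takes at that rate. It assumes one query per file,
which is my simplification; I don't know how many rows Bacula actually
writes per query. The function name and its parameters are made up for
illustration.

```python
def hours_needed(file_count, queries_per_second, files_per_query=1):
    """Hours needed to record file_count files at a fixed query rate.

    files_per_query=1 is an assumption, not something taken from Bacula.
    """
    seconds = file_count / (queries_per_second * files_per_query)
    return seconds / 3600


OBSERVED_QPS = 240  # steady rate seen in mytop during the migration

for label, files in [
    ("migration without spooling", 6_258_643),
    ("migration with spooling", 11_636_812),
    ("smallest big client (5 million files)", 5_000_000),
    ("largest big client (12 million files)", 12_000_000),
]:
    print(f"{label:40s} {hours_needed(files, OBSERVED_QPS):5.1f} h")
```

Under that assumption the 6,258,643-file migration comes out at a
little over 7 hours, which is in the same range as the 8 hours 4
minutes reported above.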

Things one might think about:
Why does a migration job insert the complete file attributes again? The
data is already in the database, but linked to the original backup job.
After the migration has run, the on-disk data cannot be accessed
anymore, so why keep it? Or even better, why not just UPDATE it so it
links to the migration job? This should go a LOT quicker than doing a
complete insert again and would remove this bottleneck completely.

I'm afraid I'm not a programmer or I would take a look into this myself.
I actually did look at the sources, but they might as well be encrypted;
it all looks the same to me :)

If you need any more information on this I will try to get it. Just let
me know.
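
To illustrate what I mean by UPDATE instead of INSERT, here is a minimal
sketch. It uses SQLite and a made-up, heavily simplified table
(`file_rec`), not Bacula's real catalog schema or code, and the job ids
are invented. It also ignores anything else the real catalog may need to
record for the migrated copy.

```python
import sqlite3

# In-memory database with a hypothetical, simplified file table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_rec ("
    " file_id INTEGER PRIMARY KEY,"
    " job_id  INTEGER NOT NULL,"
    " name    TEXT NOT NULL,"
    " attrs   TEXT NOT NULL)"
)
conn.execute("CREATE INDEX file_rec_job ON file_rec (job_id)")

ORIGINAL_JOB, MIGRATION_JOB = 100, 200  # made-up job ids

# File records as written by the original backup job.
rows = [(ORIGINAL_JOB, f"/data/file{i}", "attrs-placeholder") for i in range(10_000)]
conn.executemany("INSERT INTO file_rec (job_id, name, attrs) VALUES (?, ?, ?)", rows)
conn.commit()


def relink(conn, old_job, new_job):
    """Point all file records of old_job at new_job with one UPDATE."""
    cur = conn.execute(
        "UPDATE file_rec SET job_id = ? WHERE job_id = ?", (new_job, old_job)
    )
    conn.commit()
    return cur.rowcount


moved = relink(conn, ORIGINAL_JOB, MIGRATION_JOB)
remaining = conn.execute(
    "SELECT COUNT(*) FROM file_rec WHERE job_id = ?", (ORIGINAL_JOB,)
).fetchone()[0]
print(f"relinked {moved} records, {remaining} still on the original job")
```

The point is only that the existing rows are reused and no per-file
attribute data has to be sent to the database again. Whether this is
feasible inside Bacula is exactly my question.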

Best regards,

Daniel Holtkamp

-- 
.............................................................
Riege Software International GmbH  Fon: +49 (2159) 9148 0
Mollsfeld 10                       Fax: +49 (2159) 9148 11
40670 Meerbusch                    Web: www.riege.com
Germany                            E-Mail: holtkamp AT riege DOT com
---                                ---
Handelsregister:                   Managing Directors:
Amtsgericht Neuss HRB-NR 4207      Christian Riege
USt-ID-Nr.: DE120585842            Gabriele  Riege
                                   Johannes  Riege
.............................................................
           YOU CARE FOR FREIGHT, WE CARE FOR YOU          



