Bacula-users

Re: [Bacula-users] getting rid of the base64 encoded lstat field - insert time testing

2009-04-13 05:09:14
Subject: Re: [Bacula-users] getting rid of the base64 encoded lstat field - insert time testing
From: Craig Ringer <craig AT postnewspapers.com DOT au>
To: bacula-users AT lists.sourceforge DOT net
Date: Mon, 13 Apr 2009 17:05:07 +0800
I've had a chance to do some rough testing to see how expanding the
`lstat' field might affect insert times.

I've gathered insert times for original `file' table with base64-ish
encoded lstat, and new table with expanded lstat. Tests for all
databases were done from the same SQL file, which contains INSERT
statements without a column list, one statement per record. The whole
batch is done in one transaction - BEGIN is issued as the first
statement in the file, and COMMIT as the last.

Times in seconds, lower is better. Note that the b64 form is pre-encoded
in the dump, so these results DO NOT CONSIDER THE TIME COST OF
BASE64-ENCODING THE ORIGINAL LSTAT STRUCTURE:

DBMS           b64      expanded     diff     %diff
PostgreSQL     1011     1290         279      27
MySQL5-MyISAM   731      846         115      16
MySQL5-InnoDB   967     1167         200      21
SQlite3         294      371          77      26

(These are not formal benchmarks. They test only one particular workload
and are not representative of general database performance.)

The base64-lstat SQL insert file is 1734Mb and the expanded format one
is 2189Mb. The expanded dump is about 25% larger due to the numeric
representation, which roughly matches the performance hit on INSERT. I
wouldn't be too surprised if most of this was parsing time, given the
need to convert textual number representations into binary.


I'm curious about what causes this slowdown, actually. Is it parsing the
SQL and converting the text respresentations of the numbers? Is it
something to do with the extra fields in the schema (even though the row
width lands up being much the same) ?

I'm going to poke the PostgreSQL folks about this, but I'd be interested
in thoughts from here too.

Also ... holy *crap*, SQLite3 is fast at simple bulk-loads with no
concurrency/contention.


Note that my MySQL and Pg instances are largely un-tuned, and they're
running on a plain old 7200rpm disk. All I did for Pg was to set
checkpoint_segments to 9 and disable full_page_writes. In particular,
the WAL is on the same drive as the main Pg database, which should not
be the case for any Pg setup where you really care about performance. No
tuning was done at all on the other DBs, since I just don't know them
well enough; they're installed with Ubuntu defaults.

Given that the input SQL file alone is 2GB, and my laptop only has 4GB
of RAM, caching effects probably won't be too big a deal. Despite that,
I pre-read the file ("cp file.sql /dev/null") before each test.

These tests aren't *that* rigourous; I haven't re-run them multiple
times, run them on different machines, etc. However, given the extended
run-time of the tests, minor random effects due to other things
happening at the same time should pretty much average out, and we don't
_need_ things to be all that precise anyway.

b64-Pg:

real    16m51.416s
user    2m23.077s
sys     1m12.505s

expanded-Pg:

real    21m30.639s
user    2m54.755s
sys     1m13.185s

b64-MySQL-MyISAM:

real    12m11.791s
user    1m50.959s
sys     1m7.968s

expanded-MySQL-MyISAM:

real    14m8.387s
user    2m9.892s
sys     1m11.008s

b64-MySQL-InnoDB:

real    16m7.520s
user    1m44.667s
sys     1m2.768s

expanded-MySQL-InnoDB:

real    19m27.605s
user    2m16.465s
sys     1m12.693s

b64-sqlite:

real    4m54.635s
user    4m27.153s
sys     0m11.061s

expanded-sqlite:

real    6m11.574s
user    5m38.573s
sys     0m11.101s

--
Craig Ringer

------------------------------------------------------------------------------
This SF.net email is sponsored by:
High Quality Requirements in a Collaborative Environment.
Download a free trial of Rational Requirements Composer Now!
http://p.sf.net/sfu/www-ibm-com
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users