I've had a chance to do some rough testing to see how expanding the
`lstat' field might affect insert times.
I've gathered insert times for the original `file' table with its
base64-ish encoded lstat, and for the new table with the expanded
lstat. Tests for all
databases were done from the same SQL file, which contains INSERT
statements without a column list, one statement per record. The whole
batch is done in one transaction - BEGIN is issued as the first
statement in the file, and COMMIT as the last.
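For concreteness, the load pattern looks roughly like this, sketched here
against an in-memory SQLite database from Python. The table and column
names are placeholders, not Bacula's real `file' schema:

```python
import sqlite3

# Minimal sketch of the load pattern described above: one transaction
# wrapping column-list-free INSERTs, with BEGIN first and COMMIT last.
# Table and columns are illustrative, not Bacula's actual schema.
dump = """
BEGIN;
CREATE TABLE file (fileid INTEGER, path TEXT, st_size INTEGER, st_mtime INTEGER);
INSERT INTO file VALUES (1, '/etc/passwd', 1024, 1239500000);
INSERT INTO file VALUES (2, '/etc/hosts', 220, 1239500001);
COMMIT;
"""
con = sqlite3.connect(":memory:")
con.executescript(dump)
print(con.execute("SELECT count(*) FROM file").fetchone()[0])  # prints 2
```

In the real tests the same kind of file was simply fed to each DBMS's
command-line client rather than driven through Python.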
Times in seconds, lower is better. Note that the b64 form is pre-encoded
in the dump, so these results DO NOT CONSIDER THE TIME COST OF
BASE64-ENCODING THE ORIGINAL LSTAT STRUCTURE:
DBMS            b64   expanded   diff   %diff
PostgreSQL     1011       1290    279      27
MySQL5-MyISAM   731        846    115      16
MySQL5-InnoDB   967       1167    200      21
SQLite3         294        371     77      26
(These are not formal benchmarks. They test only one particular workload
and are not representative of general database performance.)
The base64-lstat SQL insert file is 1734MB and the expanded-format one
is 2189MB. The expanded dump is about 25% larger due to the textual
numeric representation, which roughly matches the performance hit on
INSERT. I
wouldn't be too surprised if most of this was parsing time, given the
need to convert textual number representations into binary.
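To make the size difference concrete: the same handful of numbers take
more bytes as comma-separated decimal text than as base64-packed binary.
This is only a rough illustration; Bacula's actual lstat encoding is its
own base64-ish per-field scheme, and the values below are invented:

```python
import base64, struct

# Illustrative comparison of the two text forms. Bacula's real encoding
# is a custom base64 variant applied field by field, so these byte
# counts are only indicative; the field values are made up.
fields = (16877, 1239500000, 1239500001, 4096)  # e.g. st_mode, atime, mtime, size
as_decimal_text = ','.join(str(f) for f in fields)
as_b64 = base64.b64encode(struct.pack('>4I', *fields)).decode()
print(len(as_decimal_text), len(as_b64))  # prints 32 24
```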
I'm curious about what actually causes this slowdown. Is it parsing the
SQL and converting the text representations of the numbers? Is it
something to do with the extra fields in the schema (even though the row
width ends up being much the same)?
I'm going to poke the PostgreSQL folks about this, but I'd be interested
in thoughts from here too.
Also ... holy *crap*, SQLite3 is fast at simple bulk-loads with no
concurrency/contention.
Note that my MySQL and Pg instances are largely un-tuned, and they're
running on a plain old 7200rpm disk. All I did for Pg was to set
checkpoint_segments to 9 and disable full_page_writes. In particular,
the WAL is on the same drive as the main Pg database, which should not
be the case for any Pg setup where you really care about performance. No
tuning was done at all on the other DBs, since I just don't know them
well enough; they're installed with Ubuntu defaults.
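For reference, the two Pg changes mentioned above amount to just this in
postgresql.conf (everything else was left at stock Ubuntu defaults):

```
checkpoint_segments = 9
full_page_writes = off
```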
Given that the input SQL file alone is 2GB, and my laptop only has 4GB
of RAM, caching effects probably won't be too big a deal. Despite that,
I pre-read the file ("cp file.sql /dev/null") before each test.
These tests aren't *that* rigorous; I haven't re-run them multiple
times, run them on different machines, etc. However, given the extended
run-time of the tests, minor random effects due to other things
happening at the same time should pretty much average out, and we don't
_need_ things to be all that precise anyway.
b64-Pg:
real 16m51.416s
user 2m23.077s
sys 1m12.505s
expanded-Pg:
real 21m30.639s
user 2m54.755s
sys 1m13.185s
b64-MySQL-MyISAM:
real 12m11.791s
user 1m50.959s
sys 1m7.968s
expanded-MySQL-MyISAM:
real 14m8.387s
user 2m9.892s
sys 1m11.008s
b64-MySQL-InnoDB:
real 16m7.520s
user 1m44.667s
sys 1m2.768s
expanded-MySQL-InnoDB:
real 19m27.605s
user 2m16.465s
sys 1m12.693s
b64-sqlite:
real 4m54.635s
user 4m27.153s
sys 0m11.061s
expanded-sqlite:
real 6m11.574s
user 5m38.573s
sys 0m11.101s
--
Craig Ringer
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users