ADSM-L

Squishing the unused space out of my ADSM database

1999-05-24 07:51:08
Subject: Squishing the unused space out of my ADSM database
From: Richard Sims <rbs AT BU DOT EDU>
Date: Mon, 24 May 1999 07:51:08 -0400
>I would be a bit cautious on the expected outcome of a smaller database
>when using the dump/load DB.  There is a very strong possibility that
>the database will end up bigger than you started with!  I can't explain
>this in full detail but it is to do with the way records are packed into
>DB pages when the load occurs, and space is left in pages for later
>expansion.

Here's info I saved from the adroit explanation from David Bohm...
   Richard Sims, BU
---------------
ADSM DATABASE STRUCTURE AND DUMPDB/LOADDB (David Bohm, ADSM server development):
ADSM DATABASE STRUCTURE AND DUMPDB/LOADDB (David Bohm, ADSM server development):

The ADSM server data base contains different objects.  Most of the objects are
b-tree tables.  The cause of using more space for the LOADDB than was actually
used in the data base that was dumped with the DUMPDB command is a result of
the algorithm used to perform the DUMPDB/LOADDB and the characteristics of a
b-tree object...

When a record is to be inserted into a node in a b-tree and that record does
not fit then a split occurs.  In a standard b-tree algorithm 1/2 of the data
goes in one leaf node and the other 1/2 goes into another leaf node.  When
this happens randomly over time you get a tree where about 50% of the data
base is unused space.  With the V2 ADSM server we added a little more
intelligence in the split process.  There are many tables in the ADSM server
where a new record will always be the highest key value in that table.  If the
insert is the highest key value then instead of doing a 1/2 and 1/2 split we
just add a new leaf node with that single record.  This results in closer to
100% utilization in each of the leaf nodes in the ADSM server.

This now takes us to the DUMPDB/LOADDB process.  One of the purposes of this
process is to recover from a corrupted data base index.  What this means is we
ignore the index on the DUMPDB process and only dump the b-tree leaf nodes
(plus another type of data base object called a bitvector).  These leaf nodes
are not stored physically in the data base in key order, which means they get
dumped out of key sequence.  The LOADDB will take the records from each of
those leaf nodes and then perform inserts of those records into the new data
base.  This means we take those pages that were nearly 100% utilized because
of the efficient b-tree split algorithm and convert them into 50% utilized
pages because of having to use the generic b-tree page split algorithm.

We do not "compress" records in the data base.  The data in the data base is
encoded to reduce space requirements.  The data will always be written in the
encoded form to the data base as it is required for us to properly interpret
the data in the data base pages.  This encoding is performed with any writes
of records into the ADSM data base, including the LOADDB since it calls the
same routines to perform the writes into the data base as the rest of the
server functions.

APAR IC13101 also describes this.