ADSM-L

Re: Squishing the unused space out of my ADSM database

1999-05-24 12:27:52
Subject: Re: Squishing the unused space out of my ADSM database
From: "Richard C. Dempsey" <dempsey AT KODAK DOT COM>
Date: Mon, 24 May 1999 12:27:52 -0400
It would seem that if we could take a DUMPDB file and sort it on the keys
before handing it to LOADDB, then the LOADDB process would use space more
efficiently and it would run faster.  However, since the DUMPDB file
contains lots of tables, a naive sort of the whole file could be disastrous.

I don't suppose that there's any documentation of the DUMPDB file format?
On the one hand, sorting a table on the way out of the DB would seem pretty
straightforward.  OTOH, in the circumstances DUMPDB/LOADDB was designed for
(recovering from a corrupted data base index), it might not be feasible.
Thus, a separate SORTDB utility makes more sense.
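Since the DUMPDB format isn't documented, any sort tool would have to guess at it. As a minimal sketch, assuming a hypothetical record type (`DumpRecord`) that carries its table name and its b-tree key, the safe version of "sort it on the keys" orders records by key within each table while keeping tables grouped, rather than sorting the whole file as one stream:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DumpRecord:
    table: str      # hypothetical: table the record belongs to
    key: int        # hypothetical: the record's b-tree key
    payload: bytes  # opaque record contents, passed through untouched


def sort_within_tables(records):
    """Sort records by key inside each table, never interleaving tables.

    Tables keep the order in which they first appear in the dump.
    """
    table_rank = {}
    for r in records:
        table_rank.setdefault(r.table, len(table_rank))
    return sorted(records, key=lambda r: (table_rank[r.table], r.key))


dump = [
    DumpRecord("objects", 42, b"..."),
    DumpRecord("objects", 7, b"..."),
    DumpRecord("volumes", 3, b"..."),
    DumpRecord("objects", 19, b"..."),
]
for r in sort_within_tables(dump):
    print(r.table, r.key)
# prints: objects 7, objects 19, objects 42, volumes 3 (one per line)
```

The composite `(table rank, key)` sort key is what keeps one table's records from being shuffled in among another's, which is the hazard of sorting the whole file naively.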

Is there a strong consensus that SORTDB is an important feature to badger
IBM/Tivoli about?

Rich Dempsey

At 07:51 AM 5/24/99 -0400, you wrote:
>>I would be a bit cautious about the expected outcome of a smaller database
>>when using the dump/load DB.  There is a very strong possibility that
>>the database will end up bigger than you started with!  I can't explain
>>this in full detail but it is to do with the way records are packed into
>>DB pages when the load occurs, and space is left in pages for later
>>expansion.
>
>Here's info I saved from the adroit explanation from David Bohm...
>   Richard Sims, BU
>---------------
>ADSM DATABASE STRUCTURE AND DUMPDB/LOADDB (David Bohm, ADSM server development):
>
>The ADSM server data base contains different objects.  Most of the objects are
>b-tree tables.  LOADDB uses more space than was actually used in the data
>base that was dumped with the DUMPDB command because of the algorithm used
>to perform the DUMPDB/LOADDB and the characteristics of a b-tree object...
>
>When a record is to be inserted into a node in a b-tree and that record does
>not fit, a split occurs.  In a standard b-tree algorithm, half of the data
>goes into one leaf node and the other half goes into another leaf node.  When
>this happens randomly over time, you get a tree where about 50% of the data
>base is unused space.  With the V2 ADSM server we added a little more
>intelligence to the split process.  There are many tables in the ADSM server
>where a new record will always have the highest key value in that table.  If
>the insert is the highest key value, then instead of doing a half-and-half
>split we just add a new leaf node containing that single record.  This keeps
>each of the leaf nodes in the ADSM server closer to 100% utilized.
>
>This now takes us to the DUMPDB/LOADDB process.  One of the purposes of this
>process is to recover from a corrupted data base index.  This means we
>ignore the index during DUMPDB and dump only the b-tree leaf nodes
>(plus another type of data base object called a bitvector).  These leaf nodes
>are not stored physically in the data base in key order, which means they get
>dumped out of key sequence.  LOADDB takes the records from each of those
>leaf nodes and inserts them into the new data base.  Because the records
>arrive out of key order, a new record is usually not the highest key in its
>table, so the efficient split does not apply and the generic b-tree page
>split is used instead.  This means we take pages that were nearly 100%
>utilized and convert them into pages that are about 50% utilized.
>
>We do not "compress" records in the data base.  The data in the data base is
>encoded to reduce space requirements.  Data is always written to the data
>base in this encoded form, because that form is required for us to properly
>interpret the data in the data base pages.  This encoding is performed on
>every write of a record into the ADSM data base, including those done by
>LOADDB, since it calls the same routines to write into the data base as the
>rest of the server functions.
>
>APAR IC13101 also describes this.
>
>
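To make the split behaviour above concrete, here is a toy simulation. It is a sketch under loud assumptions, not ADSM's code: it models only the leaf level of a b-tree, uses a hypothetical `CAPACITY` of 8 records per page, and treats "leaf nodes not stored in key order" as a random shuffle of the leaves. It compares the optimized "new highest key gets its own leaf" split with the generic half-and-half split, then replays a DUMPDB/LOADDB cycle with and without sorting the dump first (the SORTDB idea):

```python
import bisect
import random

CAPACITY = 8  # hypothetical: records per leaf page (real pages hold far more)


class LeafLevel:
    """Toy model of a b-tree's leaf level: a key-ordered list of leaf pages.
    Interior (index) nodes are not modelled; a binary search over each
    leaf's first key stands in for the index lookup."""

    def __init__(self, append_split):
        self.leaves = [[]]  # each leaf is a sorted list of keys
        self.append_split = append_split

    def _find_leaf(self, key):
        firsts = [leaf[0] for leaf in self.leaves[1:]]
        return bisect.bisect_right(firsts, key)

    def insert(self, key):
        i = self._find_leaf(key)
        leaf = self.leaves[i]
        if len(leaf) < CAPACITY:
            bisect.insort(leaf, key)
            return
        if self.append_split and i == len(self.leaves) - 1 and key > leaf[-1]:
            # Optimized split: a new highest key starts a fresh leaf,
            # leaving the full leaf 100% used.
            self.leaves.append([key])
            return
        # Generic split: half the records move to a new right sibling.
        bisect.insort(leaf, key)
        mid = len(leaf) // 2
        self.leaves.insert(i + 1, leaf[mid:])
        del leaf[mid:]

    def utilization(self):
        return sum(map(len, self.leaves)) / (len(self.leaves) * CAPACITY)


def build(keys, append_split=True):
    tree = LeafLevel(append_split)
    for key in keys:
        tree.insert(key)
    return tree


def dump_leaves(tree, rng):
    """DUMPDB model: each leaf's records stay in key order, but the leaves
    themselves are emitted in a shuffled (non-key) order."""
    leaves = [list(leaf) for leaf in tree.leaves]
    rng.shuffle(leaves)
    return [key for leaf in leaves for key in leaf]


N = 2_000
rng = random.Random(1999)
original = build(range(N))                       # always-highest keys
generic = build(range(N), append_split=False)    # same keys, generic split
dump = dump_leaves(original, rng)
reloaded = build(dump)                           # LOADDB of the unsorted dump
sorted_reload = build(sorted(dump))              # dump sorted on key first

for name, tree in [("original", original), ("generic split", generic),
                   ("reloaded", reloaded), ("sorted reload", sorted_reload)]:
    print(f"{name:14} {len(tree.leaves):4} leaves  {tree.utilization():.0%} used")
```

In this model the original tree and the sorted reload both come out at exactly 100%, and the generic-split build of the same ascending keys at about 50%. The unsorted reload lands well below 100%, at a figure that depends on the page size and the shuffle, so treat it as illustrative rather than a prediction for a real ADSM data base.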

Richard C. Dempsey                 email: dempsey AT kodak DOT com
Public Online Services             pager: 716-975-3539
11th Floor, Bldg 83, RL            phone: 716-477-3457
Eastman Kodak Company              fax:   716-722-3885
Rochester, NY 14650-2203