ADSM-L

Re: Database fragmentation formula (was Re: Online DB Reorg)

2003-10-20 15:47:19
Subject: Re: Database fragmentation formula (was Re: Online DB Reorg)
From: "Wheelock, Michael D" <Michael.Wheelock AT INTEGRIS-HEALTH DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Mon, 20 Oct 2003 14:44:48 -0500
Hi,

First off, let me say that this a wonderfully explained set of formulas and
analysis.

Isn't this getting a little off the mark though?  Last I checked, almost
every database on the planet (yes even pervasive sql) when allocating
pages/extents, left an amount of space unutilized at the end.  In fact, if
you do a "reorg" in SQL server, it specifically asks how much space you want
to remain free in each page.  Now why would you want that?  So that when you
add a row to a table with a clustered index (ie. A primary key, where the
table is physically ordered the same as the index) the database does not
have to add an extent at the end of the space to house the new row.  This
cuts down on logical fragmentation which is a far larger killer of databases
than the fragmentation that these formulas show.  By these formuls, every
signle one of my SQL database is 25% fragmented (why, because every Sunday
they do online reorgs to fix their logical fragmentation).  Logical
fragmentation turns large sequential reads into large random reads.

These are principles from Oracle and SQL server and may not apply to the TSM
database, but as a relational database, I don't know why they wouldn't.

Oh, and I don't know of a utility that can give you the info that I was
talking about for TSM, but the equivalent can be obtained in SQL by doing a
(dbcc showcontig (tablename)).

My $0.02

Michael Wheelock
Integris Health

-----Original Message-----
From: Zlatko Krastev [mailto:acit AT ATTGLOBAL DOT NET]
Sent: Monday, October 20, 2003 2:12 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Database fragmentation formula (was Re: Online DB Reorg)


***** ATTENTION *****
Those who do not like the mathematics, skip to the end (search for word
"select").
*********************

>From my mathematical background this query is not showing very good results.
The formula looks like

          ( MAX_REDUCTION_MB * 256 )
100 -  ---------------------------------
        (USABLE_PAGES - USED_PAGES) * 100

Thus closer we are to fully used database, less accurate the formula would
be. Moreover if we fill the DB at 100% the result will be division by 0
while the database might be both fragmented and not fragmented. One of our
goals is to fully utilize our resources. In that exact moment the query
would be useless. Also the formula is of no use if there is no legend how to
interpret the numbers. For example our test server DB is giving
PERCENT_FRAG=26.92 while being nearly unfragmented.

So I would dare to recommend another formula (in pages):

           used - needed
fragm_p = --------------- x 100
               used

i.e. what space is wasted from all used (in %).
The needed space (in pages) can be found if we multiply PCT_UTILIZED by
USABLE_PAGES and divide by 100 to remove percentages:

needed = PCT_UTILIZED x USABLE_PAGES / 100

while the used space (in pages) is readable from USED_PAGES column.
Therefore my final formula would be (the lines may be split by mailers):

           USED_PAGES - PCT_UTILIZED x USABLE_PAGES / 100 fragm_p =
------------------------------------------------ x 100
                            USED_PAGES

and the final query would be:
select cast(100 * (USED_PAGES - PCT_UTILIZED * USABLE_PAGES / 100) -
  / USED_PAGES as decimal(9,5)) as "Unused page parts [pages]" from db

Now the percentage shows the percentage of wasted space vs. used space. 0%
would mean database is fully populated with no holes, 100% are impossible
(as completely empty pages would not be counted, and 99+% mean each page is
filled with something small just to allocate it.

PART 2.
Beyond how much space is wasted inside pages we would be also interested in
how many empty pages we are losing due to partition-allocation scheme. Again
the math first. Same formula (but now in MB):

           used - needed
fragm_p = --------------- x 100
               used

Now needed space is derived from CAPACITY_MB field:

needed = PCT_UTILIZED x CAPACITY_MB / 100

while actual usage is the size to which we can reduce the DB:

used = CAPACITY_MB - MAX_REDUCTION_MB

the resulting formula would look like  (the lines may be split by
mailers):

           (CAPACITY_MB - MAX_REDUCTION_MB) - PCT_UTILIZED x CAPACITY_MB /
100 fragm_p =
--------------------------------------------------------------------- x 100
                            CAPACITY_MB - MAX_REDUCTION_MB

Division by zero cannot happen as TSM server does not allow us to reduce the
DB under one partition. Now the query for this percentage would be: select
cast(((CAPACITY_MB - MAX_REDUCTION_MB) -
   - (PCT_UTILIZED * CAPACITY_MB / 100) ) -
   / (CAPACITY_MB - MAX_REDUCTION_MB) * 100 -
   as decimal(9,5)) as "Allocation waste [%]" from db

And the final big-big query would look like:
select cast(USED_PAGES - PCT_UTILIZED * USABLE_PAGES / 100 -
   as decimal (20,3)) as "Unused page parts [pages]", -
   cast(100 * (USED_PAGES - PCT_UTILIZED * USABLE_PAGES / 100) -
   / USED_PAGES as decimal(9,5)) as "Page fragmentation [%]", -
   cast( (CAPACITY_MB - MAX_REDUCTION_MB) -
   - (PCT_UTILIZED * CAPACITY_MB / 100) as decimal (10,2)) -
   as "Overallocated space [MB]", -
   cast(((CAPACITY_MB - MAX_REDUCTION_MB) -
   - (PCT_UTILIZED * CAPACITY_MB / 100) ) -
   / (CAPACITY_MB - MAX_REDUCTION_MB) * 100 -
   as decimal(9,5)) as "Allocation waste [%]" from db

If someone already invented these formulae, I would congratulate him/her.
Even if I am the first dared to do this hard work, there is no Nobel price
for mathematics :-))

Zlatko Krastev
IT Consultant






Remco Post <r.post AT SARA DOT NL>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> 20.10.2003 
13:30
Please respond to "ADSM: Dist Stor Manager"


        To:     ADSM-L AT VM.MARIST DOT EDU
        cc:
        Subject:        Re: Online DB Reorg


On Sat, 18 Oct 2003 14:35:09 -0400
"Talafous, John G." <Talafous AT TIMKEN DOT COM> wrote:


> Remco,
>   Would you be willing to share your SQL query that reports on DB
> fragmentation?
>

I was allready looking at Eric (he probably saved my thingy somewhere
usefull, I just saved it in my sent-mail folder), here it is...

select cast((100 - ( cast(MAX_REDUCTION_MB as float) * 256 ) / -
(cast(USABLE_PAGES as float) - cast(USED_PAGES as float) ) * 100) as -
decimal(4,2)) as percent_frag from db

Note that I still think this is one of the more useless queries I've ever
build...

> Thanks to all,
> John G. Talafous              IS Technical Principal
> The Timken Company            Global Software Support
> P.O. Box 6927                 Data Management
> 1835 Dueber Ave. S.W.         Phone: (330)-471-3390
> Canton, Ohio USA  44706-0927  Fax  : (330)-471-4034
> talafous AT timken DOT com           http://www.timken.com


--
Met vriendelijke groeten,

Remco Post

SARA - Reken- en Netwerkdiensten                      http://www.sara.nl
High Performance Computing  Tel. +31 20 592 8008    Fax. +31 20 668 3167

"I really didn't foresee the Internet. But then, neither did the computer
industry. Not that that tells us very much of course - the computer industry
didn't even foresee that the century was going to end." -- Douglas Adams
This e-mail may contain identifiable health information that is subject to 
protection
under state and federal law. This information is intended to be for the use of 
the
individual named above. If you are not the intended recipient, be aware that any
disclosure, copying, distribution or use of the contents of this information is 
prohibited
and may be punishable by law. If you have received this electronic transmission 
in
error, please notify us immediately by electronic mail (reply).