ADSM-L

Re: TSM 5.3.3 loaddb and audit problem

2006-05-17 10:16:03
Subject: Re: TSM 5.3.3 loaddb and audit problem
From: "Scott, Brian" <bscott AT EDS DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 17 May 2006 10:14:27 -0400
All,

My colleague found the following article from the University of Florida,
which explains the unload/load concept but, more importantly, provides a
select statement that shows how fragmented the TSM DB is, to help determine
whether the unload is really needed. It takes a few seconds to run but can
eliminate unnecessary hours of potential pain.

Also, APAR IC47516 is out there for those running Celerra backups: an
unload will not update the inventory table, which will eventually cause
future Celerra backups to fail.

TSM Database reloading
Summary
The database at the core of a TSM instance is prone to fragmentation,
which increases its size. As of Mar 2005, there are no online utilities
available to correct this problem. The increased size and fragmentation
are reflected in expiration time and backup speed, eventually presenting
an obstacle to normal operations.
This document describes a detailed procedure for the current
recommendation to solve this problem: Unloading and Reloading the TSM
database. 
Evaluating the benefits of a reload 
Before you set about taking down the server for an unload and reload, it
would be wise to estimate whether the size reduction which will follow
the procedure is worth the effort. The unload and reload can take rather
a long time, so a small reduction is probably not worth it.
There is a query recommended by the TSM listserv which purports to
estimate the degree of fragmentation which your database is
experiencing. 
      SELECT CAST((100 - (CAST(MAX_REDUCTION_MB AS FLOAT) * 256 ) / -
      (CAST(USABLE_PAGES AS FLOAT) - CAST(USED_PAGES AS FLOAT) ) * 100) AS -
      DECIMAL(4,2)) AS PERCENT_FRAG FROM DB
    
This should generate a number from which you can estimate the benefit
that would accrue from your unload/reload.
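To make the arithmetic in the query easier to follow, here is a minimal
Python sketch of the same expression. It assumes the factor 256 converts
megabytes to 4 KB database pages, so that one reading of the result is the
share of free pages that cannot be given back by shrinking the database.
The function name and the sample numbers are hypothetical; real values
come from the DB table on your own server.

```python
def percent_frag(max_reduction_mb, usable_pages, used_pages, pages_per_mb=256):
    """Mirror the PERCENT_FRAG expression from the select statement.

    pages_per_mb=256 is an assumption (4 KB pages); the query hard-codes 256.
    """
    free_pages = usable_pages - used_pages
    if free_pages <= 0:
        raise ValueError("no free pages: the expression is undefined")
    # Pages that could be released by reducing the database.
    reclaimable_pages = max_reduction_mb * pages_per_mb
    # Same operator order as the SQL: 100 - (reclaimable / free) * 100
    return 100.0 - reclaimable_pages / free_pages * 100.0


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    print(percent_frag(max_reduction_mb=1000,
                       usable_pages=2_000_000,
                       used_pages=1_488_000))
```

With these made-up inputs, half of the free pages are reclaimable, so the
other half counts as fragmented.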
FIXME: In this paragraph I will calibrate the returns from the query and
suggest when is a good time. 
Performing the unload-reload
1.      Prepare your environment for recovery
        You will essentially destroy your TSM database as you perform
the unload. You would be well advised to make preparations for a smooth
disaster recovery before you begin. You should, at least:
*       Identify the device class to which you intend to unload the DB.
In this example I am going to call it DBUNLOAD.
*       Ensure that the device class in question has capacity adequate
to receive the unload. If you have enough space to sustain your total DB
volume, plus 10-20 percent, you should be fine. You expect, of course,
that the unload will be substantially smaller than the live DB.
*       Back up your VOLUMEHISTORY and DEVCONFIG.
*       Perform a database backup, full or incremental.
*       Locate and read the TSM documentation on DSMSERV LOADFORMAT,
DSMSERV AUDITDB, DSMSERV UNLOADDB and DSMSERV LOADDB . There is a
reference to the IBM and Tivoli documentation presences on the web at
the Administrator Documentation
<http://open-systems.ufl.edu/services/NSAM/admin_docs/index.html>  page
of this site. 
*       You might wish to disable sessions in the dsmserv.opt with the
disablescheds option. This will avoid interference as you bring the
server up again, Just In Case. 
*       Double-check the characteristics of your database and server.
Are you in rollforward mode or normal? Are your volumes mirrored as you
expect? Are the volumes in the locations you expect? Do you use any
server-to-server communications? You'll need to know these things at the
end of your reload, if you are to ensure that they are all working
properly again. 
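The capacity rule of thumb from the checklist above (total DB volume plus
10-20 percent) is simple arithmetic. A small sketch, with hypothetical
function names and sizes expressed in MB:

```python
def unload_capacity_range_mb(db_size_mb, low_pct=10, high_pct=20):
    """Return (low, high) capacity targets: DB size plus 10-20 percent."""
    low = db_size_mb * (100 + low_pct) / 100
    high = db_size_mb * (100 + high_pct) / 100
    return low, high


def has_room(devclass_capacity_mb, db_size_mb, margin_pct=20):
    """True if the device class can hold the DB plus the chosen margin."""
    return devclass_capacity_mb >= db_size_mb * (100 + margin_pct) / 100


if __name__ == "__main__":
    # Hypothetical 100,000 MB database.
    print(unload_capacity_range_mb(100_000))
    print(has_room(devclass_capacity_mb=115_000, db_size_mb=100_000))
```

The margin is deliberately conservative, since the article expects the
unload itself to be smaller than the live DB.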
2.      Halt the server.
        When you stop the TSM server for this process, you will want to
do so with the 'quiesce' parameter, which will make it possible to
perform the unload and reload without auditing the database thereafter.
3.      Perform the unload
        This will probably be the longest of your steps. Some examples
of how long it has taken others are available in this list
<http://open-systems.ufl.edu/services/NSAM/maint_docs/db_un_reload.html>
of real-world experiences. During the unload process, the TSM server
takes all of the scattered data blocks and assembles them in order.
        Be sure to carefully read the documentation of the DSMSERV
UNLOADDB command in the TSM docs. I use
               DSMSERV UNLOADDB devclass=DBUNLOAD \
               > /var/tmp/unloaddb.log 2>&1 < /dev/null &

        This formulation lets you watch the log (possibly from some
location other than that from which you began the process) and removes
some concerns about (say) the machine on which your terminal resides
dying in the interim.
        This command ought to result in a consistent database image. No
audit ought to be necessary.
        At the end of the log output of the unload process, you will see
a recap of the list of volumes used. This list will be necessary at
reload time.
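For anyone who drives these steps from a script rather than an interactive
shell, the redirection pattern used for the unload command (all output to
one log, no terminal input, run in the background) can be sketched in
Python. This is an illustration, not part of the article's procedure:
run_detached is a hypothetical helper, the demo uses a harmless stand-in
command rather than DSMSERV, and start_new_session goes a step beyond the
shell's `&` by also detaching the child from the terminal's session.

```python
import os
import subprocess
import sys
import tempfile


def run_detached(argv, log_path):
    """Start argv in the background, like `cmd > log 2>&1 < /dev/null &`."""
    with open(log_path, "wb") as log:          # `>` truncates the log file
        return subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,          # `< /dev/null`
            stdout=log,                        # `> log`
            stderr=subprocess.STDOUT,          # `2>&1`: stderr follows stdout
            start_new_session=True,            # assumption: leave the terminal's session (POSIX)
        )


if __name__ == "__main__":
    # Harmless stand-in; the real DSMSERV invocation is site-specific.
    demo = [sys.executable, "-c",
            "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"]
    log_path = os.path.join(tempfile.mkdtemp(), "demo.log")
    proc = run_detached(demo, log_path)
    proc.wait()
    with open(log_path) as fh:
        print(fh.read())
```

Both output streams end up in the same file, which is the point of `2>&1`;
writing `2>1` instead would send stderr to a file literally named `1`.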
4.      Format the DB containers.
        You must prepare the DB containers to receive the load. This
process overwrites the recovery log, but you had already blown away the
database in the unload. You did do an incremental backup in step 1, right?
        Be sure to carefully read the documentation of the DSMSERV
LOADFORMAT command in the TSM docs. This command will be different for
every installation. One of mine is
               DSMSERV LOADFORMAT 2 /dev/rtwebctlglv01a /dev/rtwebctlglv02a \
               4 /dev/rtwebctdblv01a /dev/rtwebctdblv02a \
               /dev/rtwebctdblv03a /dev/rtwebctdblv04a \
               > /var/tmp/loadformat.log 2>&1 < /dev/null &

        This formulation lets you watch the log (possibly from some
location other than that from which you began the process) and removes
some concerns about (say) the machine on which your terminal resides
dying in the interim.
        You may wish to use an alternate log volume for this process,
one which is very small. The majority of the time taken by the
LOADFORMAT is the initialization of the log. Once your server is up and
running, you can add the production log volumes back to the log scheme,
and re-extend the log.
        This format process formats ALL the database volumes supplied as
a single database. If your database is mirrored, you should not supply
both sets of volumes, only one. You'll re-mirror the database once the
process is complete.
        The LOADFORMAT process is fairly quick. Expect minutes, rather
than tens of minutes.
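The LOADFORMAT example above follows a fixed argument order: the number of
log volumes, the log volumes, the number of DB volumes, then the DB
volumes. A minimal sketch that assembles that argument list and rejects a
volume listed twice; loadformat_args is a hypothetical helper, and the
device names are the ones from the article's example. Always check the
order against the DSMSERV LOADFORMAT documentation for your server level.

```python
def loadformat_args(log_volumes, db_volumes):
    """Build the DSMSERV LOADFORMAT argument list in the order the example uses."""
    if not log_volumes or not db_volumes:
        raise ValueError("need at least one log volume and one DB volume")
    all_volumes = list(log_volumes) + list(db_volumes)
    if len(set(all_volumes)) != len(all_volumes):
        raise ValueError("a volume is listed more than once")
    return ["DSMSERV", "LOADFORMAT",
            str(len(log_volumes)), *log_volumes,
            str(len(db_volumes)), *db_volumes]


if __name__ == "__main__":
    args = loadformat_args(
        ["/dev/rtwebctlglv01a", "/dev/rtwebctlglv02a"],
        ["/dev/rtwebctdblv01a", "/dev/rtwebctdblv02a",
         "/dev/rtwebctdblv03a", "/dev/rtwebctdblv04a"],
    )
    print(" ".join(args))
```

If the database is mirrored, pass only one copy of each volume set, as the
article advises.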
5.      Perform the load
        The load process is usually substantially shorter than the
unload. Less than half the time is quite common. During this process, the
TSM server feeds the well-ordered data blocks back onto your server DB
volumes.
        Be sure to carefully read the documentation of the DSMSERV
LOADDB command in the TSM docs.
               DSMSERV LOADDB devclass=DBUNLOAD \
               VOLumenames=vola,volb[,...] \
               > /var/tmp/loaddb.log 2>&1 < /dev/null &

        This formulation lets you watch the log (possibly from some
location other than that from which you began the process) and removes
some concerns about (say) the machine on which your terminal resides
dying in the interim.
        This command ought to result in a consistent database image. No
audit ought to be necessary.
6.      Clean up the detritus
        Now you are ready to restart the server and check that all is
well. Some things you should expect, or expect to do:
*       Your DB will have its assigned capacity as the complete
capacity of all available volumes. I prefer to run with somewhat less;
according to local conventions, you might want to shrink it some.
*       If your database was mirrored before, re-define the mirror
copies. If you accidentally formatted both sets of volumes, blow away
the empty ones (there should be plenty of empty ones) and redefine them
in a manner that permits the re-mirroring. 
*       If you used a temporary log volume to shorten loadformat time,
then put your production volumes in place. 
*       Do a full DB backup. You want to safeguard this new, more
organized DB state. 
*       For each of the servers with which you have set up
server-to-server communications, perform an UPDATE SERVER FORCESYNC=YES
so that the server identification token can be updated. 
*       Back up your VOLUMEHISTORY and DEVCONFIG.
*       If you disallowed sessions in your dsmserv.opt, then re-allow
them now, and halt and restart the server. 
Real-world experiences:
Platform  Disk tech  Original size  Final size           Unload time  Load time             Comments
Win2K     [unknown]  100GB          50GB (50% decrease)  22 hours     8 hours (64% faster)  Inventory expiration went from 21 hours to 7
AIX       SSA        43GB           28GB (34% decrease)  11 hours     3 hours (72% faster)
AIX       SSA        16GB           12GB (25% decrease)  4 hours      1 hour (75% faster)
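The percentages in the table can be re-derived from the sizes and times it
lists. A short sketch that recomputes them and compares each with the
stated figure; the row data is copied from the table, and pct_drop is a
hypothetical helper name. "Faster" is read here as the load time's
reduction relative to the unload time, which is an assumption about how
the table was built.

```python
ROWS = [
    # platform, original GB, final GB, unload h, load h, stated size %, stated time %
    ("Win2K", 100, 50, 22, 8, 50, 64),
    ("AIX",    43, 28, 11, 3, 34, 72),
    ("AIX",    16, 12,  4, 1, 25, 75),
]


def pct_drop(before, after):
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100


if __name__ == "__main__":
    for platform, orig_gb, final_gb, unload_h, load_h, size_pct, time_pct in ROWS:
        size = pct_drop(orig_gb, final_gb)
        time = pct_drop(unload_h, load_h)
        print(f"{platform}: size -{size:.1f}% (stated {size_pct}%), "
              f"load vs unload -{time:.1f}% (stated {time_pct}%)")
```

Under that reading, the recomputed values agree with the stated ones to
within one percentage point.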

Last updated: 2005-06-06T16:50:21-04:00
Copyright (c) 2004, 2005 Open Systems Group,
<http://open-systems.ufl.edu/> Computing and Networking Services,
<http://www.cns.ufl.edu/> University of Florida  <http://www.ufl.edu/> .


Regards,
Brian

Brian Scott
EDS
Global Client Engineering-GM
MS 3234
4594 W Nancy Dr.
Kankakee, IL 60901
 
Phone: +1-815-939-2684
mailto:bscott AT eds DOT com


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Kelly Lipp
Sent: Wednesday, May 17, 2006 6:08 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: TSM 5.3.3 loaddb and audit problem

Richard,

I could not agree more on your stance regarding Dump/Load.  However, I'm
in Holland teaching a Level 2 class and have been surprised to learn
that a lot of my students perform this action as a matter of course on
their servers.  The objective is to reduce the size of aged TSM
databases.  In TSM 5.3 we have new functionality to determine if a db
reorg would reclaim a significant amount of space.  Then the Dump/load
is executed to get this space.  Do you suppose this new command is
encouraging us to do something that is high risk?  Alternatives?

I guess they've decided the risk is worth the potential gain.

I personally have not experienced the problem, so have not attempted this
solution.


Kelly J. Lipp
VP Manufacturing & CTO
STORServer, Inc.
485-B Elkton Drive
Colorado Springs, CO 80907
719-266-8777
lipp AT storserver DOT com

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Richard Sims
Sent: Tuesday, May 16, 2006 6:46 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] TSM 5.3.3 loaddb and audit problem

Do not take any further actions on your own: call TSM Support and engage
them in the problem. You risk doing further damage to your database if
you continue tinkering with it, as we have and IBM have stressed in the
past.

It seems this needs to be stressed again:
DO NOT ELECTIVELY RUN UNLOADDB - LOADDB ON YOUR TSM DATABASE!!
These are *salvage* utilities. The ADSM-L archives chronicle the horror
stories of customers who have followed mis-advice and proceeded to
perform "compress" on their TSM database. If you need corroboration on
this, review the APARs on these utilities. Such software does not
receive a lot of attention from developers, who are pressed to work on
new features rather than old, lesser-used utilities like these. And
there are no long-term gains in reorganizing your TSM database: it's a
lot of risk and no real gain.

We've seen too many customers in pain because of this stuff, and I don't
want to see any more.

    Richard Sims

On May 16, 2006, at 8:09 AM, Abdulaziz Almuammar wrote:

> Dear All,
> we did unloaddb and loaddb but after the loaddb we faced a problem on 
> the backup of the nodes and it was resolved by upgrading TSM server 
> from 5.3.2 to 5.3.3.
> However, we are facing a problem on some nodes when we do a restore:
> some files could not be restored and we got a message that those files
> are not available on the TSM server :( although all volumes have
> "readwrite" access status
>
>
> to make sure that the TSM db information is synced we have to run
> auditdb, but the problem with this is that it takes a long time and
> it is an offline process
>
> Is there another way to make sure that the database information  is 
> correct?
> Will the audit volume command at the storage pool level (all volumes)
> do the same job as auditdb? It takes a long time, but at least
> the TSM server is up
>
>
>
> Regards,
> Abdul
