ADSM-L

Re: Slow DB recovery

1994-09-22 13:21:22
Subject: Re: Slow DB recovery
From: Mitch Sako <mitch AT LSIL DOT COM>
Date: Thu, 22 Sep 1994 10:21:22 PDT
> ADSM Development does indeed understand that the current database
> dump/load and audit process is waaaay too long running to be
> practical when fast recovery of the server is necessary.
> Informational APAR II08040 describes enhancements that we recently
> introduced to produce a consistent database dump image, and thereby
> completely eliminate the need for a database audit (AUDITDB) after the
> database is reloaded.  In fact, you could have used that method
> to eliminate the AUDITDB that you are currently running.
> The on-line DUMP DB CONSISTENT=YES command can be used to produce
> a consistent database dump so that an AUDITDB is not required
> after the server database is reloaded from the dump.
> In addition, development is actively working on solutions that
> will improve server recovery time overall.
> Mike Kaczmarski
> ADSM Server Development

Hi Mike,

I have a couple of items.  Our databases are now well over 1,000,000 pages,
and I don't even want to think about how long an audit would take.
Is there any news yet on running audits in the background?  I know you
mentioned that this was being discussed (audits running while the server is up).

Also, when we do initial dumps, they often overwhelm the server, and it is
not able to migrate from disk to tape fast enough (if there are many
small objects).  I would like to see the following priorities set:

  RESTORE:    the highest priority
  MIGRATION:  second only to restore
  BACKUP:     most backups are batch jobs so this can be lower
  EXPIRATION: the lowest priority
  DEL DATA:   same as expiration
  RECLAIM:    same as expiration and del data
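The ranking above can be sketched as a simple priority queue. This is a hypothetical illustration of the requested ordering only, not how the ADSM server actually dispatches its processes; the class name, the numeric ranks and the sample work items are invented for the example.

```python
import heapq
import itertools

# Hypothetical ranks taken from the wish list above (lower runs first).
# EXPIRATION, DEL DATA and RECLAIM share the lowest rank. Not an ADSM setting.
PRIORITY = {
    "RESTORE": 0,
    "MIGRATION": 1,
    "BACKUP": 2,
    "EXPIRATION": 3,
    "DEL DATA": 3,
    "RECLAIM": 3,
}


class ProcessQueue:
    """Hypothetical dispatcher that always hands out the highest-ranked waiting work.

    Work items of equal rank are served first-come, first-served."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a rank

    def submit(self, kind, detail):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), kind, detail))

    def next_process(self):
        _, _, kind, detail = heapq.heappop(self._heap)
        return kind, detail

    def __len__(self):
        return len(self._heap)


q = ProcessQueue()
q.submit("EXPIRATION", "node DCRST2")
q.submit("BACKUP", "/home/aix/p134")
q.submit("MIGRATION", "BACKUPPOOL -> BACKUPTAPE")
q.submit("RECLAIM", "BACKUPTAPE")
q.submit("RESTORE", "/home/aix/p134")

# Drain the queue in dispatch order.
order = [q.next_process()[0] for _ in range(len(q))]
print(order)  # ['RESTORE', 'MIGRATION', 'BACKUP', 'EXPIRATION', 'RECLAIM']
```

This sketch only decides which waiting job starts next; a server would presumably also need to preempt or throttle work already running (for example, let a restore interrupt a reclamation) for the ranking to help under load.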

We are expecting some 9GB filesystems in the near future and it's
scary to think about the first backup and the first full restore!

One server we have is displaying the following characteristics:

VM server, RS/6000 client (model 30), 165 NFS-mounted filesystems
from IBM Raven and Auspex (approximately 400GB and 2 million objects
under management, with some exclusions)

adsm> q db

Available Assigned   Maximum   Maximum    Page     Total      Used %Util  Max.
    Space Capacity Extension Reduction    Size    Usable     Pages       %Util
     (MB)     (MB)      (MB)      (MB) (bytes)     Pages
--------- -------- --------- --------- ------- --------- --------- ----- -----
    7,020    7,020         0     2,612   4,096 1,797,120 1,127,982  62.8  62.8

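As a cross-check on the q db figures, the page count and utilization can be recomputed from the assigned capacity and page size. This assumes 1 MB = 1,048,576 bytes, which the numbers bear out (7,020 MB at 256 pages per MB gives exactly the 1,797,120 usable pages shown).

```python
# Figures copied from the "q db" output above.
ASSIGNED_MB = 7_020
PAGE_SIZE = 4_096        # bytes per database page
USED_PAGES = 1_127_982

# Assumption: 1 MB = 1,048,576 bytes, so each MB holds 256 pages of 4 KB.
pages_per_mb = (1024 * 1024) // PAGE_SIZE
total_pages = ASSIGNED_MB * pages_per_mb      # usable pages in the database
util_pct = 100 * USED_PAGES / total_pages     # percent of pages in use
free_pages = total_pages - USED_PAGES
free_mb = free_pages / pages_per_mb

print(f"{total_pages:,} usable pages, {util_pct:.1f}% used, "
      f"{free_pages:,} pages (~{free_mb:,.0f} MB) free")
# 1,797,120 usable pages, 62.8% used, 669,138 pages (~2,614 MB) free
```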

currently (this minute):

adsm> q stg

Storage      Device       Estimated  %Util  %Migr  High   Low  Next
Pool Name    Class Name    Capacity                Mig%  Mig%  Storage
                               (MB)                            Pool
-----------  ----------  ----------  -----  -----  ----  ----  -----------
BACKUPPOOL   DISK          29,627.5   82.5   82.3     5     0  BACKUPTAPE
BACKUPTAPE   CARTRIDGE    596,767.5   64.8   93.6   100     0

(notice we have 29GB of disk at 82.5% capacity!)
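The disk pool numbers can be put in absolute terms, together with a sketch of high/low migration thresholds. The threshold function assumes the usual reading of these columns (migration starts once the migratable share reaches High Mig% and continues until it drops to Low Mig%); the function name is hypothetical, not an ADSM interface. With High 5 and Low 0, migration should effectively never stop, so a pool sitting at 82% means migration is simply not keeping up with incoming backups.

```python
# Figures copied from the "q stg" output above for BACKUPPOOL.
capacity_mb = 29_627.5
pct_util = 82.5
pct_migr = 82.3
high_mig, low_mig = 5, 0

occupied_mb = capacity_mb * pct_util / 100     # space in use
migratable_mb = capacity_mb * pct_migr / 100   # space eligible to move to tape


def migration_state(pct_migr, high, low, running):
    """Hypothetical threshold hysteresis (assumed semantics, not ADSM code):
    an idle pool starts migrating once the migratable share reaches HIGH;
    a migrating pool keeps going until the share falls to LOW."""
    if running:
        return pct_migr > low
    return pct_migr >= high


print(f"occupied ~{occupied_mb:,.0f} MB, migratable ~{migratable_mb:,.0f} MB")
print("migration should run:",
      migration_state(pct_migr, high_mig, low_mig, running=False))
# occupied ~24,443 MB, migratable ~24,383 MB
# migration should run: True
```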

adsm> q proc

 Process Process Description  Status
  Number
-------- -------------------- -------------------------------------------------
       2 Migration            Disk Storage Pool BACKUPPOOL, Moved Files: 31989,
                               Moved Bytes: 887,197,696, Unreadable Files: 0,
                               Unreadable Bytes: 0. Current File (bytes): None

                               Current output volume: DS1467.

       3 Expiration           ANR0817I Currently processing filespace
                               /home/aix/p134 for Node DCRST2: have deleted a
                               total of 35784 backup files, and 0 archive
                               files.


I restarted the server about 18 hours ago (due to another problem), and you
can see that migration has moved less than 1 GB and about 32,000 objects so far.
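The migration counters from q proc translate into a rough average rate. The 18-hour elapsed time is the approximate figure from the text, so the results are ballpark averages over the whole window, including any time migration competed with expiration or backup sessions. The average object size of about 27 KB is consistent with the "many small objects" problem described earlier.

```python
# Counters from the "q proc" output above; elapsed time is the approximate
# "about 18 hours" since the restart, not a measured value.
moved_bytes = 887_197_696
moved_files = 31_989
elapsed_s = 18 * 3600

rate_kib_s = moved_bytes / elapsed_s / 1024         # average throughput
files_per_hour = moved_files / (elapsed_s / 3600)   # average object rate
avg_file_kib = moved_bytes / moved_files / 1024     # average object size

print(f"{rate_kib_s:.1f} KiB/s, {files_per_hour:,.0f} files/h, "
      f"average object {avg_file_kib:.1f} KiB")
# 13.4 KiB/s, 1,777 files/h, average object 27.1 KiB
```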

adsm> q node

Node Name                 Platform Policy Domain  Days Since Days Since Locked?
                                   Name                 Last   Password
                                                      Access        Set
------------------------- -------- -------------- ---------- ---------- -------
DCRST2                    AIX      STANDARD               <1        224   No

This might appear strange, but we have only one node on this server; it
backs up five large fileservers.

adsm> q ses

  Sess Comm.  Sess     Wait   Bytes   Bytes Sess  Platform Client Name
Number Method State    Time    Sent   Recvd Type
------ ------ ------ ------ ------- ------- ----- -------- --------------------
    73 Tcp/Ip Run      0 S    4.6 M 419.9 M Node  AIX      DCRST2
   100 Tcp/Ip RecvW    6 S  344.8 K 349.9 M Node  AIX      DCRST2
   151 Tcp/Ip Run      0 S    2.3 K     228 Admin SunOS    MES00
   152 Tcp/Ip Run      0 S  234.5 K 192.5 K Node  AIX      DCRST2

Currently we have only three client sessions running, but there can be 20 or
more at times, depending on the amount of data changed.  I am continuously
firing off backup jobs every 8 minutes (I am tuning that interval now,
but 8 minutes fits all filesystems into a 24-hour window).
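The 8-minute figure can be checked with a little arithmetic, assuming one backup job is started per filesystem per interval in round-robin order (an assumption; the text does not spell out how jobs map to filesystems). With 165 filesystems, 8 minutes gives a 22-hour cycle, and the longest interval that still fits in 24 hours is about 8.7 minutes. The function name is hypothetical.

```python
FILESYSTEMS = 165        # from the server description above
WINDOW_MIN = 24 * 60     # 24-hour window, in minutes


def cycle_hours(interval_min, n=FILESYSTEMS):
    """Hours needed to start one job for each of n filesystems,
    assuming one job is launched every interval_min minutes."""
    return n * interval_min / 60


max_interval = WINDOW_MIN / FILESYSTEMS   # longest interval that still fits

print(f"8-minute interval: full cycle in {cycle_hours(8):.0f} h")
print(f"longest interval that fits: {max_interval:.2f} min")
# 8-minute interval: full cycle in 22 h
# longest interval that fits: 8.73 min
```

This only governs when jobs start: a job that runs longer than one interval overlaps the next, which may be why the session count climbs to 20 or more when a lot of data has changed.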

This is an extreme case, but it illustrates the pressure that our server
is under.  We are using a Storage Technology silo and operation is
totally lights-out; however, a prioritized migration would help us
greatly.  Thanks, Mike.  Good to hear from you again.

Mitch
________________________________________________________________________
| Mitch Sako          \\\\   \\\\ \\\\\\\\  ||||||||  /////// //    // |
| I-NET Corporation    \\ \\ \\ \\    \\       ||    //      //    //  |
| LSI Logic Contract    \\  \\\  \\    \\      ||   //      ////////   |
| Mailstop E-195         \\       \\    \\     ||  //      //    //    |
| 1501 McCarthy Blvd.     \\       \\ \\\\\\\\ || /////// //    //     |
| Milpitas  CA  95035            ___o                                  |
| local:      mitch@asic       _'\ <_     Phone:  (408) 433-4187       |
| internet:   msako AT lsil DOT com  (_)/ (_)    FAX:    (408) 433-8796       |
| personal:   msako AT netcom DOT com            Pager:  (408) 989-3365       |
| ibmmail:    USMILUN9 AT IBMMAIL                                      |
| DISCLAIMER: Opinions expressed are mine alone                        |
|______________________________________________________________________|