ADSM-L

[no subject]

1994-02-25 06:53:09
From: Matthias Feyerabend <matthias AT RZRI6F.GSI DOT DE>
Date: Fri, 25 Feb 1994 12:53:09 +0100
Subject: ADSM Database Repair

--------
Matthias Feyerabend GSI Darmstadt 25.2.1994

Experiences with the ADSM "Database salvage utility", introduced with
PN48720 in MVS Server Version 1, Release 1, Level 0.4/1.4

1. We changed from Level 0.2/1.2 to 0.4/1.4 because of minor
   problems with accounting (figures in SMF records didn't show up).
2. In doing so we apparently introduced a new problem in the expiration process:
    ANR0811I Inventory client file expiration started as process 194.
    ANR0104E AFERASE(440):
    Error 2 deleting row from table "AF.Vol.Segments"
    ANR0865E Expiration processing failed - internal server error.
    ANR0860E Expiration process 194 terminated due to internal error:
    deleted 0 backup files and 0 archive files.
3. After consulting IBM, I entered the command
      show tblscan AF.Vol.Segments
   with the output:
   .....
   Index integrity error, row found scanning, but not via. index
   'Hidden' record found in page 81319, entry 0
   Siblings of Page are L:80921 R:62064
   (35) (ea7) (0) (3) (0) (0, dd91e) (0)
   .....
   Scanned 560000 entries
   569676 entries scanned for object AF.Vol.Segments sequentially
   - 99 errors could not be found through the index.
4. New response from IBM:
   As seen in the output, the index integrity error from
   LEVEL 0.3/1.3 is present. The only way to clean this up
   is the DUMP/re-initialize/LOAD/AUDIT DB.
   Also, note that no data has been lost,
   just can't index to the information about the data correctly.
5. I don't know how we got into that error, nor do I know how to
   avoid it in the future, but I had to do the repair now.
6. The ADSM database at GSI currently contains:
           85 674 Pages
        4 383 779 entries
              160 Megabytes.
   These are the numbers reported by DUMPDB, LOADDB, AUDITDB.
7. DUMPDB to tape took 13 minutes elapsed, 6 minutes CPU (IBM 3090/600J).
   LOADDB from tape took 5 1/2 hours elapsed, 2 3/4 hours CPU.
   AUDITDB took 10 hours elapsed, almost 6 hours CPU.
   Altogether almost 16 hours (!!!) elapsed, with over 50% CPU
   and practically no I/O (??)
8. On top of that, two of my LOADDB runs failed:
   - The first with a timeout.
   - The second is more serious:
   LOADDB is only allowed into one single DB volume !!!
   I had four different DB volumes defined before the DUMPDB and
   did a new install with the same number and sizes of DB volumes,
   but the LOADDB could not be done !!
   ANR4019E : Load processing failed - insufficient database space.
   ANR4020E LOADDB: Batch database insert failed.
   That restriction is not documented, and it cost me 4 more hours.
9. Conclusion:
   It took one day and one night to do the job.
   Think twice before doing something similar.
10. More questions:
   - What caused the database corruption, and how can it be avoided ?
   - Why is LOADDB restricted to one single DB volume ?
     Several DB volumes are allowed, only LOADDB needs one single ??
   - Is AUDITDB sufficient to do such repairs, or is it really
     necessary to do DUMPDB, LOADDB and AUDITDB ?
   - What about general recommendations for database recovery ?
     Mirroring is nice, but worth nothing in the case of logical errors.
   - The timing problem: hours, days.
   - More information about internals is needed,
     similar to the information HSM provides about HSM control
     records: what is happening in the database ??
   - Descriptions of commands are missing:
     trace, show, ckpt, ..
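
For anyone planning a similar repair, the figures from items 3, 6 and 7
can be cross-checked with a few lines of Python. This is only a rough
sketch: the variable names are made up for illustration, and the AUDITDB
CPU time is rounded up to 6 hours because the post only says "almost 6
hours".

```python
# Elapsed and CPU time per step, in minutes, as quoted in item 7.
# Assumption: AUDITDB CPU is taken as a full 6 hours ("almost 6 hours").
steps = {
    "DUMPDB": {"elapsed": 13, "cpu": 6},
    "LOADDB": {"elapsed": 5.5 * 60, "cpu": 2.75 * 60},
    "AUDITDB": {"elapsed": 10 * 60, "cpu": 6 * 60},
}

DB_ENTRIES = 4_383_779      # item 6: entries in the whole database
SCANNED_ENTRIES = 569_676   # item 3: entries scanned in AF.Vol.Segments
INDEX_ERRORS = 99           # item 3: rows not reachable through the index

total_elapsed = sum(s["elapsed"] for s in steps.values())
total_cpu = sum(s["cpu"] for s in steps.values())

# Share of the elapsed time that was spent on CPU.
cpu_share = total_cpu / total_elapsed

# Average LOADDB insert rate, in database entries per second.
load_rate = DB_ENTRIES / (steps["LOADDB"]["elapsed"] * 60)

# Fraction of AF.Vol.Segments rows affected by the index error.
error_fraction = INDEX_ERRORS / SCANNED_ENTRIES

print(f"total elapsed : {total_elapsed / 60:.1f} hours")
print(f"CPU share     : {cpu_share:.0%}")
print(f"LOADDB rate   : {load_rate:.0f} entries/second")
print(f"index errors  : {error_fraction:.3%} of scanned rows")
```

With these inputs the three successful steps add up to a little under
16 hours, which does not include the two failed LOADDB attempts from
item 8.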

ADSM is a nice and useful tool, but database recovery needs
more thought and a better implementation.

Matthias Feyerabend