Subject: ADSM Database Repair
--------
Matthias Feyerabend GSI Darmstadt 25.2.1994
Experiences with ADSM "Database salvage utility" introduced with
PN48720 in MVS Server Version 1, Release 1, Level 0.4/1.4
1. We changed from Level 0.2/1.2 to 0.4/1.4 because of minor
problems with accounting (figures in SMF records didn't show up).
2. With the upgrade we apparently introduced a new problem in the expiration process:
ANR0811I Inventory client file expiration started as process 194.
ANR0104E AFERASE(440):
Error 2 deleting row from table "AF.Vol.Segments"
ANR0865E Expiration processing failed - internal server error.
ANR0860E Expiration process 194 terminated due to internal error:
deleted 0 backup files and 0 archive files.
3. After consulting IBM, we entered the command
show tblscan AF.Vol.Segments
with the output:
.....
Index integrity error, row found scanning, but not via. index
'Hidden' record found in page 81319, entry 0
Siblings of Page are L:80921 R:62064
(35) (ea7) (0) (3) (0) (0, dd91e) (0)
.....
Scanned 560000 entries
569676 entries scanned for object AF.Vol.Segments sequentially
- 99 errors could not be found through the index.
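To put the error count in proportion, the two summary lines can be picked apart with a few lines of Python. This is only a sketch: the function name and the regular expressions are my own, and they assume the summary wording stays exactly as shown above. It is not an ADSM utility.

```python
import re

# Summary lines copied from the SHOW TBLSCAN output above.
summary = """\
569676 entries scanned for object AF.Vol.Segments sequentially
- 99 errors could not be found through the index.
"""

def parse_tblscan_summary(text):
    """Return (entries_scanned, index_errors) from the summary lines.

    Hypothetical helper; assumes the wording of the two lines above.
    """
    scanned = re.search(r"(\d+) entries scanned for object", text)
    errors = re.search(r"(\d+) errors could not be found", text)
    if scanned is None or errors is None:
        raise ValueError("summary lines not found")
    return int(scanned.group(1)), int(errors.group(1))

entries, errors = parse_tblscan_summary(summary)
share = errors / entries  # fraction of rows not reachable via the index
print(f"{errors} of {entries} rows not reachable via index ({share:.4%})")
```

Only a very small fraction of the rows in AF.Vol.Segments is affected, yet that was enough to stop expiration completely.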
4. New response from IBM:
As seen in the output, the index
integrity error from LEVEL 0.3/1.3 is present.
The only way to clean
this up is the DUMP/re-initialize/LOAD/AUDIT DB.
Also, note that no data has been lost,
just can't index to the information about the data correctly.
5. I don't know how we got into that error, nor do I know how to
avoid it in the future, but I had to do the repair now.
6. The ADSM database at GSI currently has:
85 674 Pages
4 383 779 entries
160 Megabytes.
These are the numbers reported by DUMPDB, LOADDB, AUDITDB.
7. DUMPDB to tape took 13 minutes elapsed, 6 minutes CPU on an IBM 3090/600J.
LOADDB from tape took 5 1/2 hours elapsed, 2 3/4 hours CPU.
AUDITDB took 10 hours elapsed, almost 6 hours CPU.
Altogether almost 16 hours (!!!) elapsed with over 50% CPU
and practically no I/O (??)
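The totals can be checked with a little arithmetic. The sketch below uses only the figures from items 6 and 7; "almost 6 hours" of AUDITDB CPU time is taken as a full 6 hours, which is an assumption, and the variable names are just illustrative.

```python
# (elapsed minutes, CPU minutes) per step, from item 7.
# Assumption: "almost 6 hours" AUDITDB CPU is rounded up to 6 hours.
steps = {
    "DUMPDB": (13, 6),
    "LOADDB": (5.5 * 60, 2.75 * 60),
    "AUDITDB": (10 * 60, 6 * 60),
}

total_elapsed = sum(elapsed for elapsed, _ in steps.values())
total_cpu = sum(cpu for _, cpu in steps.values())
cpu_share = total_cpu / total_elapsed

# Database entries from item 6, divided by LOADDB elapsed seconds.
entries = 4_383_779
load_rate = entries / (steps["LOADDB"][0] * 60)

print(f"elapsed: {total_elapsed / 60:.1f} h, CPU: {total_cpu / 60:.1f} h")
print(f"CPU share of elapsed time: {cpu_share:.0%}")
print(f"LOADDB rate: {load_rate:.0f} entries per second")
```

With these inputs the elapsed time comes to a little under 16 hours with a CPU share above 50%, and LOADDB handled only a few hundred entries per second.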
8. On top of that I had two LOADDBs which failed:
- The first failed with a timeout.
- The second one is more serious:
You are only allowed to do LOADDB into one single DB volume !!!
I had four different DB volumes defined before DUMPDB and
did a new install with the same number and sizes of DB volumes,
but couldn't do the LOADDB !!
ANR4019E : Load processing failed - insufficient database space.
ANR4020E LOADDB: Batch database insert failed.
There is no documentation about that restriction, which cost me 4 more hours.
9. Conclusion:
It took one day and one night to do the job.
Think twice before doing something similar.
10. More questions:
- What caused the corrupted database, and how can it be avoided ?
- Why is LOADDB restricted to one single DB volume ?
Several DB volumes are allowed otherwise, only LOADDB needs a single one ??
- Is AUDITDB sufficient to do such repairs, or is it really
necessary to do DUMPDB, LOADDB and AUDITDB ?
- What about general recommendations for database recovery ?
Mirroring is nice, but it is worth nothing in the case of logical errors.
- The timing problem: such repairs take hours, even days.
- More information about internals.
Like the information HSM gives about its control records: what
is happening in the database ??
- Descriptions of commands are missing:
trace, show, ckpt, ..
ADSM is a nice and useful tool, but database recovery needs
more thought and a better implementation.
Matthias Feyerabend