ADSM-L

Crash and Burn (and recovery)

1994-04-19 16:23:09
From: Bill Colwell <BColwell AT CCLINK.DRAPER DOT COM>
Date: Tue, 19 Apr 1994 20:23:09 GMT
I recently ran into a severe bug that prevented the server from
restarting.  It happened on an MVS server, but the bug affects all
server platforms.  The MVS APAR is PN54334.  If the DISKLOG file
lists more than 16 datasets, the server won't start, claiming that it
can't find any valid checkpoints.  'Datasets' in this case are
database files, log files, and their mirrors.  So if you already have
16 entries in the DISKLOG file, avoid adding more database or log
files until the next server build is released and installed.

I was able to restore the server with FDR full-volume dumps taken
while the server was down.  I think the server must be down to get a
dump that will restore a runnable server.  The server was set back
about 36 hours.  All the users were notified, and they seemed to take
it in stride!

As a result of this restore, a test server got corrupted, so I took it
as an opportunity to test the DUMPDB, LOADDB, and AUDITDB utilities,
which are recommended by IBM for backing up the server.  The results
are not good.  The test server is less than 10% of the size of the
production server.  The test dump took 295 CPU seconds and 11 minutes
elapsed.  The load took 5240 CPU seconds and 4 hours elapsed.  And the
audit took 11735 CPU seconds and 6 hours elapsed.  Scaling up for
production, I estimate a total elapsed time of 1 week for the load and
audit.  Since the server must be down to do the dump, you might as
well stick with full-volume physical dumps using FDR or DF/DSS.  The
recovery is much quicker!
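
The scaling estimate above can be reproduced with a few lines of
arithmetic.  This is only a rough sketch: it assumes that load and
audit time grow linearly with database size, which the message does
not state, and the function names are made up for illustration.  The
only inputs taken from the message are the 4-hour load, the 6-hour
audit, the "less than 10%" size ratio, and the 1-week estimate.

```python
# Elapsed times reported in the message for the test server.
LOAD_HOURS = 4.0
AUDIT_HOURS = 6.0


def estimate_hours(load_hours, audit_hours, size_factor):
    """Elapsed hours for load plus audit on a larger server.

    Assumption (not from the message): elapsed time scales linearly
    with database size, so we simply multiply by size_factor.
    """
    return (load_hours + audit_hours) * size_factor


def implied_factor(target_hours, load_hours, audit_hours):
    """Size factor that would produce target_hours under linear scaling."""
    return target_hours / (load_hours + audit_hours)


if __name__ == "__main__":
    # "Less than 10% of production" means a size factor of more than 10,
    # so a factor of exactly 10 gives a lower bound.
    lower_bound = estimate_hours(LOAD_HOURS, AUDIT_HOURS, 10)
    print(f"Lower bound: {lower_bound:.0f} hours "
          f"({lower_bound / 24:.1f} days)")

    # A one-week estimate (168 hours) corresponds to this size factor.
    factor = implied_factor(7 * 24, LOAD_HOURS, AUDIT_HOURS)
    print(f"Size factor implied by 1 week: {factor:.1f}")
```

Under these assumptions a factor of 10 gives about 100 hours (a bit
over 4 days), and the 1-week figure corresponds to a production
server roughly 17 times the size of the test server, which fits the
"less than 10%" description.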


Bill Colwell
C. S. Draper Lab
Email: BColwell AT draper DOT com
Voice: 617-258-1550