ADSM-L

When *SM crashes. What next???

2000-08-21 14:33:36
Subject: When *SM crashes. What next???
From: Joe Faracchio <brother AT SOCRATES.BERKELEY DOT EDU>
Date: Mon, 21 Aug 2000 11:33:37 -0700
Fellow *SMers!!!  Hi!

As infrequently as it occurs, *SM crashes.  We have seen the following
types:

      -   SQL errors
      -   Overwhelmed by GROUP BY or other SQL options.
      -   Database corruption due to a system crash during DB I/O
      -   "Others" too few to remember, document, or care about.

In every case I have immediately restarted the system (OS and/or ADSM)
and then looked around to figure out why.

Only once did it refuse to come up: when the DB was corrupt.

I called IBM and they gave me an OPT parm to change which, in effect,
tells the system to switch to the mirror and, if it's not also corrupt,
use it to correct the problems in the primary copy.  This worked
without a hitch and is a good argument for mirrored DBs on disjoint I/O paths.

I've now been asked to NOT bring the system back up after a crash at all
until we can determine the cause and whether it will further damage the
DB.  This will surely add 2 or more hours to the process, since the last
instruction in the document will say: "Now call IBM and ask if it's safe to
come back up with the information you have found."

I'm uncomfortable with denying my users access for that long.
What procedures do you use?  What's your experience?

I have looked in my AIX ADSM 3.1 manuals and found lots of things about
recovering the DB, but nothing about what to do after a 'simple' crash
before doing the obvious: restarting the system.  Have you?

Please help & respond.  I've been told to research this before
going on vacation.   Sigh!

            thanks ... joe.f.


Joseph A Faracchio,  Systems Programmer, UC Berkeley