ADSM-L

ADSM Server crash

1999-05-10 05:44:19
Subject: ADSM Server crash
From: Simon Watson <Simon.S.Watson AT SHELL.COM DOT BN>
Date: Mon, 10 May 1999 17:44:19 +0800
Please find attached info related to a recent ADSM crash.

Obviously this is not a good situation.  To accomplish the restore, a
point-in-time (PIT) restore was done, in the mistaken belief that this
would recover to a specific point in time using the information in the
recovery logs (wrong!).  We now realise this is not the case, and that a
normal RESTORE DB, which would have used the recovery logs, should have
been done instead.  It remains to be seen whether this would actually
have worked, as the problem seemed to be in reading the recovery logs
themselves.  Could anybody suggest any other means of recovering the
system without loss of data, and help us determine why the problem
occurred?

Our environment is set up with full roll-forward recovery logs.  In
addition, it has 3 mirrored copies of the DB and 2 mirrored copies of
the recovery log.  It also writes sequentially to both the DB and the
recovery log.  You can't get much better data protection than this!

We are on AIX 4.2.1 and ADSM 3.1.1.5.

| Yesterday, our ADSM server crashed and we were unable to get it running.
|
| The error message from dsmserv.err at the time of failure:
| 05/08/1999 02:44:51  ANR7837S Internal error DBTXN077 detected.
|
| After subsequent attempts to start it, we received the following
| error message:
| ANR7837S Internal error LOGREAD566 detected.
|
| We tried several things including extending the recovery logs (using
| DSMSERV EXTEND LOG) which all failed. In the end we did a restore db
| (using DSMSERV RESTORE DB) to get the system up and running.
|
| 2 questions:
| 1. Why did the server crash in the first place? We are unable to get
| more information on the error messages above.
| 2. Is there something else that we could have done to get the system
| back up? (Restoring the database seems a bit too drastic.)
|
| Shafiee.
|
| Note: I have attached the system error report from the time of the
| crash in case it might be of use to anyone looking into this problem.
|
| The system error report (obtained by running errpt -a) is as follows:
| =====================================================
| LABEL:          CORE_DUMP
| IDENTIFIER:     C60BB505
|
| Date/Time:       Sat May  8 02:45:15
| Sequence Number: 76068
| Machine Id:      00018404A400
| Node Id:         bspibm116
| Class:           S
| Type:            PERM
| Resource Name:   SYSPROC
|
| Description
| SOFTWARE PROGRAM ABNORMALLY TERMINATED
|
| Probable Causes
| SOFTWARE PROGRAM
|
| User Causes
| USER GENERATED SIGNAL
|
|         Recommended Actions
|         CORRECT THEN RETRY
|
| Failure Causes
| SOFTWARE PROGRAM
|
|         Recommended Actions
|         RERUN THE APPLICATION PROGRAM
|         IF PROBLEM PERSISTS THEN DO THE FOLLOWING
|         CONTACT APPROPRIATE SERVICE REPRESENTATIVE
|
| Detail Data
| SIGNAL NUMBER
|            6
| USER'S PROCESS ID:
|        28468
| FILE SYSTEM SERIAL NUMBER
|            6
| INODE NUMBER
|        86027
| PROGRAM NAME
| dsmserv.42
| ADDITIONAL INFORMATION
| pthread_k 150
| ??
| Unable to generate symptom string.
| Stack is unusable.
| =====================================================
|
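For anyone who wants to compare several such entries, the errpt report quoted above can be reduced to a handful of fields with a short script. This is only a rough sketch in Python, not an IBM tool: the function names parse_errpt_entry and signal_name and the DETAIL_LABELS list are made up for illustration, and the parser assumes just the two layouts visible in the report above ("KEY: value" on one line, and a Detail Data label followed by its value on the next line). The signal lookup uses the numbering of the host running the script, which may differ from the system that produced the report.

```python
import signal

# Detail labels whose value sits on the following line in the quoted report.
# This list is an assumption based only on the entry shown above.
DETAIL_LABELS = ("SIGNAL NUMBER", "USER'S PROCESS ID", "PROGRAM NAME")

# Excerpt of the report from the message, including the mail quote markers.
SAMPLE = """\
| LABEL:          CORE_DUMP
| IDENTIFIER:     C60BB505
|
| Date/Time:       Sat May  8 02:45:15
| Resource Name:   SYSPROC
|
| Detail Data
| SIGNAL NUMBER
|            6
| USER'S PROCESS ID:
|        28468
| PROGRAM NAME
| dsmserv.42
"""


def parse_errpt_entry(text):
    """Return a dict of the fields found in one `errpt -a` entry."""
    fields = {}
    pending = None  # detail label still waiting for its value line
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("|"):  # drop a mail quote marker, if present
            line = line[1:].strip()
        if not line or set(line) == {"="}:  # skip blanks and ===== rulers
            continue
        if pending is not None:
            fields[pending] = line
            pending = None
            continue
        label = line.rstrip(":")
        if label in DETAIL_LABELS:
            pending = label
            continue
        key, sep, value = line.partition(":")
        if sep and value.strip():  # "KEY:   value" header line
            fields[key.strip()] = value.strip()
    return fields


def signal_name(number):
    """Name of a signal number on the host running this script."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return "unknown"


if __name__ == "__main__":
    entry = parse_errpt_entry(SAMPLE)
    print(entry["PROGRAM NAME"], "signal", entry["SIGNAL NUMBER"],
          signal_name(int(entry["SIGNAL NUMBER"])))
```

Signal 6 is the abort signal (SIGABRT) on AIX, which would be consistent with the server ending itself after detecting the internal error rather than being killed from outside, although the report alone does not prove that.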