ADSM-L

ADSM crash - recovery log problem ?

2015-10-04 17:39:21
Subject: ADSM crash - recovery log problem ?
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU]On Behalf Of
To: ADSM-L AT VM.MARIST DOT EDU
I sent a message to the list last Wednesday about ADSM not restarting
after a halt to upgrade the tape library. I reported the problem to IBM but
as yet their suggestions have not been very helpful.

The dsmserv.42 process crashes and dumps core immediately after writing
'ADSM server restart-recovery in progress' to the log. Because it dies so
quickly, I suspect the tape library update is not the cause; I would have
expected a more relevant error message later in the start-up tasks. IBM
have suggested that the filesystems containing the database and recovery
logs may need more space allocated. We are using JFS-based filesystems
and have at least 30 MB free in each. As I understand it, the free space
in the filesystem is irrelevant, because the data is written into the
specified files, whose size never changes. They have also suggested that
our C libraries are not up to date, but we have not had any problems with
this so far and have been running AIX 4.3 for many months.
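
The reasoning about fixed-size files can be illustrated with a small sketch. It assumes only a POSIX shell with dd, wc and mktemp, and it uses a scratch file rather than a real ADSM database or log volume: a preallocated file that is overwritten in place keeps its length, so writes of this kind should not need extra free space in the filesystem.

```shell
# Illustration only: a scratch file stands in for a preallocated volume.
vol=$(mktemp)

# Preallocate 1 MB (1024 blocks of 1 KB) of zeros.
dd if=/dev/zero of="$vol" bs=1024 count=1024 2>/dev/null
size_before=$(wc -c < "$vol" | tr -d ' ')

# Overwrite 4 KB in the middle of the file, in place.
# conv=notrunc stops dd from truncating the file at the write position.
dd if=/dev/zero of="$vol" bs=1024 seek=512 count=4 conv=notrunc 2>/dev/null
size_after=$(wc -c < "$vol" | tr -d ' ')

echo "before=$size_before after=$size_after"
rm -f "$vol"
```

Whether dsmserv writes to its volumes strictly in place is an assumption here; the sketch only shows why in-place writes leave the file size unchanged.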

To try something, I have upgraded ADSM from version 3.1.2.0 to version
3.1.2.40, but it still crashes immediately on startup.
errpt shows:

LABEL:          CORE_DUMP
IDENTIFIER:     C60BB505
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Probable Causes
SOFTWARE PROGRAM
User Causes
USER GENERATED SIGNAL
Detail Data
SIGNAL NUMBER
           5
USER'S PROCESS ID:
       15226
FILE SYSTEM SERIAL NUMBER
           6
INODE NUMBER
      315397
PROGRAM NAME
dsmserv.42
ADDITIONAL INFORMATION
_pthread_ 7C      ??
_spin_loc 134
pthread_c 184
pkBeginNa 104
LvmStartD 3C
CreateDis 38C
ReadDiskT 3C0
lvmInit 98C
admStartS 104
main 6D8
__start 64

Symptom Data
REPORTABLE
1
INTERNAL ERROR
0
SYMPTOM CODE
PCSS/SPI2 FLDS/dsmserv.4 SIG/5 FLDS/_pthread_ VALU/7c
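
The SIGNAL NUMBER field in the entry above can be translated to a signal name with the shell's built-in kill. This is a minimal sketch; signal_name is a hypothetical helper, not part of ADSM or errpt.

```shell
# Map a signal number to its name using the shell's kill -l.
# signal_name is an illustrative helper name.
signal_name() {
    kill -l "$1"
}

# 5 is the SIGNAL NUMBER reported in the errpt entry.
signal_name 5
```

Signal 5 is SIGTRAP, so the process was stopped by a trap rather than, for example, a segmentation fault (SIGSEGV, 11).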


I did try adding another recovery log file at IBM's suggestion, although
as the log is set to normal mode (not rollforward) and we had 500 MB, I
didn't think space would be a problem. The command:

/usr/lpp/adsmserv/bin/dsmserv extend log /usr/adsm/rcfs01/log3.dsm 160

core dumped in the same way that dsmserv did.

This makes me wonder whether there is some corruption in the recovery
log file (or even the database). We mirror the log and database.

This brings me (finally!) to the crux of this message. Are there any tools
available for checking the state of the recovery log? Secondly, can we
re-initialise the existing recovery log, or can we point to a brand new
recovery log, e.g. log3 above, or will this conflict with information held
in the database?

As always, any suggestions gratefully received.

+----------------------------------------------------------------------+
 Steven Bridge     Systems Group, Information Systems, EISD
                          University College London
 email: s.bridge AT ucl.ac DOT uk                   tel: +44 (0)20 7679 2794