ADSM-L

Re: [ADSM-L] Internal error LOGREAD388

2008-02-20 16:49:16
Subject: Re: [ADSM-L] Internal error LOGREAD388
From: Michael Bartl <michael.bartl AT SPACE DOT NET>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 20 Feb 2008 22:48:43 +0100
Thomas,
this really looks quite similar to the DBLOG666 error we experienced a few
days ago.
In both cases the log file seems to be (at least partially) corrupted.

IBM services suggested a point-in-time restore of the DB. This leads
to data loss for the period between the last DB backup and the
power failure.
They also offered a different solution, smart but time-consuming:

As the DB could be fine and only the LOG is corrupted (it's LOGread388
and not DBreadXXX), you could work around the problem with
DUMPDB->LOADFORMAT->LOADDB->AUDITDB

This way you can avoid data loss (in particular, all data migrated from
disk pools to sequential pools would otherwise be lost); the only price
you have to pay is time.
Your DB and log sizes are very close to ours. Our DB was 80%
full, which is 360,000,000 DB entries.

DUMPDB is fast; you can wait at the console for it to complete (change
devconfig so that everything fits into 1 or 2 files).
LOADFORMAT takes about as long as DUMPDB. Provide around 25%
more space than you had before (just to avoid an unpleasant surprise).
LOADDB takes time; in our environment it took more than 10 hours. Upon
completion it shows the number of DB entries.
AUDITDB is slow, too. You can estimate how long it will take, as the
process reports how many entries it has processed. From LOADDB you know
the total, and so how many more are still to come.
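
The AUDITDB estimate is just a linear extrapolation, which can be
sketched as below. This is only a rough sketch, assuming the audit
processes entries at a roughly constant rate, which may not hold in
practice. The function name and the sample progress figures (90,000,000
entries after 3 hours) are made up for illustration; only the
360,000,000 total is taken from our DB.

```python
from datetime import timedelta


def estimate_remaining(total_entries: int,
                       processed_entries: int,
                       elapsed: timedelta) -> timedelta:
    """Extrapolate the remaining run time from the progress so far.

    total_entries     -- entry count reported at the end of LOADDB
    processed_entries -- entry count currently reported by AUDITDB
    elapsed           -- time AUDITDB has been running
    Assumes a constant processing rate (an assumption, not a guarantee).
    """
    if processed_entries <= 0 or elapsed.total_seconds() <= 0:
        raise ValueError("need some progress before a rate can be estimated")
    if processed_entries > total_entries:
        raise ValueError("processed count exceeds the known total")
    remaining_entries = total_entries - processed_entries
    # remaining time = elapsed time * (remaining work / completed work)
    seconds = elapsed.total_seconds() * remaining_entries / processed_entries
    return timedelta(seconds=seconds)


if __name__ == "__main__":
    # Hypothetical progress: a quarter of the entries done after 3 hours.
    eta = estimate_remaining(360_000_000, 90_000_000, timedelta(hours=3))
    print(f"Estimated time remaining: {eta}")
```

Re-running the calculation with fresh numbers from the console every
hour or so shows whether the constant-rate assumption is holding.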

I'm not very happy with this behaviour of the TSM DB. You have a DB
that is fine, you have a LOG with one error, and you lose your DB; that
is unacceptable. One of the purposes of combining DB and LOG
is to make TSM crash-resistant, not to create multiple points of failure.

Good luck with your server repair!

Best wishes,
Michael Bartl

On 19 Feb 2008 at 19:01, Thomas Denier wrote:

Our data center fire suppression system was inspected earlier this
morning. The inspector somehow managed to trigger the fire suppression
system. He was able to abort the activation before Halon was discharged,
but not before the electrical power to the data center was cut off.
We have gotten the TSM server host (zSeries Linux) back up, but we have
not been able to bring up the TSM server (5.3.4.0). It fails during
initialization with the following messages:

ANR0200I Recovery log assigned capacity is 10800 megabytes.
ANR0201I Database assigned capacity is 66800 megabytes.
ANR0306I Recovery log volume mount in progress.
ANR0353I Recovery log analysis pass in progress.
ANR9999D pkthread.c(570): ThreadId<0> Run-time assertion failed: "Cmp64( scanLsn, LOGV->headLsn ) != GREATERTHAN", Thread 0, File logread.c, Line 398.
ANR7824S Server operation terminated.
ANR7823S Internal error LOGREAD388 detected.

I already have a Severity 1 call in to IBM. We have mirrored recovery
log volumes. I tried renaming the primary volumes to force the server
to use the copies. The server failed with the same messages. I have
since renamed the primary log volumes back to their original names.
Even so, the TSM server now generates messages like the following:

ANR0215W Recovery log volume /tsmlog01/logvol is in the offline state - VARY ON required.

I have no idea how I am supposed to vary on the volumes when the TSM
server won't start.

Is there anything else I should try while waiting to hear from IBM?
