ADSM-L

Re: URGENT: TSM database problem

2003-12-31 14:51:11
Subject: Re: URGENT: TSM database problem
From: Mitch Sako <msako AT CADENCE DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 31 Dec 2003 11:53:14 -0800
It's not the end of the world, but sometimes it feels that way...

My Linux server decided to hang a few days ago.  I tried both sides of the
mirror to no avail.

Lots of ugly stuff in the dumpdb like:

ANR4013I DUMPDB: Dumped 344244433 database entries (cumulative).
ANR0207E Page address mismatch detected on database volume /db3/db00, logical
page 10067513 (physical page 3667769); actual: 3667513.
ANR0248E Unable to read database page 10067456 from any alternate copy.

Loaded the database using loaddb:

ANR4039I LOADDB: Loaded 346578745 database entries (cumulative).
ANR1365I Volume /dbbackup/72813583.dmp closed (end reached).
ANR4039I LOADDB: Loaded 346792232 database entries (cumulative).
ANR4031I LOADDB: Copied 9705211 database pages.
ANR4033I LOADDB: Copied 11333 bit vectors.
ANR4035I LOADDB: Encountered 0 bad database records.
ANR4074I LOADDB: Encountered 0 bad database entries.
ANR4036I LOADDB: Copied 346792232 database entries.
ANR4037I LOADDB: 28323 Megabytes   copied.
ANR4004I LOADDB: Database load process completed.
ANR4405I LOADDB: Loaded an inconsistent dump image - a database audit (AUDITDB)
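
For anyone scripting a similar exercise, here is a minimal sketch of scanning
captured dumpdb/loaddb output for anything that is not an informational
message, and of picking up the last cumulative entry count.  It assumes Python
is available and that the output has already been captured as a list of lines.
The names scan and sample are hypothetical, and the only message format
assumed is the one visible in the output above (ANR, four digits, one
severity letter).

```python
import re

# Message IDs as they appear in the output above: "ANR", four digits, then a
# single severity letter ("I" for informational, "E" for error, and so on).
ANR_RE = re.compile(r"^(ANR\d{4})([A-Z])\s+(.*)$")

# Progress lines such as "DUMPDB: Dumped N database entries (cumulative)."
COUNT_RE = re.compile(r"(?:Dumped|Loaded) (\d+) database entries")


def scan(lines):
    """Return (problems, last_count) for a list of server output lines.

    problems   -- list of (message_id, text) for every non-informational message
    last_count -- last cumulative entry count seen, or None if there was none
    """
    problems = []
    last_count = None
    for line in lines:
        m = ANR_RE.match(line.strip())
        if not m:
            continue  # wrapped continuation of a message, or unrelated text
        msg_id, severity, text = m.groups()
        if severity != "I":
            problems.append((msg_id + severity, text))
        c = COUNT_RE.search(text)
        if c:
            last_count = int(c.group(1))
    return problems, last_count


# A few lines copied from the output above.
sample = [
    "ANR4013I DUMPDB: Dumped 344244433 database entries (cumulative).",
    "ANR0207E Page address mismatch detected on database volume /db3/db00, logical",
    "page 10067513 (physical page 3667769); actual: 3667513.",
    "ANR0248E Unable to read database page 10067456 from any alternate copy.",
    "ANR4039I LOADDB: Loaded 346792232 database entries (cumulative).",
    "ANR4035I LOADDB: Encountered 0 bad database records.",
]

if __name__ == "__main__":
    problems, last_count = scan(sample)
    for msg_id, text in problems:
        print(msg_id, text)
    print("last cumulative count:", last_count)
```

Treating every severity other than "I" as worth a look is a deliberately
conservative choice, so the sketch does not depend on knowing the full list of
severity letters.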

auditdb is proceeding now and should be done sometime today.  I plan to do
an unload and load shortly thereafter and be back in production very
soon.  I don't know what caused the latest hang because it left no trace
or coredump but whenever it happens, I prepare for this little exercise and
I've got it scripted now so it's actually quite painless.  I think I
experienced my first WDSF dumpdb/loaddb/auditdb back in the early '90s
running a server on VM/SP backing up UNIX data through this little VM/TCP
box (I can't recall the name or number) to an STC robot that had tapes with
200MB capacity!  Those were painful days.  The one- or two-day procedure
today is a pure pleasure compared to how it was back then.  Back
then I think I had something like 2 million database entries and about
500GB of data backed up compared to 346 million entries and 30TB backed up
today.

TSM has scaled nicely along with Moore's Law in the last 15+ years.  I just
wish loaddb and auditdb could be sped up a little more, somehow.

Mitch

At 12/31/2003 10:59 AM Wednesday, you wrote:
>ANR0207E Page address mismatch detected on database volume...

Your database is corrupted.  (Was TSM mirroring in effect, as recommended
in the Admin Guide?)

You may have to restore the db, per
  http://www-1.ibm.com/support/entdocview.wss?uid=swg21155009

     Richard Sims, BU
