ADSM-L

Re: Auditdb timing - FYI

2003-06-06 15:10:40
Subject: Re: Auditdb timing - FYI
From: Fred Johanson <fred AT MIDWAY.UCHICAGO DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 6 Jun 2003 14:08:04 -0500
Gretchen,

How's your audit situation working out?  I've got a similar
problem.  Here's a sampling of the messages I get:

 ANR9999D dfbackup.c(2044): ThreadId<79> Error 145 getting
  volume name for volId 9
 ANR9999D dsalloc.c(1979): ThreadId<57> Error 1 opening bit
  vector DSKV0000000009 for volid 9.
 ANR9999D bfcreate.c(441): ThreadId<57> Bitfile erase
  prohibited - transaction failed.
 ANR9999D dfaudit.c(3816): ThreadId<0> AUDITDB: Storage volume
  with internal id 9 does not exist, but bitfile 0.213167917 is
  stored on it - The volume cannot be re-created.
 ANR9999D dsaudit.c(3054): ThreadId<0> AUDITDB: Error 1 opening
  bit vector object DSKV0000000009.
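
For anyone who wants to tally messages like these rather than read them by eye, here is a minimal sketch in Python. It assumes only the layout visible in the samples above (message id, source module, source line number, an optional ThreadId, then free text). The names MSG_RE, ServerMessage and parse_message are made up for illustration and are not part of any TSM tool.

```python
import re
from typing import NamedTuple, Optional

# Layout assumed from the sample messages only:
#   ANRnnnnX module.c(line): [ThreadId<n>] free text
MSG_RE = re.compile(
    r"(?P<msg_id>ANR\d{4}[A-Z])\s+"
    r"(?P<module>\w+\.c)\((?P<line>\d+)\):\s+"
    r"(?:ThreadId<(?P<thread>\d+)>\s+)?"
    r"(?P<text>.*)$"
)


class ServerMessage(NamedTuple):
    msg_id: str
    module: str
    line: int
    thread: Optional[int]
    text: str


def parse_message(raw: str) -> Optional[ServerMessage]:
    """Parse one (possibly line-wrapped) diagnostic message.

    Returns None for messages that carry no source-module reference.
    """
    # Undo mail/console wrapping by collapsing all whitespace runs.
    joined = " ".join(raw.split())
    match = MSG_RE.search(joined)
    if match is None:
        return None
    thread = match.group("thread")
    return ServerMessage(
        msg_id=match.group("msg_id"),
        module=match.group("module"),
        line=int(match.group("line")),
        thread=int(thread) if thread is not None else None,
        text=match.group("text"),
    )


if __name__ == "__main__":
    sample = (
        "ANR9999D dsalloc.c(1979): ThreadId<57> Error 1 opening bit\n"
        "  vector DSKV0000000009 for volid 9."
    )
    print(parse_message(sample))
```

Using search instead of anchoring at the start of the line means activity-log lines with a leading date and time parse as well.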

It started about March.  I did an AUD VOL on the stgpool volumes involved
and it seemed to go away.  Last month it was back with a vengeance.  I've
been working with Level 2 on one of the undocumented options of DSMSERV
AUDITDB.  The first time, it ended in a reasonable length of time, but with
a message about the LOG being in ROLLFORWARD.  That turned out to be a not
very informative way of saying the audit didn't run.  Since then I've run
it with log set to NORMAL and it dies, even with FIX=YES.

So even if I could bring down a production machine for a week to run a full
audit, it may not work.  Now what I have is enough internals on the DB to
enable me to write a program to extract the identity of DSKV0..09 by
working my way thru a large three or four level tree.

My real question, rhetorical of course, is when is TIVOLI going to supply
us with a tool kit for identifying and working thru DB problems?  Why do I
have to write some code to identify an object in the DB?  Why are there no
functions available to do this?

Wouldn't it be nice if V5R3 or V6R1, whichever comes first, were devoted to
the care and feeding of the TSM DB?

My $.02 anyway.


At 09:27 AM 5/27/2003 -0400, you wrote:
I've been plagued by a few problems when deleting accounts. So far,
it seems like Win2K or WinXP clients (SYSTEM OBJECTs are involved
again!) are prone to this error:

05/27/2003 09:00:56  ANR2017I Administrator XXXXXX issued command: DELETE FILESPACE ZZZZZZ *
05/27/2003 09:00:56  ANR0984I Process 185 for DELETE FILESPACE started in the BACKGROUND at 09:00:56.
05/27/2003 09:00:56  ANR0800I DELETE FILESPACE * (fsId=6) for node ZZZZZZ started as process 185.
05/27/2003 09:00:56  ANR0802I DELETE FILESPACE * (fsId=6) (backup/archive data) for node ZZZZZZ started.
05/27/2003 09:00:57  ANR0104E imutil.c(7761): Error 2 deleting row from table "Expiring.Objects".
05/27/2003 09:00:57  ANR9999D imfsdel.c(1872): ThreadId<25> Error 19 deleting group leader 0 176658713.

I've tried a number of things - renaming the filespace, moving the node
data and then auditing the tape, deleting the filespace specifically -
but it's really a database 'corruption' and can only be fixed by an
audit (per support).

Over the course of the last two weeks, I recovered this database to a
test server and ran an audit. Here are the pertinent stats for your
reference:

Server: H80, 4 way, 2 GB, AIX 4.3.3
TSM: v5.1.6.4
DB size: 179,544 MB - 74.4% full
Log size: 13,280 MB
Audit command: dsmserv auditdb fix=yes
Audit start: 5/19 09:05
Audit end: 5/25 19:15
Number of database entries: Processed 1050073565 database entries (cumulative).
Elapsed time: 6 days 10 hours 10 minutes

The audit was successful and did allow me to delete the problem node.
However, there really should be a way to go after the offending entry
and blast it (under adult supervision, of course!). I'm not really going
to be able to justify a down time of 7 days just to clean up an account.
It's now happened again on another server, so I will have to do this
test again to get a good estimate of the down time required to clean
that server up.
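
As a rough way to turn the stats above into a down-time estimate, here is a small Python sketch. The start time, end time and entry count are the figures from the audit above. Two things are assumptions rather than anything measured: that the audit rate stays roughly constant, and that it carries over to another server. The entry count used for the "other server" is purely hypothetical.

```python
from datetime import datetime

SECONDS_PER_DAY = 86_400

# Figures reported for the test audit above.
AUDIT_START = datetime(2003, 5, 19, 9, 5)
AUDIT_END = datetime(2003, 5, 25, 19, 15)
ENTRIES_PROCESSED = 1_050_073_565


def audit_rate(start: datetime, end: datetime, entries: int) -> float:
    """Average database entries audited per second."""
    return entries / (end - start).total_seconds()


def estimate_days(entries: int, rate: float) -> float:
    """Estimated audit duration in days, ASSUMING a constant rate."""
    return entries / rate / SECONDS_PER_DAY


elapsed = AUDIT_END - AUDIT_START
rate = audit_rate(AUDIT_START, AUDIT_END, ENTRIES_PROCESSED)

if __name__ == "__main__":
    hours, rem = divmod(elapsed.seconds, 3600)
    print(f"Elapsed: {elapsed.days} days {hours} hours {rem // 60} minutes")
    print(f"Rate:    {rate:.0f} entries/second")
    # Hypothetical entry count for a second server, for illustration only.
    other_server_entries = 2 * ENTRIES_PROCESSED
    print(f"Estimate for other server: "
          f"{estimate_days(other_server_entries, rate):.1f} days")
```

The real entry count for the second server is not known until an audit has actually run there, so this only gives a ballpark figure and does not replace the test restore.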

I've pushed these accounts 'aside' by renaming them and changing the
contact info, but the clients would really like me to remove the data
(legal reasons). Having errors like this makes me wonder what else is
going on in the database.

Gretchen Thiele
Princeton University
