ADSM-L

AUDIT DB and Thanks...

1997-01-08 12:45:47
Subject: AUDIT DB and Thanks...
From: Bob Booth - CSO <booth AT CHIANTI.CSO.UIUC DOT EDU>
Date: Wed, 8 Jan 1997 11:45:47 -0600
I would like to thank everyone that replied to my note about my AUDIT DB
being hung.  It did finally complete after 8 days, and my server has been
unavailable for 15 days.  I was told by the ADSM support duty manager
that the audit will take from 4 to 12 times longer than the LOADDB took,
depending on the damage, speed of CPU (yada, yada...).  I don't think
they really know, actually.  I am sure every situation is different.

For the list and IBM's benefit, I would like to reply to and clarify a
few items.

I have had a history of problems with tape devices on my ADSM server.  I have
not found any one medium to be satisfactory.  I still lose two $40 tapes a
week, and spend almost all my time moving data around.  My last go-around
with IBM was to replace a SCSI controller, to see if that had any effect.
This is what I did a week before I had problems with my server.  The
new controller has not fixed the problems.

I did a normal halt of my server to replace the card, and the machine ran
for a week (2 days before Christmas) and crashed one night with a TBCOL777
error detected message (right out of the blue).  I restarted the server, and it
ran until the early morning, when it took another hit.  I then called IBM
to report the problem, and was told that an AUDIT was the only solution.

The AUDIT concerned me because it just stopped giving me any indication of
what it was doing.  For almost a day, I got no output at all.  My only
indication that it was doing something was the fact that my drive lights
were blinking and aixmon was showing activity.  I was not able to give anyone
any idea of when things would be working again.

Now for some new concerns.
1.  After I did the LOADDB, the message at the end
told me I had 118,546,234 entries in my database.  So, when watching the
output of the AUDITDB, I figured I could pop the cork when the
ANR4306I AUDITDB: Processed xxx records started getting close.  Well, the
audit went on to 140,000,000 records, thus blowing any chance of estimating
the completion.  Why the difference?

2.  When the server crashed, my DB was at 84.5%.  After the audit it changed to
96.7%!!!????  Why the difference?

Is the output bogus?  Why give me output if it is wrong?  Give me the
choice of having progress indicators of some kind, and have them show me
real data.  I realize that screen output can be a waste of time, but when
my boss is asking me every 20 minutes what's going on and I can't tell him
anything, he does not have much faith in me or this product.
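
To illustrate the estimate described in item 1, here is a minimal Python
sketch that reads the processed-record count out of an AUDITDB progress line
and compares it with the entry count reported by LOADDB.  The message shape is
assumed from the line quoted above, and the function names (parse_processed,
fraction_done, eta_seconds) are hypothetical, not part of ADSM.

```python
import re

# Assumed message shape, based only on the line quoted in this post:
#   ANR4306I AUDITDB: Processed <count> records
# The real server output may differ (extra words, different separators).
_PROCESSED = re.compile(r"ANR4306I AUDITDB: Processed (\d[\d,]*) ")


def parse_processed(line):
    """Return the processed-record count from a progress line, or None."""
    match = _PROCESSED.search(line)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def fraction_done(processed, expected):
    """Ratio of processed records to the LOADDB entry count.

    This can exceed 1.0, which is exactly the situation described above.
    """
    return processed / expected


def eta_seconds(processed, expected, records_per_second):
    """Rough seconds remaining, or None when no estimate is possible."""
    if processed >= expected or records_per_second <= 0:
        return None
    return (expected - processed) / records_per_second


if __name__ == "__main__":
    loaddb_entries = 118_546_234  # count reported at the end of LOADDB
    line = "ANR4306I AUDITDB: Processed 140,000,000 records"
    processed = parse_processed(line)
    print(f"{fraction_done(processed, loaddb_entries):.0%} of LOADDB count")
    print("ETA:", eta_seconds(processed, loaddb_entries, 500.0))
```

With the numbers from this audit the ratio goes past 100%, and the sketch
returns no ETA at that point rather than guessing, because the LOADDB count
turned out not to be a reliable upper bound.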

With a product that relies so heavily on a database, I think that database
management/repair tools should be at the top of the list of developed items.
An administrator should be able to audit and repair specific areas of the
database, and be able to see problem areas when they occur, not at midnight
on Friday, weeks after the problem was created.  This makes a week's worth
of database backups a waste, since I don't know when to do a database restore.

For the record, I fully mirror the log and database on a machine that is
dedicated to ADSM.  Now, it seems, I am at the mercy of random database
flaws that I can't guard against.  Hell, I can't even tell they happened
until it is too late.

ADSM is by far the best thing going.  I have had experience with almost
all backup products and ADSM is way out in front.  IBM should put the
brakes on new features long enough to fix some of these problems.

I am warming up my pen for new requirements in March.

thanks for listening...

Bob Booth