ADSM-L

Re: AUDITDB Hung?

1997-01-07 08:52:26
Subject: Re: AUDITDB Hung?
From: Helmut Richter <Helmut.Richter AT LRZ-MUENCHEN DOT DE>
Date: Tue, 7 Jan 1997 14:52:26 +0100
On Fri, 3 Jan 1997, Bob Booth - CSO wrote:

> I have been running an audit of the database on my ADSM server for 4+ days
> now, and I am beginning to get a little worried.
>
> [...]

> The last 'Processed' message was sometime yesterday.  I know from the dump
> and load that I have 118000000 entries.  The dsmserv processes are still using
> all the CPU that they can.  Is it normal for the AUDIT to do this?  Does this
> mean that I am looking at many more days of down time?

Yes, it is normal; at least, it is not a rare event. On this list, besides
you and me, at least Jim O'Leary <joleary AT UIC DOT EDU> and Simon Travaglia
<SPT AT WAIKATO.AC DOT NZ> have reported such incidents. I cannot say whether
and how the problem was resolved in these two cases.

In autumn 1995, we had an auditdb run which lasted for five weeks before
we terminated it without success. I then opened a PMR, and only recently
did I get a message that they intended to close it. I do not know whether
the PMR has in fact been closed in the meantime, and I admit that I am no
longer very interested in these formalities, which only cost the customer
time. What I would be interested in is real action on IBM's side, but on
that point I have by now become fairly resigned.

> My server has now been down for 7+ days, and my paying customers are going
> elsewhere.  I must admit, that I am ready to recommend another backup product
> to my management who are now throwing darts at my head.

It is indeed debatable whether ADSM is a product that can reasonably be
used in an environment that demands reliability, as long as calamitous
incidents such as the ones you and I have experienced receive so little
attention from IBM. But: have you got another product to recommend? It
would not be sufficient for it to have a lot of satisfied customers (ADSM
has them as well, and for good reason, as long as nothing goes wrong); you
would need positive evidence that the manufacturer is capable of flexible
and helpful action in critical cases too. How could you test that without
using the product in your environment? I hold the same scepticism toward
any other product that unfortunately turned out to be justified in ADSM's
case. Our past experience with UniTree was no better, and the vendor of
a third product (which worked well for a couple of workstations) did not
want to guarantee that it would also work in a bigger environment.

I am very sorry; all this must be really depressing in your situation,
and I can hardly think of anything to cheer you up. My advice is:

Do not count on auditdb ever terminating. If it does, you have been
  lucky, but do not rely on it.

If you decide to abort the auditdb run, you end up with a database that
  may contain a lot of errors but may still be usable for most of the
  files it holds. You could try to export as many files from it as possible
  and then import them into a new ADSM server. This is how we fixed the
  problem. We abandoned the backup data, which were of no interest after five
  weeks, and focused on the archive data, which we were able to recover. Once
  the server is running and taking backups again, old backup data tend to
  lose their value anyway; it is the archive and HSM data that are usually
  invaluable.

For the future, use all ADSM features, such as recovery logs, that help
  keep your database free of errors. Do not, however, accept IBM's argument
  that these make auditdb obsolete: while it is a good idea to store less
  flammable material, it is a bad idea to consider the fire brigade
  obsolete.

I can vividly imagine your situation and I do wish you all the best.

Helmut Richter

==============================================================
Dr. Helmut Richter                       Leibniz-Rechenzentrum
Tel:   +49-89-289-28785                  Barer Str. 21
Fax:   +49-89-2809460                    D-80333 Muenchen
Email: Helmut.Richter AT lrz-muenchen DOT de    Germany
==============================================================