ADSM-L

Re: Someone please tell me WTF is going on!

2004-11-25 08:51:58
Subject: Re: Someone please tell me WTF is going on!
From: Richard Sims <rbs AT BU DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 25 Nov 2004 08:51:51 -0500
>We back all our nodes up into a giant primary disk storage pool. We set the
>hi and low thresholds both to zero to cause the data to be migrated off to
>our 3494 library. I notice several errors in the activity log saying:
> ANR9999D dfmigr.c(1413): ThreadId<103> Process 889
>  detected a discrepancy in the database for cluster
>  srvId=0, ck1=482 while migrating from pool BACKUPPOOL(1).
>  Running dsmserv AUDITDB DISKSTORAGE FIX=YES may resolve
>  the problem. Callchain of previous message: 0x1001634C
>  outDiagf <- 0x105E3D20 SelectCluster <- 0x105E5104
>  DfMigrationThread <- 0x10007AF0 StartThread <- 0xD004B3F0
>  _pthread_body <-  (PROCESS: 889)
>
>This caused migration process after migration process to be kicked off,
>generating the error above over and over, so I thought I had better try
>running the "AUDITDB DISKSTORAGE FIX=YES" command.
>I did, and now when the primary disk storage pool is set to a high and low
>of 0, migration runs for about 5 minutes and usually only migrates 1 or 2
>files at a time, although it is still well over the threshold and should
>keep running until the primary disk storage pool is empty or at least close
>to empty. This happened for a while and was odd, but at least the data was
>being migrated. Now the primary disk storage pool is 37% full, the hi and
>low thresholds are both 0, and no migration processes are running and none
>will start, for whatever reason. I can't empty out the disk storage pool and
>it's slowly filling up. HELP! What the #$%^ is going on? I've looked at
>everything and can't find anything odd in any of the settings, although I
>don't know how they would have changed anyway.
>(Can you hear the panic in my voice?!)

Just to preface: Few customers look up ANR9999D in the Messages manual to gain
perspective on its intent...which is to provide diagnostic information which may
help you find an APAR which has already been created to address the
circumstance, or to provide diagnostic information to TSM Support when it is a
new problem.  The content of the message is intended more to assist the TSM
Support person in handling the problem than to direct the customer in a
course of action.  Thus, I would not infer that it is telling the customer that
he should perform an AUDITDB, but rather that, after looking at the full
picture, an AUDITDB may be a course of action.  Keep in mind the prevailing
warning that an AUDITDB may itself result in (further) damage and/or loss of
data.

I would strongly recommend contacting TSM Support for guidance on this problem,
rather than taking risks with your TSM system.

Now, something went wrong somewhere to induce the problem.  It may be the result
of a TSM defect, but I would look closely into local, contributing causes.
In particular, review your TSM Activity Log and OS error log for unusual
events.  There could be disk reliability problems underlying this situation.
If you are not employing MIRRORWrite DB Sequential in your TSM system and
suffered a server crash, you may be seeing the result of a database
inconsistency.
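
As a rough aid for that Activity Log review, here is a minimal Python sketch.
It assumes you have already saved the relevant Activity Log text to a list of
lines (or a file you read into one); the function name and the sample data are
hypothetical, and the pattern is based only on the message layout quoted above.
It tallies ANR9999D messages by the source module and line they report, which
may help show when a given diagnostic began and how often it recurs.

```python
import re
from collections import Counter

# Matches the start of a diagnostic message such as
# "ANR9999D dfmigr.c(1413): ..." (layout taken from the quoted message).
ANR9999D_RE = re.compile(r"ANR9999D\s+(?P<module>[\w.]+)\((?P<line>\d+)\)")


def count_anr9999d(log_lines):
    """Count ANR9999D messages by (source module, source line number)."""
    counts = Counter()
    for text in log_lines:
        match = ANR9999D_RE.search(text)
        if match:
            key = (match.group("module"), int(match.group("line")))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Sample input: the first entry comes from the quoted message; the
    # last one is made up purely to illustrate a repeat occurrence.
    sample = [
        "ANR9999D dfmigr.c(1413): ThreadId<103> Process 889",
        " detected a discrepancy in the database for cluster",
        "ANR9999D dfmigr.c(1413): ThreadId<104> Process 890",
    ]
    for (module, line), n in count_anr9999d(sample).most_common():
        print(f"{module}({line}): {n} occurrence(s)")
```

A tally like this only helps you see the pattern and gives you something
concrete to pass along to TSM Support; it does not diagnose anything by itself.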

  Richard Sims     http://people.bu.edu/rbs
