ADSM-L

Re: CLEANUP EXPTABLE / SHOW VERIFYEXPTABLE

2006-04-06 18:33:48
Subject: Re: CLEANUP EXPTABLE / SHOW VERIFYEXPTABLE
From: Josh-Daniel Davis <xaminmo AT OMNITECH DOT NET>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 6 Apr 2006 17:27:23 -0500
OK, I finally figured it out.
It's processing by NODES.REG_TIME, then by FSID.
I have 75 of 479 nodes left.
I'll look in my occupancy extracts to see how much more there is.
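
For a first guess at the remaining run time from the node count alone, here is a minimal Python sketch. It assumes every node costs about the same amount of work, which is unlikely to hold since occupancy differs per node. The 67-hour elapsed figure is borrowed from the actlog span quoted further down and is used only for illustration. The function name is made up for this example.

```python
def estimate_remaining_hours(total_nodes, nodes_left, elapsed_hours):
    """Linear extrapolation; assumes each node takes equal time (a big assumption)."""
    done = total_nodes - nodes_left
    if done <= 0:
        raise ValueError("no nodes completed yet; cannot extrapolate")
    hours_per_node = elapsed_hours / done
    return nodes_left * hours_per_node

# 479 nodes total, 75 left; elapsed_hours=67 is an illustrative assumption.
remaining = estimate_remaining_hours(total_nodes=479, nodes_left=75, elapsed_hours=67)
print(f"roughly {remaining:.1f} hours left under the equal-work assumption")
```

Weighting each node by its object count from the occupancy extracts would likely give a better estimate than counting nodes equally.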


On 06.04.06 at 16:55 xaminmo AT OMNITECH DOT NET wrote:

Date: Thu, 6 Apr 2006 16:55:50 -0500
From: Josh-Daniel Davis <xaminmo AT OMNITECH DOT NET>
Reply-To: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: CLEANUP EXPTABLE / SHOW VERIFYEXPTABLE

Turns out, I can't wait for SHOW TREE.
I gave it an hour, but it held a heavy lock (I didn't pull SHOW LOCK)
and it prevented some simple Q commands, specifically, Q FI.

I thought maybe the SHOW IMV output might help, but I don't know what HWM
means:
          Last Object Id : 0 580917567
           HWM Object Id : 0 580918273
HWM Compression Object Id : 0 577239041
...


I guess it's feasible that we have 13 million objects;
however, the actlog doesn't say anything about the OK objects, just the
fixed ones.

So, again, any clue as to how to find out how long to expect this to run
would be helpful.

-Josh

On 06.04.06 at 15:03 xaminmo AT OMNITECH DOT NET wrote:

Date: Thu, 6 Apr 2006 15:03:01 -0500
From: Josh-Daniel Davis <xaminmo AT OMNITECH DOT NET>
Reply-To: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Subject: CLEANUP EXPTABLE / SHOW VERIFYEXPTABLE

Does anyone know how to tell how big the expiration table is?

The reason is that I ran CLEANUP EXPTABLE on Monday.
On one of my servers, it finished up almost immediately.
On the other, it's been running for almost 3 days.


Because of this, when I try to run EXPIRE INV, I get:

tsm: SERVER>expire inv
ANS8001I Return code 4.


tsm: SERVER>q act begint=-00:01
04/06/06 14:43:34 ANR2017I Administrator OPERATOR issued command: EXPIRE
  INVENTORY (SESSION: 239372)
04/06/06 14:43:34 ANR4298I Expiration thread already processing - unable
  to begin another expiration process. (SESSION: 239372)
04/06/06 14:43:34 ANR2017I Administrator OPERATOR issued command: ROLLBACK
  (SESSION: 239372)


It doesn't show up in Q PROC, and tracing IM* and more only shows failure to
obtain the lock.


I know it's running because of SHOW THREAD and Q ACT.


SHOW THREAD shows this:

Thread 129: ImVerifyExpTabThread
tid=33076, ktid=2588793, ptid=0, det=1, zomb=0, join=0, result=0, sess=0
 Awaiting cond waitP->waiting (0x18d5ffe20), using mutex TMV->mutex
(0x111b091f8), in tmLock (0x100041a08)
 Stack trace:
   0x0900000000382554 _cond_wait_global
   0x0900000000382f64 _cond_wait
   0x0900000000383a2c pthread_cond_wait
   0x000000010000d91c pkWaitCondition
   0x0000000100041a10 tmLock
   0x000000010016cc5c ImLockFsId
   0x000000010016cafc ImLockFileSpace
   0x000000010067f168 LockFilespace
   0x0000000100681710 ImVerifyExpTabThread
   0x000000010000e9dc StartThread
   0x090000000036c50c _pthread_body



Q ACT has been showing many, many of these message pairs:

04/06/06 13:19:04 CLEANUP EXPTABLE:  **** resetting 'hasactive',
objId=0:574058714 **** (SESSION: 134418)

04/06/06 13:19:04 CLEANUP EXPTABLE:  !!!! 'HasActive' flag set incorrectly
for objId=0:574877220 (\ADSM.SYS\CLUSTERDB), nodeName=NODENAME,
fsName=\\nodename\c$ !!!! (SESSION: 134418)

There are 125000 lines of this in the actlog in the last 67 hours.
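
One way to get a feel for progress from the actlog is to tally the "flag set incorrectly" messages per node. The Python sketch below is only an illustration. It assumes each message has already been joined onto a single line (Q ACT wraps long messages) and that node and filespace names contain no spaces. The regular expression is written against the two sample messages above, not against any documented message format, and the function name is hypothetical.

```python
import re
from collections import Counter

# Pattern derived from the sample message above; not an official format.
FLAG_RE = re.compile(
    r"'HasActive' flag set incorrectly for objId=(\d+):(\d+) "
    r"\((.*?)\), nodeName=(\S+?), fsName=(\S+) !!!!"
)

def count_fixes_by_node(lines):
    """Count 'HasActive' corrections per node name."""
    counts = Counter()
    for line in lines:
        m = FLAG_RE.search(line)
        if m:
            counts[m.group(4)] += 1
    return counts

# The two messages quoted above, each joined onto one line.
sample = [
    "04/06/06 13:19:04 CLEANUP EXPTABLE:  **** resetting 'hasactive', "
    "objId=0:574058714 **** (SESSION: 134418)",
    "04/06/06 13:19:04 CLEANUP EXPTABLE:  !!!! 'HasActive' flag set incorrectly "
    "for objId=0:574877220 (\\ADSM.SYS\\CLUSTERDB), nodeName=NODENAME, "
    "fsName=\\\\nodename\\c$ !!!! (SESSION: 134418)",
]
print(count_fixes_by_node(sample))
```

Only the second message names a node, so the "resetting" line is skipped by the tally.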

The object IDs are not in order, so I have no idea how much longer it's
going to run.

I'm assuming it will parse the entire Expiring.Objects table.


SHOW OBJDIR
... Expiring.Objects(78)


SHOW NODE 78
It's a b-tree root node with 99 subnodes.
It's 4 levels deep, and each node has a different number of children.
MaxCapacity is 1004, so potentially 1004^4.
Manually traversing the tree isn't feasible.
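
To put numbers on that bound, here is a small Python sketch. It only does the arithmetic. It assumes the root counts as one of the 4 levels and that every node below the root is completely full, so both figures are loose ceilings rather than estimates of the real table size.

```python
MAX_CAPACITY = 1004   # MaxCapacity reported by SHOW NODE 78
LEVELS = 4            # tree depth reported above
ROOT_CHILDREN = 99    # subnodes under the root

# Loose ceiling: every node at every level is full.
upper_bound = MAX_CAPACITY ** LEVELS

# Slightly tighter: use the known root fan-out of 99 and
# assume (pessimistically) that all lower levels are full.
bound_with_root = ROOT_CHILDREN * MAX_CAPACITY ** (LEVELS - 1)

print(f"{upper_bound:,}")
print(f"{bound_with_root:,}")
```

Even the tighter ceiling is far too large to walk by hand, and real nodes are rarely full, so neither number says much about the actual entry count.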


SHOW TREE Expiring.Objects
This just hangs for a long time.


I'm leaving it running, redirected to an outfile, but it's been 10 mins for
both the node that completed the CLEANUP quickly and the node that didn't.

When expiration was OK, it would take 6-10 hours with SKIPD=YES and up to a
day and a half with SKIPD=NO, vs 4 hours on the "good" node.

There's no CANCEL CLEANUP or similar.

I'm hesitant to kill off the server and restart it simply because of the
number of objects it's correcting.

So, should I just wait for the SHOW TREE to complete, or is there some
other, faster and simpler way to see?


Thanks for any assistance.

-Josh-Daniel Davis

