ADSM-L

Re: [ADSM-L] Log pinned

2009-11-23 16:45:49
Subject: Re: [ADSM-L] Log pinned
From: Dwight Cook <cookde AT COX DOT NET>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Mon, 23 Nov 2009 15:45:34 -0600
I've seen that but generally once the log has hit 80% (of a 13 GB log) and
TSM is imposing a 2 millisecond delay in transactions in order to try to
keep the log from filling and crashing TSM.
I only have seen it a few times a year across various very active
environments but one thing that seems to be common with me (and you) is that
there is a long running and slow running client session present in
conjunction with expiration.
Notice your SERVERGRAPH client... about 54K sessions old and only 63 MB sent
& 150 received.
Not sure how many sessions a day you have but the pattern I've seen is that
if a slow running client has been running across a couple of expiration
processes, something will get pinned (but not a deadlock condition) and
eventually the server will have to be halted to clear the situation.
I haven't bothered to call in a problem because I don't have the time nor
the environments available to try to make this recreatable on demand.

I personally think that the long running client blocks the commit from
expiration and ~if~ it had a long enough time to run it would eventually
release its pin of the log but in the systems I work with, we can't
exist/wait that long.

I'll also point out that when we have caught this in its early stage and run
a DB backup... it would complete and have no impact on clearing the log.
I've seen this when running both log mode normal and roll forward.
(Also if caught early enough, I've been able to cancel the client
successfully.)

Dwight




-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Fred Johanson
Sent: Monday, November 23, 2009 2:27 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] Log pinned

A different problem than the one a few weeks gone by.

DB backup finished, leaving log usage at 0.1 % used.  No sessions running,
so I started an expiration process.  After 30 minutes, this is what I saw:

tsm: GLASSHOUSE>q pr

 Process     Process Description      Status
  Number
--------     --------------------
-------------------------------------------------
     873     Backup Storage Pool      Primary Pool SERVDISKPOOL, Copy Pool
OFFSITEPOOL,
                                       Files Backed Up: 43179, Bytes Backed
Up:
                                       145,892,524,032, Unreadable Files: 0,
Unreadable
                                       Bytes: 0. Current Physical File
(bytes):
                                       2,147,143,680 Current output volume:
C31199.
     874     Backup Storage Pool      Primary Pool SERVDISKPOOL, Copy Pool
OFFSITEPOOL,
                                       Files Backed Up: 12475, Bytes Backed
Up:
                                       137,712,177,152, Unreadable Files: 0,
Unreadable
                                       Bytes: 0. Current Physical File
(bytes):
                                       1,384,419,328 Current output volume:
C32082.
     875     Backup Storage Pool      Primary Pool SERVDISKPOOL, Copy Pool
OFFSITEPOOL,
                                       Files Backed Up: 48409, Bytes Backed
Up:
                                       160,964,038,656, Unreadable Files: 0,
Unreadable
                                       Bytes: 0. Current Physical File
(bytes):
                                       3,980,283,904 Current output volume:
C32083.
     878     Expiration               Examined 207408 objects, deleting
207376 backup
                                       objects, 0 archive objects, 0 DB
backup volumes,
                                       0 recovery plan files; 0 errors
encountered.

tsm: GLASSHOUSE>q sess

   Sess     Comm.      Sess         Wait       Bytes       Bytes     Sess
Platform     Client Name
 Number     Method     State        Time        Sent       Recvd     Type
-------     ------     ------     ------     -------     -------     -----
--------     -------------------
388,208     Tcp/Ip     Run          0 S       63.7 M         150     Admin
Linux86      SERVERGRAPH
442,970     Tcp/Ip     Run          0 S      134.3 K       2.2 K     Admin
AIX          FRED
443,069     Tcp/Ip     IdleW      8.9 M       38.4 K       1.2 K     Admin
AIX          AILIAS
445,050     Tcp/Ip     IdleW       15 S        4.3 K         405     Node
SUN SOL-     JON

ARIS

tsm: GLASSHOUSE>q log

Available     Assigned       Maximum       Maximum        Page         Total
Used       Pct      Max.
    Space     Capacity     Extension     Reduction        Size        Usable
Pages      Util       Pct
     (MB)         (MB)          (MB)          (MB)     (bytes)         Pages
Util
---------     --------     ---------     ---------     -------     ---------
---------     -----     -----
   12,788       12,788             0         6,996       4,096     3,273,216
1,482,016      45.3      92.1


I cancelled the expiration and the runaway log stopped running.

My thinking is that the expiration is feverishly writing its transaction to
the log.  Since there are no other sessions accessing the log, there are no
interrupts to trigger a commit.  So the log keeps filling, triggering and
retriggering DB backups.  If my thinking is correct, this is a design
feature in the pre-DB2 b-tree DB, not easily remedied.  Just something to
constantly monitor.

Anybody have any thoughts on this?


Fred Johanson
TSM Administrator
University of Chicago

773-702-8464

<Prev in Thread] Current Thread [Next in Thread>