I've seen that but generally once the log has hit 80% (of a 13 GB log) and
TSM is imposing a 2 millisecond delay in transactions in order to try to
keep the log from filling and crashing TSM.
I only have seen it a few times a year across various very active
environments but one thing that seems to be common with me (and you) is that
there is a long running and slow running client session present in
conjunction with expiration.
Notice your SERVERGRAPH client... about 54K sessions old and only 63 MB sent
& 150 received.
Not sure how many sessions a day you have but the pattern I've seen is that
if a slow running client has been running across a couple of expiration
processes, something will get pinned (but not a deadlock condition) and
eventually the server will have to be halted to clear the situation.
I haven't bothered to call in a problem because I don't have the time nor
the environments available to try to make this recreatable on demand.
I personally think that the long running client blocks the commit from
expiration and ~if~ it had a long enough time to run it would eventually
release its pin of the log but in the systems I work with, we can't
exist/wait that long.
I'll also point out that when we have caught this in its early stage and run
a DB backup... it would complete and have no impact on clearing the log.
I've seen this when running both log mode normal and roll forward.
(Also if caught early enough, I've been able to cancel the client
successfully.)
Dwight
-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Fred Johanson
Sent: Monday, November 23, 2009 2:27 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] Log pinned
A different problem than the one a few weeks gone by.
DB backup finished, leaving log usage at 0.1 % used. No sessions running,
so I started an expiration process. After 30 minutes, this is what I saw:
tsm: GLASSHOUSE>q pr
Process Process Description Status
Number
-------- --------------------
-------------------------------------------------
873 Backup Storage Pool Primary Pool SERVDISKPOOL, Copy Pool
OFFSITEPOOL,
Files Backed Up: 43179, Bytes Backed
Up:
145,892,524,032, Unreadable Files: 0,
Unreadable
Bytes: 0. Current Physical File
(bytes):
2,147,143,680 Current output volume:
C31199.
874 Backup Storage Pool Primary Pool SERVDISKPOOL, Copy Pool
OFFSITEPOOL,
Files Backed Up: 12475, Bytes Backed
Up:
137,712,177,152, Unreadable Files: 0,
Unreadable
Bytes: 0. Current Physical File
(bytes):
1,384,419,328 Current output volume:
C32082.
875 Backup Storage Pool Primary Pool SERVDISKPOOL, Copy Pool
OFFSITEPOOL,
Files Backed Up: 48409, Bytes Backed
Up:
160,964,038,656, Unreadable Files: 0,
Unreadable
Bytes: 0. Current Physical File
(bytes):
3,980,283,904 Current output volume:
C32083.
878 Expiration Examined 207408 objects, deleting
207376 backup
objects, 0 archive objects, 0 DB
backup volumes,
0 recovery plan files; 0 errors
encountered.
tsm: GLASSHOUSE>q sess
Sess Comm. Sess Wait Bytes Bytes Sess
Platform Client Name
Number Method State Time Sent Recvd Type
------- ------ ------ ------ ------- ------- -----
-------- -------------------
388,208 Tcp/Ip Run 0 S 63.7 M 150 Admin
Linux86 SERVERGRAPH
442,970 Tcp/Ip Run 0 S 134.3 K 2.2 K Admin
AIX FRED
443,069 Tcp/Ip IdleW 8.9 M 38.4 K 1.2 K Admin
AIX AILIAS
445,050 Tcp/Ip IdleW 15 S 4.3 K 405 Node
SUN SOL- JON
ARIS
tsm: GLASSHOUSE>q log
Available Assigned Maximum Maximum Page Total
Used Pct Max.
Space Capacity Extension Reduction Size Usable
Pages Util Pct
(MB) (MB) (MB) (MB) (bytes) Pages
Util
--------- -------- --------- --------- ------- ---------
--------- ----- -----
12,788 12,788 0 6,996 4,096 3,273,216
1,482,016 45.3 92.1
I cancelled the expiration and the runaway log stopped running.
My thinking is that the expiration is feverishly writing its transaction to
the log. Since there are no other sessions accessing the log, there are no
interrupts to trigger a commit. So the log keeps filling, triggering and
retriggering DB backups. If my thinking is correct, this is a design
feature in the pre-DB2 b-tree DB, not easily remedied. Just something to
constantly monitor.
Anybody have any thoughts on this?
Fred Johanson
TSM Administrator
University of Chicago
773-702-8464
|