ADSM-L

Re: [ADSM-L] Log pinned

2009-11-24 02:14:48
Subject: Re: [ADSM-L] Log pinned
From: Roger Deschner <rogerd AT UIC DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 24 Nov 2009 00:39:45 -0600
I see this fairly often. It's always a slow-running client who is
backing up a large file. I almost had a Log-full disaster last Sunday
when DB backup started with a slow client already backing up. I
eventually had to DISABLE SESSIONS and cancel all client backups until
the DB backup finished.

The only answer here is to enable throughput monitoring so that slow
clients get cancelled automatically. (I actually have a cron process
that if the log gets 70% full, it parses the output of SHOW LOGPIN and
cancels that client session.) You cannot tolerate slow clients.

We seem to be having a lot more slow clients lately, and one recurring
theme in many of my log full problems is certain faulty versions of
Symantec Endpoint Protection running on client systems. I'm still trying
to pin down the relationship between SEP and slow TSM backups. I'll post
it here if I can narrow it down to a particular release of SEP.

Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu
               Academic Computing & Communications Center
==="If a train station is where a train stops, what's a work station?"==


On Mon, 23 Nov 2009, Dwight Cook wrote:

>I've seen that but generally once the log has hit 80% (of a 13 GB log) and
>TSM is imposing a 2 millisecond delay in transactions in order to try to
>keep the log from filling and crashing TSM.
>I only have seen it a few times a year across various very active
>environments but one thing that seems to be common with me (and you) is that
>there is a long running and slow running client session present in
>conjunction with expiration.
>Notice your SERVERGRAPH client... about 54K sessions old and only 63 MB sent
>& 150 received.
>Not sure how many sessions a day you have but the pattern I've seen is that
>if a slow running client has been running across a couple of expiration
>processes, something will get pinned (but not a deadlock condition) and
>eventually the server will have to be halted to clear the situation.
>I haven't bothered to call in a problem because I don't have the time nor
>the environments available to try to make this recreatable on demand.
>
>I personally think that the long running client blocks the commit from
>expiration and ~if~ it had a long enough time to run it would eventually
>release its pin of the log but in the systems I work with, we can't
>exist/wait that long.
>
>I'll also point out that when we have caught this in its early stage and run
>a DB backup... it would complete and have no impact on clearing the log.
>I've seen this when running both log mode normal and roll forward.
>(Also if caught early enough, I've been able to cancel the client
>successfully.)
>
>Dwight
>
>
>
>
>-----Original Message-----
>From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
>Fred Johanson
>Sent: Monday, November 23, 2009 2:27 PM
>To: ADSM-L AT VM.MARIST DOT EDU
>Subject: [ADSM-L] Log pinned
>
>A different problem than the one a few weeks gone by.
>
>DB backup finished, leaving log usage at 0.1 % used.  No sessions running,
>so I started an expiration process.  After 30 minutes, this is what I saw:
>
>tsm: GLASSHOUSE>q pr
>
> Process     Process Description      Status
>  Number
>--------     --------------------
>-------------------------------------------------
>     873     Backup Storage Pool      Primary Pool SERVDISKPOOL, Copy Pool
>OFFSITEPOOL,
>                                       Files Backed Up: 43179, Bytes Backed
>Up:
>                                       145,892,524,032, Unreadable Files: 0,
>Unreadable
>                                       Bytes: 0. Current Physical File
>(bytes):
>                                       2,147,143,680 Current output volume:
>C31199.
>     874     Backup Storage Pool      Primary Pool SERVDISKPOOL, Copy Pool
>OFFSITEPOOL,
>                                       Files Backed Up: 12475, Bytes Backed
>Up:
>                                       137,712,177,152, Unreadable Files: 0,
>Unreadable
>                                       Bytes: 0. Current Physical File
>(bytes):
>                                       1,384,419,328 Current output volume:
>C32082.
>     875     Backup Storage Pool      Primary Pool SERVDISKPOOL, Copy Pool
>OFFSITEPOOL,
>                                       Files Backed Up: 48409, Bytes Backed
>Up:
>                                       160,964,038,656, Unreadable Files: 0,
>Unreadable
>                                       Bytes: 0. Current Physical File
>(bytes):
>                                       3,980,283,904 Current output volume:
>C32083.
>     878     Expiration               Examined 207408 objects, deleting
>207376 backup
>                                       objects, 0 archive objects, 0 DB
>backup volumes,
>                                       0 recovery plan files; 0 errors
>encountered.
>
>tsm: GLASSHOUSE>q sess
>
>   Sess     Comm.      Sess         Wait       Bytes       Bytes     Sess
>Platform     Client Name
> Number     Method     State        Time        Sent       Recvd     Type
>-------     ------     ------     ------     -------     -------     -----
>--------     -------------------
>388,208     Tcp/Ip     Run          0 S       63.7 M         150     Admin
>Linux86      SERVERGRAPH
>442,970     Tcp/Ip     Run          0 S      134.3 K       2.2 K     Admin
>AIX          FRED
>443,069     Tcp/Ip     IdleW      8.9 M       38.4 K       1.2 K     Admin
>AIX          AILIAS
>445,050     Tcp/Ip     IdleW       15 S        4.3 K         405     Node
>SUN SOL-     JON
>
>ARIS
>
>tsm: GLASSHOUSE>q log
>
>Available     Assigned       Maximum       Maximum        Page         Total
>Used       Pct      Max.
>    Space     Capacity     Extension     Reduction        Size        Usable
>Pages      Util       Pct
>     (MB)         (MB)          (MB)          (MB)     (bytes)         Pages
>Util
>---------     --------     ---------     ---------     -------     ---------
>---------     -----     -----
>   12,788       12,788             0         6,996       4,096     3,273,216
>1,482,016      45.3      92.1
>
>
>I cancelled the expiration and the runaway log stopped running.
>
>My thinking is that the expiration is feverishly writing its transaction to
>the log.  Since there are no other sessions accessing the log, there are no
>interrupts to trigger a commit.  So the log keeps filling, triggering and
>retriggering DB backups.  If my thinking is correct, this is a design
>feature in the pre-DB2 b-tree DB, not easily remedied.  Just something to
>constantly monitor.
>
>Anybody have any thoughts on this?
>
>
>Fred Johanson
>TSM Administrator
>University of Chicago
>
>773-702-8464
>

<Prev in Thread] Current Thread [Next in Thread>