ADSM-L

Re: [ADSM-L] Pinned Recovery Log

2008-09-30 02:29:42
Subject: Re: [ADSM-L] Pinned Recovery Log
From: Roger Deschner <rogerd AT UIC DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 30 Sep 2008 01:28:16 -0500
We run a cron job every 20 minutes that checks how full the Recovery Log
is and, if it is 70% full or more, runs SHOW LOGPIN and cancels the
session that is pinning the log.
I've found that when a client has the log pinned, it may take quite a
while to cancel the session - sometimes as long as an hour. That's why
we cancel when the log is 70% full, instead of 99%. These incidents get
logged, and we investigate them.
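The cron check described above could be sketched in Python roughly as follows. This is a minimal, hypothetical sketch, not the script used at UIC. It assumes the `dsmadmc` administrative command-line client is on the PATH and that `select pct_utilized from log` returns the Recovery Log utilization on a TSM 5.x server. The helper names (`parse_pct_util`, `should_cancel`, `run_admin`, `check_once`) are invented for illustration.

```python
import subprocess

# Threshold taken from the post: act at 70% full rather than 99%,
# because cancelling a session that pins the log can take up to an hour.
CANCEL_THRESHOLD_PCT = 70.0


def parse_pct_util(output: str) -> float:
    """Return the log utilization from dsmadmc output.

    Assumption: the query ran with -dataonly=yes -commadelimited, so the
    first non-blank line holds the value (e.g. "72.4").
    """
    for line in output.splitlines():
        line = line.strip()
        if line:
            return float(line.split(",")[0])
    raise ValueError("no log utilization value in dsmadmc output")


def should_cancel(pct_util: float, threshold: float = CANCEL_THRESHOLD_PCT) -> bool:
    """True when the Recovery Log is at or above the cancel threshold."""
    return pct_util >= threshold


def run_admin(command: str, admin_id: str, password: str) -> str:
    """Run one administrative command through dsmadmc and return its stdout."""
    result = subprocess.run(
        ["dsmadmc", f"-id={admin_id}", f"-password={password}",
         "-dataonly=yes", "-commadelimited", command],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def check_once(admin_id: str, password: str) -> float:
    """One cron pass: query log fullness and cancel the pinning session if needed."""
    # Assumption: the server's LOG table exposes a PCT_UTILIZED column.
    pct = parse_pct_util(run_admin("select pct_utilized from log", admin_id, password))
    if should_cancel(pct):
        # Print the incident (cron can mail or log it) so it can be investigated.
        print(f"Recovery Log {pct:.1f}% full; running SHOW LOGPINNED CANCEL")
        run_admin("show logpinned cancel", admin_id, password)
    return pct
```

A crontab entry such as `*/20 * * * *` that runs a small wrapper calling `check_once()` would reproduce the 20-minute schedule. The wrapper and the credential handling are left out here.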

When that happens, I usually find that there is a networking problem, and
typically it's a half-duplex link somewhere along the line between the
TSM client and server. This problem won't be obvious with normal stuff
like web browsing and SSH/telnet sessions, which may still appear to
work OK. It is typically exposed only by a TSM backup, because of the
much greater volume of data it moves.

Traceroute cannot detect this. You need to put up an NDT (Network
Diagnostic Tool) network bandwidth tester and test this client machine
on it. Your networking people might already have one of these, or you
can use a public-access NDT server. NDT can very quickly spot bad links,
such as ones set to half duplex. It can also pinpoint bad cables. More
information about NDT, including a list of public-access NDT servers, is
at http://e2epi.internet2.edu/ndt/.

Make sure you are not allowing any clients to back up if they are not on
your local net. Client nodes out there on the Internet would pin the log
frequently, until we disallowed them. We did this at the router level.
ADSL is the worst, because its upload bandwidth is much smaller than its
download bandwidth, and a backup is an upload, after all. We tell people
with laptops that they can only use TSM when they bring the computer onto
campus. Wi-Fi links do not appear to cause log pinning, as long as the
wireless router is connected directly to our campus network.
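The post enforces the on-campus rule at the router. If you also want to audit client session addresses against the same allow-list, the check reduces to a network-membership test. The sketch below uses Python's standard `ipaddress` module; the network ranges in it are hypothetical placeholders (a private range and a documentation range), not anyone's real campus networks, and `is_on_campus` is an invented helper name.

```python
import ipaddress

# Hypothetical campus address ranges; substitute your own networks.
CAMPUS_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.0.2.0/24"),
]


def is_on_campus(client_addr: str) -> bool:
    """True if the client's IP address falls inside one of the allowed networks.

    An IPv6 address never matches an IPv4 network, so it is reported as off campus.
    """
    addr = ipaddress.ip_address(client_addr)
    return any(addr in net for net in CAMPUS_NETWORKS)
```

Doing this at the router, as described above, keeps off-campus traffic from ever reaching the TSM server; a check like this one is only useful for reporting.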

The other problem that can pin the log is a client slowly backing up a
very large file. We find Macs have more problems in this area than other
types of clients, mostly because of the kinds of data people typically
process on a Mac. Video files can be enormous. Consider limiting the
maximum size of file that clients can back up.
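A quick calculation shows why a large file on a slow upload link is a problem: transfer time grows linearly with file size and inversely with upload rate. The sketch below ignores protocol overhead and compression. Both helper names and the 2 GiB cap are hypothetical, chosen only to illustrate a file-size limit.

```python
def transfer_hours(size_bytes: int, upload_bits_per_sec: float) -> float:
    """Hours needed to send size_bytes at a sustained upload rate (no overhead)."""
    return size_bytes * 8 / upload_bits_per_sec / 3600


# Hypothetical maximum file size for backup: 2 GiB.
MAX_FILE_BYTES = 2 * 1024**3


def exceeds_limit(size_bytes: int, limit: int = MAX_FILE_BYTES) -> bool:
    """True if a file is larger than the chosen backup size limit."""
    return size_bytes > limit
```

For example, `transfer_hours(4 * 1024**3, 1_000_000)` gives roughly 9.5 hours for a 4 GiB video file over a 1 Mbit/s upload. Assuming the file travels in a single transaction, that is a long time for one transaction to stay open.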

Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu
====You will take a long journey. Remember to export your variables.====




On Mon, 29 Sep 2008, Richard Sims wrote:

>If you're unsure that a posting got distributed, inspect the List
>archives to see if your posting made it into circulation.
>
>First and foremost, if your TSM server is being jeopardized by a
>client's behavior, you need to protect the server.  You can issue 'SHow
>LOGPINned Cancel' to terminate sessions or processes that are pinning
>the Recovery Log, as described in the TSM Problem Determination Guide
>- or simply cancel the session outright.
>
>Beyond that, someone needs to take a good look at what that client is
>doing, relative to healthy transaction processing.  Someone could have
>set up something unreasonable, perhaps in ignorance of best practices;
>or there could be an odd condition causing the client to get stuck
>somewhere in the file system.  Check back in your Activity Log for ANE
>session conclusion messages for that client, to see if it's performing
>B/A client work (rather than TDP) and check past session statistics
>for a sense of sizes, rates, and duration.  This can reveal if what
>the client has been doing has been getting more outrageous over time,
>or whether the current session is anomalous.  If the stats show the
>network to be ploddingly slow, that can be fixed, but it needs
>analysis.  Talk to the client admin and see if they made recent
>changes, or are aware of unusual data activity.
>
>    Richard Sims   http://people.bu.edu/rbs/
>
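Richard's suggestion to compare the current session with past session statistics can be reduced to a simple rate comparison. In this hypothetical sketch, each session is a `(bytes_sent, elapsed_seconds)` pair that you would read from the ANE session-summary messages in the Activity Log. The helper names and the 25% cutoff are assumptions for illustration, not TSM settings.

```python
from statistics import median


def rate_bytes_per_sec(bytes_sent: int, elapsed_sec: float) -> float:
    """Average transfer rate of one session."""
    return bytes_sent / elapsed_sec


def is_anomalous(history, current, slow_factor=0.25):
    """Flag the current session if its rate is far below the historical median.

    history: list of (bytes_sent, elapsed_sec) from past sessions of this node.
    current: (bytes_sent, elapsed_sec) for the session under suspicion.
    slow_factor: hypothetical cutoff; 0.25 means "under a quarter of typical".
    """
    typical = median(rate_bytes_per_sec(b, s) for b, s in history)
    return rate_bytes_per_sec(*current) < slow_factor * typical
```

A session flagged this way is a candidate for the network checks described earlier in the thread; a session that is slow but consistent with its history points instead at what the client has been asked to back up.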