ADSM-L

Re: Recovery Log almost 100%

2002-05-02 08:40:25
Subject: Re: Recovery Log almost 100%
From: Paul Zarnowski <vkm AT CORNELLC.CIT.CORNELL DOT EDU>
Date: Thu, 2 May 2002 08:40:17 -0400
TSM Development is fully aware of the log issue and based on some
conversations at SHARE, I am comfortable that they are taking steps to
address it (with or without a requirement).  I don't think this issue will
be completely solved quickly, as it is a rather complex set of
problems.  In the short term, look for tools to show up that will help TSM
administrators to identify which session has the log tail pinned, and also
address one of the issues that Paul refers to below, which causes the log
head to advance quickly (and shows up as a high dirty page count).

For the log to fill, two things must happen:  the log tail must be pinned by a
long-running in-flight transaction, and the log head must advance around to
catch up to the tail.  To keep the log from filling, you can either release
the tail or slow down the head.  It is not easy to identify the session or
thread that has the log tail pinned.  I don't know if the tools I refer to
above have shown up in 4.2.2 or 5.1 (we're still running 4.2.1).  Two
things can advance the head quickly:  inventory expiration
and filespace deletion.  If you find yourself in a situation where you see
the log filling quickly and don't know what has the tail pinned, check for
these two processes and kill them if you see them.  This will significantly
slow down the growth rate of the log, and give the oldest in-flight
transaction more of a chance to complete.  We have written a monitor to do
this automatically, and it has really helped us.  If neither of these
processes is running, then you can start guessing about which session
might have the tail pinned.  In this situation, we look for an old session
that has been running for a long time.  This might be a session backing up
over a slow speed line.  If the log nears 100%, we try to avoid it filling
completely by cancelling all sessions (if we have time) or simply HALTing
the server and restarting it.  This generally clears the log when the
server comes back up, and avoids having to do an offline extend of the log
(which has already been discussed).  If you are running
logmode=rollforward, be aware that when you later reduce the log size to
delete the temporary extension, you will (I think) trigger a full database
backup.
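
A minimal sketch of the triage order described above, for anyone who wants
to script it.  This is not our monitor; it is only the decision logic in
Python.  The thresholds (80% and 95%), the Process/Session records, and the
process description strings are assumptions for illustration, so check them
against your own QUERY PROCESS and QUERY SESSION output.  The function only
returns candidate commands for an administrator (or a wrapper script) to
review; it does not talk to the server.

```python
from dataclasses import dataclass
from typing import List

# Assumed description fragments for the two head-advancing processes.
HEAD_ADVANCERS = ("expiration", "delete filespace")


@dataclass
class Process:
    number: int
    description: str


@dataclass
class Session:
    number: int
    minutes_connected: float


def triage(log_pct: float, processes: List[Process], sessions: List[Session],
           warn_pct: float = 80.0, panic_pct: float = 95.0) -> List[str]:
    """Return candidate admin commands, in order, for the current log state."""
    if log_pct < warn_pct:
        return []
    if log_pct >= panic_pct:
        # Last resort before the log fills: cancel every session.
        # (The alternative in the text is to HALT and restart the server.)
        return [f"CANCEL SESSION {s.number}" for s in sessions]
    # Step 1: stop the processes that push the log head forward quickly.
    culprits = [p for p in processes
                if any(key in p.description.lower() for key in HEAD_ADVANCERS)]
    if culprits:
        return [f"CANCEL PROCESS {p.number}" for p in culprits]
    # Step 2: otherwise guess that the longest-running session pins the tail.
    if sessions:
        oldest = max(sessions, key=lambda s: s.minutes_connected)
        return [f"CANCEL SESSION {oldest.number}"]
    return []
```

Step 2 is only a guess, as the text says: the longest-connected session is
the most likely suspect, not a confirmed one.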

If you are at v4.2, you can have a larger log, up to 13GB.  This can also
provide some relief.
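
To make the head/tail picture concrete, here is a toy model in Python.  It
is only an illustration of the idea, not how the server implements its log;
the class and function names are made up for this sketch.  It shows why a
larger log buys time but does not remove the problem: with the tail pinned,
doubling the size simply doubles the number of writes before the head
catches up.

```python
class RecoveryLogModel:
    """Toy circular log: the head moves on every write, the tail only when
    the oldest in-flight transaction finishes."""

    def __init__(self, size_mb: float):
        self.size_mb = size_mb
        self.head_mb = 0.0  # total MB ever written
        self.tail_mb = 0.0  # start of the oldest in-flight transaction

    def write(self, mb: float) -> None:
        # The head may not lap the tail; that is the "log full" condition.
        if self.head_mb + mb - self.tail_mb > self.size_mb:
            raise RuntimeError("log full: head caught up with pinned tail")
        self.head_mb += mb

    def release_tail(self, new_tail_mb: float) -> None:
        # The tail only moves forward, and never past the head.
        self.tail_mb = min(max(new_tail_mb, self.tail_mb), self.head_mb)

    @property
    def pct_used(self) -> float:
        return 100.0 * (self.head_mb - self.tail_mb) / self.size_mb


def writes_until_full(size_mb: float, mb_per_write: float) -> int:
    """Count writes that fit while the tail stays pinned at zero."""
    log = RecoveryLogModel(size_mb)
    count = 0
    try:
        while True:
            log.write(mb_per_write)
            count += 1
    except RuntimeError:
        return count
```

Releasing the tail (the pinned transaction completing or being cancelled)
is the only thing in this model that frees space; slowing the head only
delays the RuntimeError.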

..Paul

At 12:13 AM 5/2/2002 -0400, Seay, Paul wrote:
Actually, this was discussed at length at SHARE, and the basic
requirement is: TSM, take whatever action is necessary to keep the server up.
Start by cancelling expiration.  Then nail the client that has the log
pinned.  There were also a number of issues discussed.  Apparently, there
are a lot of dirty blocks being recorded in the log that do not have to be.
I am working to get these requirements voted on.

Paul D. Seay, Jr.
Technical Specialist
Naptheon, INC
757-688-8180


-----Original Message-----
From: Thomas A. La Porte [mailto:tlaporte AT ANIM.DREAMWORKS DOT COM]
Sent: Wednesday, May 01, 2002 4:36 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: Recovery Log almost 100%


Given that this is one of the more common FAQ-style questions on this
listserv, I wonder if it's not time for someone to submit a TSM requirement
that the server behave better in a recovery log full situation. This happens
in other databases w/o causing a SIGSEGV. Oracle, for example, simply
prevents any database changes, and only allows new administrative
connections to the database until the log full situation is cleared (by
archiving the online redo logs). It seems that TSM could behave similarly.

Certainly the server is not in a great state when the log segments are full,
but it would seem easier to recover, and somewhat less confusing to
administrators, if it could be done online, rather than in the manner in
which it is handled now. We've all probably experienced a situation where we
are close to the limit on the log size, so we only extend the log a little
bit, and then there is a rush to see if our database backup is going to
finish and clear the log full condition before we use up the additional log
space--lest we find ourselves in the same perilous condition, only *closer*
to the seemingly arbitrary maximum log size.
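
The race described above comes down to simple arithmetic, sketched here in
Python.  The function and parameter names are hypothetical, and the growth
rate and remaining backup time are numbers you would have to estimate
yourself (for example from two log utilization readings a few minutes
apart); nothing here queries the server.

```python
def backup_wins_race(free_log_mb: float,
                     log_growth_mb_per_min: float,
                     backup_minutes_remaining: float) -> bool:
    """True if the database backup should finish before the log fills."""
    if log_growth_mb_per_min <= 0:
        # The log is not growing, so the backup cannot lose the race.
        return True
    minutes_until_full = free_log_mb / log_growth_mb_per_min
    return backup_minutes_remaining < minutes_until_full
```

For example, a 512 MB extension with the log growing at 20 MB per minute
leaves about 25 minutes, so a backup that needs 30 more minutes loses the
race unless the growth is slowed.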

 -- Tom

Thomas A. La Porte
DreamWorks SKG
tlaporte AT dreamworks DOT com

On Wed, 1 May 2002, Sung Y Lee wrote:

>When the log reaches 100%, just pray that the TSM server process will not
>crash.
>
>
>I say the key is prevention.  Whatever you can do to prevent that from
>happening is the best answer.
>
>There are many things you can do to keep the log from growing to 100%. One
>that works for me: I have LogMode set to roll-forward mode, with the dbb
>trigger at 38% and 3 incrementals between fulls (q dbb). The log is also
>set as large as allowed without going over the limit, leaving room for an
>extension should it ever reach 100% and TSM crash.  I have it set at 4.5 GB
>(to be safe).  The maximum allowed recovery log for TSM 4.1 is 5.3 GB?? I
>can't recall the exact value.
>
>
>If the TSM server is in Log mode, then more than likely it will have the
>dbb trigger set at a certain point.   For example,
>adsm> q dbb
>
>Full          Incremental     Log Full       Incrementals
>Device        Device          Percentage     Between
>Class         Class                          Fulls
>----------    -----------     ----------     ------------
>IBM3590       IBM3590                 38                3
>
>When the recovery log reaches 38%, an incremental database backup will kick
>off, up to 3 times before a full database backup is performed.   Most of
>the time the TSM server will prevent other sessions from establishing when
>the recovery log reaches 100%, but will allow the triggered database backup
>to complete, and that will bring the recovery log down to 0.  Sometimes TSM
>will simply crash.  If it crashes, then you will need to do an emergency
>recovery log extend and bring TSM back up.
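
The incremental/full pattern described above can be sketched in a few lines
of Python.  This is only a model of the behaviour as described in this
message, with a made-up function name; it assumes the count starts at zero,
as if a full backup had just completed.

```python
from typing import List


def triggered_backup_types(firings: int,
                           incrementals_between_fulls: int = 3) -> List[str]:
    """Backup type run at each successive firing of the dbb trigger."""
    types = []
    incrementals_since_full = 0
    for _ in range(firings):
        if incrementals_since_full < incrementals_between_fulls:
            types.append("incremental")
            incrementals_since_full += 1
        else:
            # The allowed incrementals are used up, so run a full backup
            # and start counting again.
            types.append("full")
            incrementals_since_full = 0
    return types
```

With the settings shown in the q dbb output (3 incrementals between fulls),
every fourth firing of the trigger produces a full backup.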
>
>Sung Y. Lee
>E-mail sunglee AT us.ibm DOT com
>
>
>
>From: brian welsh <brianwelsh3@HOTMAIL.COM>
>Sent by: "ADSM: Dist Stor Manager" <[email protected]>
>Date: 05/01/2002 01:23 PM
>To: ADSM-L AT VM.MARIST DOT EDU
>cc:
>Subject: Recovery Log almost 100%
>Please respond to "ADSM: Dist Stor Manager"
>
>Hello,
>
>AIX 4.3.3 and server 4.1.1.0.
>
>Last night two archive schedules had a problem. On the clients there
>were files in a kind of loop and TSM tried to archive them. As a result,
>the recovery log reached almost 100%. This was the first time our log has
>been that high. The problem on the client is solved, but now I have the
>following question.
>
>I was wondering how other people prevent the log from growing to 100%,
>and how to handle it after the log has reached 100%.
>
>Any tip is welcome.
>
>Brian.
>
>