Re: Recovery Log almost 100%

I wonder, also, if there is still any discussion about supporting
the use of an alternate RDBMS underneath TSM. It is quite clear
that there are many more sites with database sizes in the
25-50GB+ range. Five years ago I felt very lonely with a database
of this size, but given the discussions on the listserv over the
past year I feel more comfortable that we are no longer one of
the only sites supporting TSM instances that large. It has always
seemed to me that the database functions of TSM have been the
most problematic (deadlock issues, log full issues, SQL query
performance problems, complicated and unclear recommendations for
physical database layout, etc.). All of these problems have been
solved by Oracle, DB2, and Sybase. Granted there is the issue
that plugging in an external database adds greatly to the
complexity of TSM, and reduces its "black box-ness", but I think
the resources are available to administer such a beast at
the large sites that require very large databases.

More food for thought *early* on a Thursday morning.

 -- Tom

Thomas A. La Porte
DreamWorks SKG
tlaporte AT dreamworks DOT com

On Thu, 2 May 2002, Paul Zarnowski wrote:

>TSM Development is fully aware of the log issue and based on some
>conversations at SHARE, I am comfortable that they are taking steps to
>address it (with or without a requirement).  I don't think this issue will
>be completely solved quickly, as it is a rather complex set of
>problems.  In the short term, look for tools to show up that will help TSM
>administrators to identify which session has the log tail pinned, and also
>address one of the issues that Paul refers to below, which causes the log
>head to advance quickly (and shows up as a high dirty page count).
>
>For the log to fill, two things must happen:  the log tail must be pinned by a
>long-running in-flight transaction, and the log head must advance around to
>catch up to the tail.  To keep the log from filling, you can either release
>the tail or slow down the head.  It is not easy to identify the session or
>thread that has the log tail pinned.  I don't know if the tools I refer to
>above have shown up in 4.2.2 or 5.1 (we're still running 4.2.1).  There are
>a couple of things that can advance the head quickly: inventory expiration
>and filespace deletion.  If you find yourself in a situation where you see
>the log filling quickly and don't know what has the tail pinned, check for
>these two processes and kill them if you see them.  This will significantly
>slow down the growth rate of the log, and give the oldest in-flight
>transaction more of a chance to complete.  We have written a monitor to do
>this automatically, and it has really helped us.  If neither of these
>processes is running, then you can start guessing about which session
>might have the tail pinned.  In this situation, we look for an old session
>that has been running for a long time.  This might be a session backing up
>over a slow speed line.  If the log nears 100%, we try to avoid it filling
>completely by cancelling all sessions (if we have time) or simply HALTing
>the server and restarting it.  This generally clears the log when the
>server comes back up, and avoids having to do an offline extend of the log
>(which has already been discussed).  If you are running
>logmode=rollforward, be aware that when you later reduce the log size to
>delete the temporary extension, you will (I think) trigger a full database
>backup.
>
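A minimal sketch of the triage order Paul describes above, not his actual monitor. The thresholds, the `Process` and `Session` records, and the action names are all hypothetical; the thread gives no numbers and does not say how the monitor talks to the server. The function only decides what to suggest, and leaves issuing any server command to the caller.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the thread does not say when the monitor acts.
WARN_PCT = 80.0
CRITICAL_PCT = 95.0

# The two process kinds the post names as advancing the log head quickly.
HEAD_ADVANCERS = {"expiration", "filespace deletion"}


@dataclass
class Process:
    number: int
    kind: str  # hypothetical lower-case label for the process type


@dataclass
class Session:
    number: int
    age_minutes: float


def triage(log_pct, processes, sessions):
    """Return an ordered list of (action, target) suggestions."""
    if log_pct < WARN_PCT:
        return []
    if log_pct >= CRITICAL_PCT:
        # Near 100%: cancel all sessions if there is time, else halt and restart.
        return [("cancel_all_sessions", None), ("halt_and_restart", None)]
    advancers = [p for p in processes if p.kind in HEAD_ADVANCERS]
    if advancers:
        # Slow the head first so the oldest transaction has a chance to finish.
        return [("cancel_process", p.number) for p in advancers]
    if sessions:
        # Neither process is running: the oldest session is the best guess
        # for what has the tail pinned.
        oldest = max(sessions, key=lambda s: s.age_minutes)
        return [("inspect_session", oldest.number)]
    return []
```

Called with a log at 85% and an expiration process running, this suggests cancelling that process only; the session guesswork starts only when neither head-advancing process is present.
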
>If you are at v4.2, you can have a larger log, up to 13GB.  This can also
>provide some relief.
>
>..Paul
>
>At 12:13 AM 5/2/2002 -0400, Seay, Paul wrote:
>>Actually, this was significantly discussed at Share and the basic
>>requirement is: TSM, take whatever action is necessary to keep the server up.
>>Start by cancelling expiration.  Then nail the client that has the log
>>pinned.  There were also a number of issues discussed.  Apparently, there
>>are a lot of dirty blocks being recorded in the log that do not have to be.
>>I am working to get these requirements voted on.
>>
>>Paul D. Seay, Jr.
>>Technical Specialist
>>Naptheon, INC
>>757-688-8180
>>
>>
>>-----Original Message-----
>>From: Thomas A. La Porte [mailto:tlaporte AT ANIM.DREAMWORKS DOT COM]
>>Sent: Wednesday, May 01, 2002 4:36 PM
>>To: ADSM-L AT VM.MARIST DOT EDU
>>Subject: Re: Recovery Log almost 100%
>>
>>
>>Given that this is one of the more common FAQ style questions on this
>>listserv, I wonder if it's not time for someone to submit a TSM requirement
>>that the server behave better in a recovery log full situation. This happens
>>in other databases w/o causing a SIGSEGV. Oracle, for example, simply
>>prevents any database changes, and only allows new administrative
>>connections to the database until the log full situation is cleared (by
>>archiving the online redo logs). It seems that TSM could behave similarly.
>>
>>Certainly the server is not in a great state when the log segments are full,
>>but it would seem easier to recover, and somewhat less confusing to
>>administrators, if it could be done online, rather than in the manner in
>>which it is handled now. We've all probably experienced a situation where we
>>are close to the limit on the log size, so we only extend the log a little
>>bit, and then there is a rush to see if our database backup is going to
>>finish and clear the log full condition before we use up the additional log
>>space--lest we find ourselves in the same perilous condition, only *closer*
>>to the seemingly arbitrary maximum log size.
>>
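The race described above comes down to simple arithmetic, sketched here under the assumption that the log grows at a roughly constant rate. The function names and every number in the usage note are made up for illustration; none of them come from a real server.

```python
def minutes_until_full(capacity_mb, used_mb, growth_mb_per_min):
    """Minutes until the log is full at a constant growth rate."""
    if growth_mb_per_min <= 0:
        return float("inf")  # log is not growing, so it never fills
    return max(capacity_mb - used_mb, 0) / growth_mb_per_min


def backup_wins(capacity_mb, used_mb, growth_mb_per_min, backup_minutes_left):
    """True if the database backup should finish before the log fills."""
    return backup_minutes_left < minutes_until_full(
        capacity_mb, used_mb, growth_mb_per_min
    )
```

For example, a 5000 MB log extended by 256 MB and growing at 8 MB per minute leaves 32 minutes of headroom, so a backup with 25 minutes left would make it and one with 40 minutes left would not.
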
>>  -- Tom
>>
>>Thomas A. La Porte
>>DreamWorks SKG
>>tlaporte AT dreamworks DOT com
>>
>>On Wed, 1 May 2002, Sung Y Lee wrote:
>>
>> >When the log reaches 100%, just pray that the TSM server process will
>> >not crash.
>> >
>> >
>> >I say the key is prevention.  Whatever you can do to prevent that from
>> >happening is the best answer.
>> >
>> >There are many things you can do to prevent the log from growing to 100%.
>> >One that works for me: I have LogMode set to Roll Forward mode with the dbb
>> >trigger at 38% and incrementals between fulls at 3 (q dbb).  The log is also
>> >set as large as the limit allows while still leaving room for an extension
>> >should it ever reach 100% and TSM crash.  I have it set at 4.5 GB (to be
>> >safe).  Max allowed recovery log for TSM 4.1 is 5.3 GB?? I can't recall
>> >the exact value.
>> >
>> >
>> >If the TSM server is in Log mode, then more than likely it will have a dbb
>> >trigger set at a certain point.  For example:
>> >adsm> q dbb
>> >
>> >Full           Incremental     Log Full       Incrementals
>> >Device         Device          Percentage     Between
>> >Class          Class                          Fulls
>> >----------     -----------     ----------     ------------
>> >IBM3590        IBM3590                 38                3
>> >
>> >When the recovery log reaches 38%, an incremental database backup will kick
>> >off, up to 3 times before a full database backup is performed.  Most of the
>> >time the TSM server will prevent other sessions from establishing when the
>> >recovery log reaches 100% but will allow the triggered database backup to
>> >complete, and that will bring the recovery log down to 0.  Sometimes TSM
>> >will simply crash.  If it crashes then you will need to do an emergency
>> >recovery log extend and bring TSM back up.
>> >
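The trigger behaviour Sung describes can be sketched as a small decision function. This is an illustration of the rule as stated in the post, not the server's actual logic; the function name and its parameters are hypothetical, with defaults taken from the q dbb output quoted above.

```python
def next_backup(log_pct, incrementals_since_full,
                trigger_pct=38, incrementals_between_fulls=3):
    """Return the backup type the trigger would start, or None if idle."""
    if log_pct < trigger_pct:
        return None  # log is below the trigger percentage
    if incrementals_since_full < incrementals_between_fulls:
        return "incremental"
    return "full"  # the allowed run of incrementals is used up


def simulate(trigger_events):
    """Backup types for a run of consecutive trigger firings."""
    history, since_full = [], 0
    for _ in range(trigger_events):
        kind = next_backup(100, since_full)
        history.append(kind)
        since_full = 0 if kind == "full" else since_full + 1
    return history
```

With the quoted settings, four consecutive firings of the trigger give three incrementals followed by a full backup.
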
>> >Sung Y. Lee
>> >E-mail sunglee AT us.ibm DOT com
>> >
>> >
>> >
>> >brian welsh <brianwelsh3@HOTMAIL.COM>
>> >Sent by: "ADSM: Dist Stor Manager"
>> >05/01/2002 01:23 PM
>> >Please respond to "ADSM: Dist Stor Manager"
>> >
>> >To:       ADSM-L AT VM.MARIST DOT EDU
>> >cc:
>> >Subject:  Recovery Log almost 100%
>> >
>> >Hello,
>> >
>> >AIX 4.3.3 and server 4.1.1.0.
>> >
>> >Last night two archive-schedules had a problem. On the clients there
>> >were files in a kind of loop and TSM tried to archive them. Result,
>> >recovery log almost 100%. This was the first time our log is that high.
>> >Problem on the client solved, but now I have the following question.
>> >
>> >I was wondering how other people prevent the log from growing to 100%,
>> >and how to handle after the log have reached 100%.
>> >
>> >Any tip is welcome.
>> >
>> >Brian.