ADSM-L

Re: Recovery log utilization does not drop after DB backup

2001-05-31 12:18:15
From: Paul Zarnowski <vkm AT CORNELLC.CIT.CORNELL DOT EDU>
Date: Thu, 31 May 2001 12:19:33 -0400
We run into this problem a lot.  I believe there are a couple of issues
here.  One issue, that Gerhard mentioned, is that the log utilization does
not drop quickly when the db backup apparently finishes.  The other issue
is that a thread can have the log "pinned", preventing the log utilization
from dropping.  The thread can be a session or a process.  A session
backing up a single large object, or a smaller one over a slow network,
can cause this problem.  So can a process (or session) doing tape I/O
that has gone into error recovery (which can take hours).  I'm sure there
are other situations which can cause a pinned log as well.  Together, these
two issues can lead to several operational problems.  In addition to the
log filling up, if you have triggered db
backups, they can keep triggering in a loop, quickly using up all the tapes
allocated for db backups (if you are using tapes).
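
To make the pinned-log behaviour concrete, here is a toy model in Python.
It is only a sketch under stated assumptions, not how the TSM server is
actually implemented: it assumes a DB backup can free log space only up to
the start of the oldest still-open transaction.  All names in it
(RecoveryLog, backups_triggered, the 70% trigger) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryLog:
    """Toy model of a recovery log; an assumption, not TSM's real design."""
    capacity_mb: int
    head_mb: int = 0   # total MB of log records written so far
    tail_mb: int = 0   # everything before this offset has been freed
    # thread name -> log offset at which its open transaction started
    open_txns: dict = field(default_factory=dict)

    def begin(self, thread: str) -> None:
        self.open_txns[thread] = self.head_mb

    def write(self, mb: int) -> None:
        self.head_mb += mb

    def commit(self, thread: str) -> None:
        del self.open_txns[thread]

    def db_backup(self) -> None:
        # Assumed rule: space is freed only up to the oldest open transaction.
        oldest_open = min(self.open_txns.values(), default=self.head_mb)
        self.tail_mb = max(self.tail_mb, oldest_open)

    def utilization_pct(self) -> float:
        return 100.0 * (self.head_mb - self.tail_mb) / self.capacity_mb


def backups_triggered(log: RecoveryLog, trigger_pct: float, checks: int) -> int:
    """Count DB backups fired when the trigger is re-checked `checks` times."""
    fired = 0
    for _ in range(checks):
        if log.utilization_pct() >= trigger_pct:
            log.db_backup()
            fired += 1
    return fired


pinned = RecoveryLog(capacity_mb=1000)
pinned.begin("slow_session")   # one long-running session pins the log
pinned.write(800)              # other activity fills the log to 80%
print(backups_triggered(pinned, trigger_pct=70, checks=5))  # prints 5
print(pinned.utilization_pct())                             # prints 80.0
```

In this model every trigger check fires another backup while the slow
session is still open, because none of the backups can move the tail.  Once
the session commits, a single backup brings utilization back down.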

Some options were added to the TSM server to detect slow/hung sessions.  I
have been experimenting with these, but even at their lowest settings, I
believe they are cancelling sessions that should not be cancelled (single
large archive files going to tape).  I think the problem is that the server
does not properly exclude media wait time from its calculation of throughput.
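
As an illustration of why media wait matters, here is a small Python
sketch.  The server's real calculation is not shown anywhere in this
thread, so the functions, parameters, and thresholds below are all
hypothetical; the sketch only compares a throughput figure that counts
mount wait against one that excludes it.

```python
def active_throughput_kb_per_sec(kb_moved: float, elapsed_sec: float,
                                 media_wait_sec: float = 0.0) -> float:
    """Throughput over the time the session could actually move data."""
    active_sec = elapsed_sec - media_wait_sec
    if active_sec <= 0:
        return 0.0
    return kb_moved / active_sec


def looks_hung(kb_moved: float, elapsed_sec: float, media_wait_sec: float,
               min_kb_per_sec: float, grace_sec: float) -> bool:
    """Flag a session only after `grace_sec` of active (non-waiting) time."""
    active_sec = elapsed_sec - media_wait_sec
    if active_sec < grace_sec:
        return False  # too early to judge
    rate = active_throughput_kb_per_sec(kb_moved, elapsed_sec, media_wait_sec)
    return rate < min_kb_per_sec


# A session that waited 1500 s for a tape mount, then moved 300000 KB in 300 s.
# Counting the wait as transfer time makes it look slow:
print(looks_hung(300000, 1800, 0, min_kb_per_sec=500, grace_sec=120))     # True
# Excluding the wait, it is comfortably above the threshold:
print(looks_hung(300000, 1800, 1500, min_kb_per_sec=500, grace_sec=120))  # False
```

The same session is cancelled or kept depending only on whether the mount
wait is subtracted, which is the behaviour I suspect is behind the false
cancellations.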

I think this problem gets worse as your environment gets more varied.  We
have a wide variety of nodes, including hundreds of workstations along with
a few very large database servers.

As Angela states, you can try doing a full DB backup daily.  But in our
environment, this is not very practical because of the amount of time it
takes to run a full backup.  Tivoli's recommendation is to keep your
database size small, but propagating more and more servers to do that can
really add to your licensing, hardware, and management expenses.  I think
this is really a limitation of the database technology that TSM uses, and I
think Tivoli needs to address this.  I believe there is some relief coming
for the max log size sometime soon, but I don't think that will be enough
relief for some of us.  If you haven't already maxed out your recovery log
size, you can try increasing its size - this will provide some
relief.  But, unfortunately, I don't believe there is anything you can do
to guarantee that you won't run into this problem again.

..Paul

At 06:31 AM 5/30/2001 -0700, Angela Hughes wrote:
This problem can also be eliminated by performing a
full DB backup daily, which I've always done.  If the
environment is very large with a lot of activity, you
can also perform incremental DB backups with the DB
trigger option throughout the day.
Thanks,
Angela

--- David Longo <David.Longo AT HEALTH-FIRST DOT ORG> wrote:
> When your recovery log hits 100%, basically one of
> two things happens.
>
> 1.  If there is "Available Space" greater than
> "Assigned Capacity" then the log will acquire some of
> this additional space.
>
> 2.  When all space is consumed - TSM server crashes!
>  I have had this happen with a 3.7.4.0 server on AIX.
> The solution for that is to use the OS-level dsmfmt and
> dsmserv extend log commands to gain additional space so
> the server can be restarted.
>
>
> David B. Longo
> System Administrator
> Health First, Inc.
> 3300 Fiske Blvd.
> Rockledge, FL 32955-4305
> PH      321.434.5536
> Pager  321.634.8230
> Fax:    321.434.5525
> david.longo AT health-first DOT org
>
>
> >>> suad AT CCU1.AUCKLAND.AC DOT NZ 05/30/01 02:28AM >>>
> > some time ago I complained in a mail to this list that the recovery log
> > utilization is not reset after a database backup. APAR IC30181 was
> > generated for this problem. Its status is "open".
>
> I complained about the problem too. The response I got was that when the
> system is too busy, it takes a while for it to happen.
>
> I have done this when no sessions/processes were running on the system and
> it still took over an hour to reset itself.
>
> Our situation is worse, as sometimes the log jumps up to over 90% before
> we start the backup (triggered by an ANR0314W). It has continued to
> increase after the backup and has got to 98% in one observed instance.
> Will it stop sessions if it gets to 100%?
>
> The reason we don't have an auto-triggered backup is that we use a manual
> tape drive (LTO is a bit of a waste for incrementals).
>
> Suad

