ADSM-L

Re: [ADSM-L] TSM Recovery log is pinning since upgrade to 5.5.5.0 code

2011-04-17 10:19:27
From: Bob Booth <booth AT ILLINOIS DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Sun, 17 Apr 2011 09:16:41 -0500
On Sat, Apr 16, 2011 at 11:41:13PM -0400, Nancy L Leugemors wrote:
> Hello,
>
> We just upgraded our TSM Server this  from 5.5.4.0 to 5.5.5.0  to fix APAR
>  IC66116.   Since the server upgrade we have experienced  several recovery
> log pinning incidents from a few different database backup clients.   I
> haven't found any hits on new bugs related to our issue and I just opened
> up a case with support but was wondering if anyone had a similar issue
> with this level of server and client code or know of any APARS this might
> fit?   Our recovery log has always been set to 13GB so I can't expand it
> anymore.     Running a database backup doesn't free any space while the
> client database backup is running either.   We run 1 full TSM database
> backup and multiple incrementals throughout the day.   The only difference
> is the upgrade to the latest code.   I'm not seeing any other errors, just
> showing different database clients pinning the log after issuing show log
> pinned command.   Sometimes the client backup finishes before hitting 80%
> and sometimes TSM server cancels the longest running backup that is
> pinning the log.

We are still at 5.5.4 and have also seen more instances of log pins and database
lock conflicts. Check whether any of these factors apply to you:

An increase in the size of your database, possibly onto sub-optimal disks,
controllers, or RAID devices.

Longer-running expirations, and/or the addition of systems that may have an
increased number of files (say, in the millions).

A change in client levels on some nodes, due to compatibility problems (or some
other issue).  Macs and TDPs come to mind.

Our database is in an almost constant state of backup, as we use rollforward
recovery.  I would be most interested in anything that IBM support says, so
pass on the good word if you hear anything.  We are not quite ready for V6
yet, since we have to stand up new infrastructure, and get prepared for that
whole other set of problems.

The suggestion that you back your log off to 12GB is a good one, since you will
be very screwed if you fill up the log and fall over.

My questions for the list are these:

Would it help to get out of rollforward recovery mode and just do periodic
database backups at some standard hour?  I know the reason for rollforward, but
I'm willing to live with the losses if it fixes this problem.

Is there any way to stop the server from killing the oldest transaction?  It
never seems to be the one that is pinning the log, and all it does is force
the long-running jobs to start over, usually making the issue worse.  We do
a lot of image backups, which take several hours, and it would be nice to just
allow them to finish and get it over with.

Is there a fix for the error message about the log transaction delay?  I saw
an APAR that showed the 3ms text is bogus, and is actually 1s.   This should
have been fixed before my current level.

TSM: 5.5.4.0
AIX 5.3 - 11
TSM clients 5.5.X - 6.X
DB size 243GB
Log size 12GB

Good luck.

> example of error:
>                           29857)
> 04/16/11   04:03:26      ANR0524W Transaction failed for session 24418 for node
>                           DEVNODE_API (SQL-BACKTRACK) - data transfer interrupted.
>                           (SESSION: 24418)
> 04/16/11   04:03:26      ANR2997W The server log is 81 percent full. The server
>                           will delay transactions by 3 milliseconds.
>                           (SESSION: 29846)
>
>
> TSM Background:
>
> TSM Server:  5.5.5.0
> TSM OS:  AIX 5300 -12
> TSM Clients:  5.5.2.0
> TSM Client OS:  AIX 5.3 (UDB-DB2 & SQL-Backtrack Sybase seen so far)
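
For anyone who wants to watch for the ANR2997W warning quoted above, here is a
minimal Python sketch that pulls the percent-full and delay figures out of
activity log text, including text that has been wrapped and ">"-quoted by a
mail client. The function names are hypothetical, and the pattern only matches
the message wording exactly as it appears in this thread; other server levels
may word it differently. The delay value is reported as printed, whatever the
server really does.

```python
import re

# Wording taken from the excerpt in this thread; treating it as stable
# across server levels is an assumption.
ANR2997W = re.compile(
    r"ANR2997W The server log is (\d+) percent full\. "
    r"The server will delay transactions by (\d+) milliseconds\."
)


def unwrap(lines):
    """Drop leading '>' quote markers and collapse wrapped lines into one string."""
    joined = " ".join(line.lstrip("> ") for line in lines)
    # Collapse runs of whitespace left over from the column layout.
    return " ".join(joined.split())


def log_full_warnings(lines):
    """Return a list of (percent_full, delay_ms) for each ANR2997W message."""
    return [(int(pct), int(ms)) for pct, ms in ANR2997W.findall(unwrap(lines))]


if __name__ == "__main__":
    excerpt = [
        "> 04/16/11   04:03:26      ANR2997W The server log is 81 percent full. The",
        "> server",
        ">                           will delay transactions by 3 milliseconds.",
        "> (SESSION:",
        ">                           29846)",
    ]
    for pct, ms in log_full_warnings(excerpt):
        print(f"log {pct}% full, reported delay {ms} ms")
```

Fed the output of a saved activity log query, this gives a quick way to see how
often the log crosses the warning threshold between database backups.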