ADSM-L

Re: [ADSM-L] TSM Recovery log is pinning since upgrade to 5.5.5.0 code

2011-04-17 19:13:42
Subject: Re: [ADSM-L] TSM Recovery log is pinning since upgrade to 5.5.5.0 code
From: Steve Harris <steve AT STEVENHARRIS DOT INFO>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Sun, 17 Apr 2011 19:08:25 -0400
For anyone looking to go to TSM server 5.5.5.0, be aware of a nasty bug
that requires SQL identifiers to be 18 characters or fewer in length and
may break your scripts.  That bug, IC71586, was fixed at 5.5.5.1.
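
One rough way to gauge exposure before upgrading is to scan your admin scripts for long names. The sketch below is only an illustration under stated assumptions: it treats any bare word made of letters, digits and underscores as a candidate identifier, so it over-reports (SQL keywords and words inside string literals are counted too), and the 18-character limit is simply taken from the description above. The function name long_identifiers is made up for this example.

```python
import re

MAX_IDENTIFIER_LEN = 18  # limit described above for the 5.5.5.0 bug

# Candidate identifier: a letter or underscore followed by word characters.
# This is an assumption; it is not the server's own definition.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def long_identifiers(sql_text, limit=MAX_IDENTIFIER_LEN):
    """Return distinct candidate identifiers longer than `limit`,
    in order of first appearance."""
    seen = []
    for token in _IDENTIFIER.findall(sql_text):
        if len(token) > limit and token not in seen:
            seen.append(token)
    return seen


if __name__ == "__main__":
    sample = "select node_name as very_long_column_alias_name from nodes"
    for name in long_identifiers(sample):
        print("longer than %d characters: %s" % (MAX_IDENTIFIER_LEN, name))
```

Anything it flags is only a candidate; check each hit by hand against the script that contains it.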

Regards

Steve

Steven Harris

TSM Admin,
Canberra Australia


On Sun, 17 Apr 2011 10:18:25 -0700, Thomas J <tjacobjr AT GMAIL DOT COM>
wrote:
We have seen the same log pinning issue once we upgraded from TSM 5.4.3 to
5.5.4.x. We have had a couple of TSM crashes due to the log filling up. The
workaround was to:
- set logmode normal
- run scripts that check every 15 minutes whether the log is over 80% full
  and kill processes like NDMP ba stgp to get log usage down
- get woken up in the middle of the night, thanks to BMC Patrol, if the
  scripts still don't do the job.
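
The decision step of such a monitoring script could look roughly like the sketch below. It is not the script we actually run, only a minimal illustration: how the log utilization and the process list are obtained from the server is left out, the function name cancel_commands is hypothetical, and the process description "Backup Storage Pool" is an assumption that you should check against what your own server reports. The 80% threshold comes from the workaround above.

```python
LOG_THRESHOLD_PCT = 80.0  # threshold from the workaround above

# Assumed process descriptions that are safe to cancel; adjust to
# whatever your server actually shows for its running processes.
CANCELLABLE = ("Backup Storage Pool",)


def cancel_commands(pct_utilized, processes,
                    threshold=LOG_THRESHOLD_PCT, cancellable=CANCELLABLE):
    """Build 'cancel process N' commands when log use exceeds the threshold.

    processes is an iterable of (process_number, description) pairs.
    Returns an empty list when the log is at or below the threshold.
    """
    if pct_utilized <= threshold:
        return []
    return [
        "cancel process %d" % number
        for number, description in processes
        if description in cancellable
    ]


if __name__ == "__main__":
    running = [(12, "Backup Storage Pool"), (13, "Expiration")]
    for command in cancel_commands(85.0, running):
        print(command)
```

Run from cron every 15 minutes, the returned commands would then be passed to the administrative client. Keeping the decision in a pure function makes it easy to test without touching a live server.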

This is an irritating bug. We worked with TSM support and were told that
5.5.5.0 code would resolve this issue. We upgraded two of our environments
to 5.5.5.0 last week. I am not happy to see that 5.5.5.0 still has the log
pinning issue. I guess IBM is showing us the road to V6.

Thomas


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Bob Booth
Sent: Sunday, April 17, 2011 7:17 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] TSM Recovery log is pinning since upgrade to 5.5.5.0 code

On Sat, Apr 16, 2011 at 11:41:13PM -0400, Nancy L Leugemors wrote:
Hello,

We just upgraded our TSM Server from 5.5.4.0 to 5.5.5.0 to fix APAR
IC66116. Since the server upgrade we have experienced several recovery
log pinning incidents from a few different database backup clients. I
haven't found any hits on new bugs related to our issue, and I just opened
a case with support, but I was wondering if anyone has had a similar issue
with this level of server and client code or knows of any APARs this might
fit. Our recovery log has always been set to 13GB, so I can't expand it
any more. Running a database backup doesn't free any space while the
client database backup is running either. We run 1 full TSM database
backup and multiple incrementals throughout the day. The only difference
is the upgrade to the latest code. I'm not seeing any other errors, just
different database clients shown as pinning the log when I issue the
show log pinned command. Sometimes the client backup finishes before
hitting 80%, and sometimes the TSM server cancels the longest running
backup that is pinning the log.

We are still at 5.5.4, and have seen more instances of log pins and database
lock conflicts. However, look to see if you have any of these factors:

Increased the size of your database, possibly onto sub-optimal
disks/controllers or RAID devices.

Longer running expirations, and/or additions of systems that may have an
increased number of files (say in the millions).

Change in client levels on some nodes, due to compatibility problems (or
some other issue). Macs and TDPs come to mind.

Our database is in an almost constant state of backup, as we use rollforward
recovery. I would be most interested in anything that IBM support says, so
pass on the good word if you hear anything. We are not quite ready for V6
yet, since we have to stand up new infrastructure and get prepared for that
whole other set of problems.

The suggestion that you back your log off to 12GB is a good one, since you
will be very screwed if you fill up the log and fall over.

My questions for the list are these:

Would it help to get out of rollforward recovery mode and just do periodic
database backups at some standard hour? I know the reason for rollforward,
but I'm willing to live with the losses if it fixes this problem.

Is there any way to stop the server from killing the oldest transaction? It
never seems to be the one that is pinning the log, and all it does is force
the long running jobs to start over, usually making the issue worse. We do
a lot of image backups which take several hours, and it would be nice to
just allow them to finish and get it over with.

Is there a fix for the error message about the log transaction delay? I saw
an APAR that showed the 3ms text is bogus, and is actually 1s. This should
have been fixed before my current level.

TSM: 5.5.4.0
AIX 5.3 - 11
TSM clients 5.5.X - 6.X
DB size 243GB
Log size 12GB

Good luck.

example of error:
                          29857)
04/16/11   04:03:26      ANR0524W Transaction failed for session 24418 for node
                          DEVNODE_API (SQL-BACKTRACK) -  data transfer
                          interrupted. (SESSION: 24418)
04/16/11   04:03:26      ANR2997W The server log is 81 percent full. The server
                          will delay transactions by 3 milliseconds. (SESSION:
                          29846)
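
For anyone who wants to track how often the log crosses the warning level, the ANR2997W message in the excerpt above can be pulled out of saved activity log text with something like the sketch below. It assumes the message wording is exactly as shown in the excerpt, and the function name log_full_percentages is made up for illustration. Because the activity log wraps long messages across lines, the text is flattened before matching.

```python
import re

# Wording taken from the ANR2997W message in the excerpt above.
_LOG_FULL = re.compile(r"ANR2997W The server log is (\d+) percent full")


def log_full_percentages(activity_log_text):
    """Return every percentage reported by ANR2997W, in order of appearance."""
    # Collapse all runs of whitespace, including line breaks, so that a
    # message wrapped across several lines still matches.
    flattened = " ".join(activity_log_text.split())
    return [int(value) for value in _LOG_FULL.findall(flattened)]


if __name__ == "__main__":
    sample = (
        "04/16/11   04:03:26      ANR2997W The server log is 81 percent full.\n"
        "The\n"
        "server\n"
        "                          will delay transactions by 3 milliseconds.\n"
    )
    print(log_full_percentages(sample))
```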


TSM Background:

TSM Server:  5.5.5.0
TSM OS:  AIX 5300 -12
TSM Clients:  5.5.2.0
TSM Client OS:  AIX 5.3 (UDB-DB2 & SQL-Backtrack Sybase seen so far)