ADSM-L

Re: [ADSM-L] Our TSM system is a mess. Suggestions? Ideas?

2010-02-14 11:50:39
Subject: Re: [ADSM-L] Our TSM system is a mess. Suggestions? Ideas?
From: Marcel Anthonijsz <marcel.anthonijsz AT GMAIL DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Sun, 14 Feb 2010 17:49:38 +0100
John,

A few things come to mind. Which nodes are pinning the recovery log? In my
experience it is always a few slow nodes (typically ones with a lot of small
files) that pin the log. Try to find out which ones do, and try to improve
those nodes so that they back up faster. That is a hell of a job when you
have 500 nodes, but try to find those that take longer than 4-5 hours or have
a really slow throughput speed. A speed/duplex mismatch on a TSM client has
killed my log performance more than once. You can look in TSM reporting for
the slowest nodes.
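
As a rough illustration of that filtering step, here is a minimal Python
sketch. It assumes you have already exported one record per backup session
(node name, start time, end time, bytes moved) from whatever reporting you
use. The BackupSession type, the slow_nodes function, the node names and the
1 MB/s throughput floor are all made up for the example; only the 4 hour
limit comes from the rule of thumb above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BackupSession:
    # Hypothetical record layout; adapt it to whatever your report exports.
    node: str
    start: datetime
    end: datetime
    bytes_moved: int


def slow_nodes(sessions, max_hours=4.0, min_mb_per_s=1.0):
    """Return (node, hours, MB/s) for sessions that run long or move data slowly.

    max_hours follows the 4-5 hour rule of thumb; min_mb_per_s is an
    assumed threshold, so pick one that fits your own network.
    """
    flagged = []
    for s in sessions:
        seconds = (s.end - s.start).total_seconds()
        if seconds <= 0:
            continue  # skip records with a missing or inverted time range
        hours = seconds / 3600
        mb_per_s = s.bytes_moved / 1_000_000 / seconds
        if hours > max_hours or mb_per_s < min_mb_per_s:
            flagged.append((s.node, round(hours, 2), round(mb_per_s, 2)))
    # Longest-running sessions first: these are the likeliest log pinners.
    flagged.sort(key=lambda row: row[1], reverse=True)
    return flagged


# Made-up sample data, for illustration only.
sample = [
    BackupSession("NODE_A", datetime(2010, 2, 13, 17, 0),
                  datetime(2010, 2, 13, 23, 30), 20_000_000_000),
    BackupSession("NODE_B", datetime(2010, 2, 13, 18, 0),
                  datetime(2010, 2, 13, 18, 20), 6_000_000_000),
    BackupSession("NODE_C", datetime(2010, 2, 13, 19, 0),
                  datetime(2010, 2, 13, 20, 0), 900_000_000),
]

for node, hours, rate in slow_nodes(sample):
    print(f"{node}: {hours} h, {rate} MB/s")
```

With the sample data, NODE_A is flagged for running 6.5 hours and NODE_C for
its low throughput, while NODE_B passes both checks.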

IMHO, TSM 6.1.x will not solve your problem.

Another solution would be to turn the cell phone off every other day ;-)


Good luck,

2010/2/14 Dury, John C. <JDury AT duqlight DOT com>

> We have about 500 nodes and have a backup window from 5pm until 7am. I
> have our backup schedule set up so that about 30 nodes do incrementals per
> hour, with a few exceptions. We have a 3T disk storage pool and 4 LTO4 drives
> in our tape library. Our dbbackuptrigger is set at logfull 30% and
> numincrementals of 4. Our recovery log is filling up almost once per hour
> while backups are running and not emptying fast enough before it hits 80%,
> at which point all backups slow to a crawl until it is emptied below 80%.
> Sometimes the recovery log is pinned at 70% or so and another backup kicks off
> immediately which again does not empty fast enough and the whole system goes
> into slowdown after the recovery log is past 80%. Expiration, which used to
> run in a matter of about 6 hours, is not completing even after running for
> 24 hours. Our DB is about 97gig and about 74% full. The recovery log is
> maxed at 13gig. I don't see anything out of the ordinary in the activity
> log. The TSM server is AIX 5.3.10.1 TL10 running on an IBM 9131-52A in
> a logical partition with 20 CPUs configured and about 32G of RAM. The TSM DB
> and disk storage pools are attached to a Clariion CX3-80 via 4G HBAs. I have
> the recovery log and TSM DB set to use different HBAs than the disk or tape
> storage pools so the HBAs aren't fighting each other. I've read the tuning
> and performance manual and matched our settings to its suggestions,
> with some small exceptions.
>
> We have purchased new hardware to move the whole system to Linux and a
> monster of a box since we want to get to TSM v6.x eventually, hopefully
> sooner rather than later. AIX hardware and support is tremendously expensive
> when compared to an Intel-based box and, like a lot of people, we have a very
> small budget for anything IT related.
>
> One of the biggest problems we are having is the recovery log filling up
> too quickly and not emptying fast enough. Even with a log full trigger of
> 30%, the incremental backup won't finish before the recovery log hits 80%,
> and with the log full setting so low, we are doing TSM DB backups almost
> every hour while clients are backing up. This really seems excessive to me.
> Why would an incremental backup of the TSM DB take an hour or so to run, and
> is it normal for the recovery log to fill up so fast while backups are
> running?
> We even attempted to do a reorg of the TSM DB, but unfortunately it was
> going to run for much longer than our window allowed, so it had to be
> cancelled. I'm going to try again next weekend and hopefully talk the
> powers that be into a 24 hour window for the reorg. We did do a reorg years
> ago and the performance improvements were amazing, i.e. expiration ran in less
> than an hour. I know that is a bandaid, but I have to do something until I
> can get to version 6, when I can have a bigger recovery log and a new, more
> powerful server in place.
> I guess I'm just not sure what to look at at this point and frankly I'm
> exhausted. Our help desk is calling me every day, at 6am or earlier,
> because "TSM is running slow again".
> Any suggestions on what else to look at? (Sorry for such a fragmented
> email. I've had about 3 hours sleep at this point)
>



--
Kind Regards, Groetje,

Marcel Anthonijsz