ADSM-L

Re: Dwindling Performance

2004-01-14 10:53:15
Subject: Re: Dwindling Performance
From: Ben Bullock <bbullock AT MICRON DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 14 Jan 2004 08:52:50 -0700
        Hmmm, interesting that the expire inventory grinds to a halt
during incremental backups. My setup is AIX similar to yours (host, DB
size) although my disks are on locally attached SSA drives. I recently
upgraded my 8 TSM servers from TSM 5.1.1.0 to 5.2.1.3 (mainly to get the
NDMP file-level backups, finally).

         On one of them I saw the same issue. They are all set up almost
identically, so why 1 would misbehave is a mystery to me. To fix the
immediate problem, I put a " duration=" on the expire inventory job so
that it would only run during the day when backups are less likely.
Sure, the expire inventory now takes 2 days to run, but it's better than
having all the backups go extremely slow and not complete.
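
        A rough sketch of how such a daytime window maps onto the
command: the DURATION parameter of EXPIRE INVENTORY is given in minutes,
so the window length has to be converted. The 08:00-16:00 window and the
helper name expire_command are hypothetical, chosen only for
illustration; they are not values from this thread.

```python
from datetime import datetime


def expire_command(start: str, end: str) -> str:
    """Build an EXPIRE INVENTORY command limited to a same-day window.

    start and end are "HH:MM" strings; both are illustrative inputs.
    """
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    minutes = int(delta.total_seconds() // 60)
    if minutes <= 0:
        raise ValueError("end must be later than start on the same day")
    # DURATION is expressed in minutes.
    return f"expire inventory duration={minutes}"


# Hypothetical window: stop expiration before the evening backups begin.
print(expire_command("08:00", "16:00"))  # expire inventory duration=480
```

        The resulting string would be issued from an administrative
schedule or an admin command line; expiration then stops when the
duration is reached.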

        I then started to look into the performance issues. Some of the
things I have done:
        - Changed the DB volumes from JFS to raw (that made a very
good improvement).
        - Turned on the SSA fast-write cache on the DB volumes.
        - Tried out these settings for vmtune (gleaned from this
listserv):
                /usr/samples/kernel/vmtune -t10 -P10 -p5 -s1 -W16 -c8 -R256 -F512 -u25 -b2200 -B2200

        All of these changes have improved the speed of the expire
inventory, but to be honest I haven't tried to run the expire inventory
during the incremental backups since. Once bitten, twice shy, and I can
live with the expire inventory taking 2 days to complete.
        
        That's kind of where I am now. No solid solution, but improved
performance enough that it's workable now.

        I'd love to hear what other changes you make to resolve your
situation.

Ben
Micron Technology Inc.
Boise, Id 

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Andy Carlson
Sent: Wednesday, January 14, 2004 7:07 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: Dwindling Performance


Thanks for the quick response.

Expiration is not finishing.  Before the main backups start, it expires
maybe 200,000 objects, but during the backup window it slows to a
crawl.  It picks up some during the day when the backups and migrations
are running, but since we now have some 100 sessions not finished, it's
slow then too.

I didn't look at randomize, but these sessions are staying out there for
hours.  I will take a look at that today.

I currently have them doing an incrbydate every other day, and a full
incr the other.

The cache hit ratio of the database is about 98.5%, but we have about
3.5GB of memory in the cache.  I don't think I can go much higher, but I
will try it if I can.
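
For a sense of scale, the gap between a 98.5% hit ratio and the 99%
guideline in Roger's note quoted below is larger than it looks, because
what matters is the miss rate. A minimal sketch of the arithmetic (the
helper name is made up for illustration, and treating every miss as
roughly one disk read is an assumption):

```python
def misses_per_10k(hit_pct: float) -> int:
    """Cache misses per 10,000 lookups for a given hit percentage."""
    return round((100.0 - hit_pct) * 100)


current = misses_per_10k(98.5)   # figure reported above
target = misses_per_10k(99.0)    # threshold from the quoted advice

print(current)           # 150
print(target)            # 100
print(current / target)  # 1.5
```

So at 98.5% the database sees about one and a half times as many cache
misses as it would at 99%.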

P.S.  The TSMI clients are Windows and Netware, the TSMU are Unix and a
couple of VMS.

Thanks for the input.


Andy Carlson                                    |\      _,,,---,,_
Senior Technical Specialist               ZZZzz /,`.-'`'    -.  ;-;;,_
BJC Health Care                                |,4-  ) )-,_. ,\ (  `'-'
St. Louis, Missouri                           '---''(_/--'  `-'\_)
Cat Pics: http://andyc.dyndns.org/animal.html


On Wed, 14 Jan 2004, Roger Deschner wrote:

> I have posted many times in the past saying you should never do a
> database unload/reload to gain performance. But this just might be the
> one case where it might make sense - the remaining half of a split
> server. But before you do something that drastic, dangerous, and time
> consuming, look for the things that are easier to fix.
>
> My basic metric of whether or not you are in trouble is, how long does
> expiration take? If you start it daily, the closer it is to 24 hours
> running time, the closer you are to doomsday. Never-ending expiration
> is the classic symptom of TSM Server Meltdown.
>
> But on the other hand, if your expiration runs nice and fast, your
> server and its database are probably OK. Look to clients as the
> problem. They can't all squeeze in the door at once, so don't let them
> try. If they use the client-polling scheduler, how long is the backup
> window, and what is your setting for Schedule Randomization
> Percentage? Make it as high as possible - SET RANDOMIZE 50. This will
> also help if you are having any kind of a network bottleneck.
>
> Look at these clients on a micro level. About how much are they each
> actually backing up? If it's not much, then your theory might be
> right, that they are very busy downloading their lists of backed up
> files. In that case, load spreading will be the best thing you could
> do. You might consider a schedule where not every client does a full
> "Incremental" every night - perhaps they only do one every other night
> and on the other nights they do an "incrbydate" backup which is much
> faster, because it goes only by the timestamps in the file system.
>
> Not to ask the obvious, but what's your Database Cache Hit Percentage?
> (Q DB F=D) If it's below 99%, it needs help. Even (especially) a badly
> fragmented database will run a lot faster if you have it swimming in
> cache.
>
> Look at other differences between your two instances - are they 
> basically different types of clients?
>
> Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu
> ============The short fortuneteller who escaped from prison=============
> ======================was a small medium at large.======================
>
>
>
> On Tue, 13 Jan 2004, Andy Carlson wrote:
>
> >We are having terrible performance with one of our instances of TSM.
> >I have suspicions, but I want to hear what you guys say.  Here is
> >what we have:
> >
> >2 instances of TSM - TSMI and TSMU (TSMI is the problem)
> >
> >TSM 5.2.1.1
> >AIX 5.1 ML4
> >RS/6000 P670 - 8 processors, 16GB memory
> >Fastt700 SAN
> >STK9840 Tape drives
> >
> >The Database is 85% of 88GB (with room to expand another 50GB or so).
> >
> >Right at this moment, we have 233 sessions with TSMI.  The backup
> >sessions grind to a halt for hours at a time, with nothing apparently
> >happening.  I suspect that the directory trees are being downloaded
> >and built, but not sure
> >
> >When we split TSMI and TSMU, we created the TSMU instance, and did a 
> >full backup on all the servers that moved there.  The TSMI database 
> >is a restored copy of the original database, with the TSMU stuff 
> >deleted out.
> >
> >Any ideas would be greatly appreciated.
> >
> >
> >Andy Carlson                                    |\      _,,,---,,_
> >Senior Technical Specialist               ZZZzz /,`.-'`'    -.  ;-;;,_
> >BJC Health Care                                |,4-  ) )-,_. ,\ (  `'-'
> >St. Louis, Missouri                           '---''(_/--'  `-'\_)
> >Cat Pics: http://andyc.dyndns.org/animal.html
> >
>
