ADSM-L

Re: OS390 TSM Performance questions.

2003-02-13 13:07:58
Subject: Re: OS390 TSM Performance questions.
From: Alan Davenport <Alan.Davenport AT SELECTIVE DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 13 Feb 2003 13:05:05 -0500
Hello All, many thanks to all who responded to my inquiry. It looks like
Mark's response, item #2, was the answer. I had my region size at 1280M and
TSM was running just awful. I had a phone conversation with Mark and
afterwards, I tried his suggestion of REDUCING the region size. Note the
before/after output of the "show memu SHORT" (case sensitive!) command:

Region Size = 1280M

MAX initial storage  1342177280 (1280.0 MB)
Freeheld bytes  145620  (0.1 MB)
MaxQuickFree bytes 26387005  (25.2MB)
56 Page buffers of 32210 : 315 buffers of 4026.
4 Large buffers of 2013 : 222 XLarge buffers of 251.
202 buffers free: 336 hiAlloc buffers: 134 current buffers.
50 units of 688 bytes hiAlloc: 44 units of 72 bytes hiCur.

Region Size = 512M

MAX initial storage  536870912  (512.0 MB)
Freeheld bytes 10280787  (9.8 MB)
MaxQuickFree bytes 10280878  (9.8 MB)
56 Page buffers of 12549 : 4 buffers of 1568.
2 Large buffers of 784 : 18 XLarge buffers of 98.
66992 buffers free: 81083 hiAlloc buffers: 1903 current buffers.
28969 units of 56 bytes hiAlloc: 1532 units of 104 bytes hiCur.

Look at the second line of each display. It appears that with region=1280M
the "Freeheld bytes" buffer was WAY under-allocated: only 145K was
allocated. With the region size set to 512M, 9.8MB was allocated to the
buffer and TSM is running significantly better. Whether or not this will
help someone else I do not know. This is the first I've heard that REDUCING
region size will help performance. It is counter-intuitive. I had been
increasing it slowly over a period of time based on information I had found
on ADSM.ORG. It's hard to argue with results, however. My maintenance cycle
is currently around 3 hours further along today than it usually is.
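
For anyone who wants to make the same before/after comparison on their own
server, here is a minimal Python sketch. It assumes the display keeps the
"label  bytes  (n MB)" layout shown above; the helper names (parse_memu,
freeheld_share) are made up for illustration and are not part of TSM.

```python
import re

# Byte-count lines copied from the two displays in this message.
BEFORE = """\
MAX initial storage  1342177280 (1280.0 MB)
Freeheld bytes  145620  (0.1 MB)
MaxQuickFree bytes 26387005  (25.2MB)
"""

AFTER = """\
MAX initial storage  536870912  (512.0 MB)
Freeheld bytes 10280787  (9.8 MB)
MaxQuickFree bytes 10280878  (9.8 MB)
"""

# Assumed layout: "<label> <integer bytes> (<number> MB)". Other lines are skipped.
LINE = re.compile(r"^\s*(.+?)\s+(\d+)\s*\(\s*[\d.]+\s*MB\)\s*$")


def parse_memu(text):
    """Return {label: byte_count} for the byte-count lines of a display."""
    values = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def freeheld_share(values):
    """Fraction of the initial storage reported as Freeheld bytes."""
    return values["Freeheld bytes"] / values["MAX initial storage"]


before = parse_memu(BEFORE)
after = parse_memu(AFTER)

for region, values in (("1280M", before), ("512M", after)):
    freeheld_mb = values["Freeheld bytes"] / 2**20
    print(f"region {region}: Freeheld {freeheld_mb:.1f} MB, "
          f"{freeheld_share(values):.4%} of initial storage")
```

The ratio is only a convenient way to compare the two runs. Nothing in this
thread says TSM sizes the Freeheld buffer as a fraction of the region.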

     Take care,
         Al

Alan Davenport
Senior Storage Administrator
Selective Insurance Co. of America
alan.davenport AT selective DOT com
(973) 948-1306


-----Original Message-----
From: Darby, Mark [mailto:Mark.Darby AT HQ.DOE DOT GOV]
Sent: Wednesday, February 12, 2003 12:03 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: OS390 TSM Performance questions.


Hello, Al.

We have much to share.  We are OS/390 2.10 on a 7060-H50 (~120 MIPS) with
approx. 100Mbit network connectivity and have had many, long-standing TSM
performance problems.  We are currently running 4.2.3.2.  We have discovered
in working with TSM support (without a technical explanation as to "why?")
that reducing the TSM server's region to 512M and setting (by reducing)
bufpoolsize to 131072 (i.e., 128MB) works for us.  We had previously tried
several region settings from 1.75G down to 960M with the same problematic
results until "happening upon" the severely reduced, storage-constrained
"settings" with which we are now running (or should I say, limping).  This
was determined with the help of the Tivoli "performance team" in response to
a long string of numerous performance-related PMRs.

Here are some things we have discovered - and which work best for us:
1. Region over 512M causes serious and pervasive performance problems
2. BufPoolSize much over 131072 MAY also cause/contribute similarly (and
definitely doesn't help)
3. CPU utilization is VERY high for any database-intensive processes
4. Database corruption may be the root cause for our severe symptoms (this
is purely conjecture on my part at this point, but supported, to some
degree, by TSM support statements recommending we fix known DB corruption -
which, of course, with dump/reload/audit performance being what it is, is an
impossible "hit" to take).  FYI: We plan to "move out" of the TSM server
with database corruption "into" a new, virgin server(s) as soon as time and
other factors permit.
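
A quick arithmetic check of the numbers in items 1 and 2, as a rough Python
sketch. It assumes BufPoolSize is expressed in KB (which is what "131072
(i.e., 128MB)" implies) and that the buffer pool is allocated inside the
server's region; the function name is made up for illustration.

```python
BUFPOOLSIZE_KB = 131072  # value quoted in the message; KB unit is an assumption


def bufpool_share(region_mb, bufpool_kb=BUFPOOLSIZE_KB):
    """Fraction of a region (given in MB) that the buffer pool would occupy."""
    return (bufpool_kb / 1024) / region_mb


bufpool_mb = BUFPOOLSIZE_KB / 1024
print(f"BufPoolSize {BUFPOOLSIZE_KB} KB = {bufpool_mb:.0f} MB")

# Region sizes mentioned in the message: 512M (current), 960M and 1.75G (tried earlier).
for region_mb in (512, 960, 1792):
    print(f"region {region_mb}M: buffer pool is {bufpool_share(region_mb):.1%} of region")
```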

Prior to adjusting our "settings" as indicated above, we were experiencing
severe, pervasive, and nearly continual performance problems (and CPU
over-utilization), server unresponsiveness, and what I would call
"stress-related" failures of all sorts, and a whole plethora of other,
unmentioned "problems".  After making "the adjustments" we have found that,
although the TSM server still frequently gets "tangled up in its shorts",
the problems are not as severe nor are they as frequent or pervasive, and
performance is better than when we ran it in the "larger memory footprint".
Although it is closer to acceptable, it is still well below the kind of
performance I expect from an application running on the platform (i.e.,
S/390).

We cannot even imagine a reason why these adjustments have helped, but they
have.  It is totally counter-intuitive to me that reducing the memory
footprint would yield these results, but it has.

I would call IBM/Tivoli support, if I were you, and start a diagnostic
regimen with them on your particular issues.  We were told by them that many
OS/390 shops are getting far superior performance, throughput, and (I
presume) a much better CPU utilization picture than we experience.  Further,
their stated position is that some environmental factor, unique to "us", is
the root cause for our performance issues.  Aside from our limited bandwidth
and database corruption "issues", I cannot think of any other factor that
makes us unique among all the other users of the TSM server on OS/390.

You are the first shop I have heard reporting an experience similar to ours.

Please feel free to explore this further with me off-line if you wish.

Regards,
Mark Darby
(301) 903-5229

-----Original Message-----
From: Alan Davenport [mailto:Alan.Davenport AT SELECTIVE DOT COM]
Sent: Wednesday, February 12, 2003 10:46 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: OS390 TSM Performance questions.

Hello,

        We're running TSM v5.1.5.4 on an IBM 20660A2 processor running OS390
v10. There is a 100Mbit, single port OSA card on the processor. We are
backing up 197 clients per night. MAXSCHEDSESSIONS is set to allow 116
simultaneous backup sessions. Our backup window begins at 20:00 and ends at
07:30 the next morning. We are seeing poor performance on our backups during
the window.  For example, one server that will back up in 6-7 minutes outside
the window takes hours to complete during the window. The TSM server has a
region size of 1280M and MPTHREADING is set to YES. Self-tuning of buffer size
and TXN size is enabled. We are backing up to a 100GB disk buffer to an EMC
model 8830 drive array. On average we back up 30-40GB per night with a peak
of 75-80GB.

        I know there are much larger shops backing up many more servers out
there running OS390 also. What I would like to know is, at large shops, what
is your OSA configuration? Are you running multi-port OSAs and/or gigabit
cards? For comparison, I would also like to know how many clients you are
backing up per night. Where do you think the bottleneck is? Have you seen
similar problems and, if so, what did you do to help alleviate them? I am
fairly confident that TSM is not CPU constrained during the window. We
recently moved TSM to a higher service class with little effect on the
problem.  Do you feel we are saturating the OSA card?
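
As a rough sanity check on the OSA question, the figures above can be turned
into an average required line rate. This is only a back-of-the-envelope
Python sketch: it assumes GB means 10^9 bytes, ignores protocol overhead, and
spreads the load evenly over the 20:00-07:30 window, which real backup
sessions will not do. The function name is made up for illustration.

```python
LINK_MBIT = 100.0                  # single-port 100Mbit OSA card
WINDOW_HOURS = (24 - 20) + 7.5     # 20:00 to 07:30 the next morning
window_seconds = WINDOW_HOURS * 3600


def required_mbit(gigabytes, seconds=window_seconds):
    """Average Mbit/s needed to move `gigabytes` (10^9 bytes each) in `seconds`."""
    return gigabytes * 1e9 * 8 / seconds / 1e6


# Nightly volumes quoted in the message: 30-40GB on average, 75-80GB at peak.
for label, gb in (("average night (40GB)", 40), ("peak night (80GB)", 80)):
    rate = required_mbit(gb)
    print(f"{label}: {rate:.1f} Mbit/s average, "
          f"{rate / LINK_MBIT:.0%} of the nominal link rate")
```

An average well under the link rate does not rule out the card saturating in
bursts, for example when many of the 116 allowed sessions start at once, so
this narrows the question rather than answering it.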

        Any thoughts and suggestions would be greatly appreciated.

          Take care,
               Al

Alan Davenport
Senior Storage Administrator
Selective Insurance Co. of America
alan.davenport AT selective DOT com
(973) 948-1306