ADSM-L

Re: OS390 TSM Performance questions.

From: John Naylor <john.naylor AT SCOTTISH-SOUTHERN.CO DOT UK>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 14 Feb 2003 15:56:56 +0000
I am resending this to the list because I only copied it to Bill earlier, and the thread is still going, so there may be some interest in it.


---------------------- Forwarded by John Naylor/HAV/SSE on 02/14/2003 03:54 PM ---------------------------


John Naylor
02/14/2003 09:43 AM

To:   Bill Kelly <kellywh AT mail.auburn DOT edu>
cc:
Subject:  Re: OS390 TSM Performance questions.  (Document link: John Naylor)

Hi Bill,
We are running TSM 4.2.2 on OS/390 2.10 on a 9672 (5 CPUs).
I have REGION at 512 MB and the bufferpool at 48 MB.
I am running alongside other business applications, and it is only when they are busy and the LPAR becomes CPU constrained (especially when a DB2 application is very busy) that TSM suffers significantly, performance-wise.
This morning I ran migration from disk to tape alongside expiration and got 450 MB per minute.
I do occasionally bounce the server, maybe every three weeks, if my perception is that performance is a bit more sluggish than normal.
Do you have access to RMF or similar performance tools? That should help isolate what is causing your performance issue.
I personally would look at your bufferpool size, maybe reducing it. What do your stats show?
I ran 'show memu' (see results below), but unless someone who really understands the figures explains what they mean, what good and bad figures look like, and how they are affected by current TSM activity, I do not think they are worth very much.
For example, my Freeheld bytes shows 0.7 MB.
Good, bad, indifferent? Who knows?

 MAX initial storage  536870912  (512.0 MB)
 Freeheld bytes  741499  (0.7 MB)
 MaxQuickFree bytes 10391797  (9.9 MB)
 35 Page buffers of 12685 : 79 buffers of 1585.
 6 Large buffers of 792 : 54 XLarge buffers of 99.
1221 buffers free: 5544 hiAlloc buffers: 4323 current buffers.
3290 units of 40 bytes hiAlloc: 3289 units of 40 bytes hiCur.
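Since several of us are now eyeballing 'show memu' output by hand, a throwaway parser can at least pull the counters out consistently and track them over time. This is only a hedged sketch: the labels and layout are taken from the output pasted above and may well differ between TSM server levels.

```python
import re

def parse_memu(text):
    """Pull the byte counters out of 'show memu' output.

    Assumes lines like 'Freeheld bytes  741499  (0.7 MB)', as in the
    output above; the layout may vary across TSM server levels.
    """
    stats = {}
    for label in ("MAX initial storage", "Freeheld bytes", "MaxQuickFree bytes"):
        match = re.search(re.escape(label) + r"\s+(\d+)", text)
        if match:
            stats[label] = int(match.group(1))
    return stats

# The sample below is the output pasted above.
sample = """\
 MAX initial storage  536870912  (512.0 MB)
 Freeheld bytes  741499  (0.7 MB)
 MaxQuickFree bytes 10391797  (9.9 MB)
"""

stats = parse_memu(sample)
# One number to watch over time: Freeheld as a fraction of MaxQuickFree.
ratio = stats["Freeheld bytes"] / stats["MaxQuickFree bytes"]
print(f"Freeheld is {ratio:.1%} of MaxQuickFree")
```

Fed Bill's figures further down, the same ratio comes out well under one percent, which would at least be a consistent way of spotting the 'tiny Freeheld' condition.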

So in summary, my advice would be:
- Make sure TSM is getting the CPU it needs
- Check your RMF or similar
- Ensure you are running MPTHREADING
- Have a look through ADSM.org (search "region")
regards,
John





Bill Kelly <kellywh AT mail.auburn DOT edu> on 02/13/2003 07:48:37 PM

Please respond to Bill Kelly <kellywh AT mail.auburn DOT edu>

To:   adsm-l AT vm.marist DOT edu
cc:    (bcc: John Naylor/HAV/SSE)
Subject:  Re: OS390 TSM Performance questions.



Hi,

We seem to be experiencing symptoms similar (identical?) to Alan's.

We're at z/OS 1.2, running on a 2066-002 with 8 GB of memory and virtually no
paging; TSM is at 4.2.3.0; the database is 55% of 106 GB. Network
connectivity is via Gigabit Ethernet. The disk pool is 190 GB on an ESS. Nightly
backup load is approximately 230 clients (a mix of desktops and servers),
averaging 130-140 GB per night total.

For some weeks now (I'm not sure when this started, but I know the problem
was there at 4.2.2.10), we've been seeing horrible performance after TSM
has been up for a few hours.  For example, I can watch 3 migration
processes that run along fine for a little while, each getting approx. 400
MB/min throughput, then suddenly CPU utilization by TSM shoots up to 95%
and throughput on the migrations drops to approx. 50 MB/min per process.
Stopping and restarting the processes does no good, but cycling the server
clears up the problem.  I'm certain this problem affects other server
activities, such as client backups, storage pool backups, etc.

Like Alan, I've been ratcheting up the region size (up to 1.5 GB) and the
db bufferpool size (up to 384 MB) in a vain attempt to help matters.

I recently resorted to cycling the server 4 times per day just to get the
performance needed to keep up with things.

Based on the comments in this thread, last night I changed our region size
to 512 MB and our db bufferpool size to 128 MB. Until now, I wasn't aware of
the 'show memu' diagnostic command (thanks Alan/Mark! I finally have
*something* to quantify directly); here's the output from our server:

    MAX initial storage  536870912  (512.0 MB)
    Freeheld bytes   63678  (0.1 MB)
    MaxQuickFree bytes 10390159  (9.9 MB)
    83 Page buffers of 12683 : 0 buffers of 1585.
    0 Large buffers of 792 : 1 XLarge buffers of 99.
   68 buffers free: 134 hiAlloc buffers: 66 current buffers.
   12 units of 56 bytes hiAlloc: 11 units of 88 bytes hiCur.

So apparently I still have the 'tiny Freeheld' problem; I strongly suspect
I had the same trouble at the 1.5 GB region size. (I don't suppose the
functions of, and relationships among, these buffer pools are documented
anywhere? I haven't found anything in the list archives or at the support
web site.) I wonder if there's a factor other than db bufferpool size
and region size that's affecting these buffer pool allocations?

I suspect that our server performance goes south once we run out of
one or more types of these buffers and the server starts
GETMAINing/FREEMAINing itself to death.

Lacking any further information, I plan to do some bouncing of our server
this weekend to see if I can come up with a region and db bufpool
combination that will get the 'Freeheld bytes' (and presumably the
'buffers free') numbers into a reasonable range.  Perhaps if I can do
that, I'll be able to stop this insane cycling of the server every 5-8
hours.
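A weekend of bounces goes faster with the combinations written down in advance. The candidate values below and the keep-the-bufferpool-under-a-quarter-of-region filter are purely illustrative assumptions, not advice from anyone in this thread:

```python
# Hypothetical sweep of REGION / db bufferpool combinations to try across
# server bounces. The specific sizes and the <= region/4 rule of thumb are
# assumptions for illustration only.
from itertools import product

regions_mb = [384, 512, 768]
bufpools_mb = [64, 128, 192]

candidates = [(region, bufpool)
              for region, bufpool in product(regions_mb, bufpools_mb)
              if bufpool <= region // 4]

for region, bufpool in candidates:
    print(f"bounce with REGION={region}M and a {bufpool} MB bufferpool, "
          f"then record the 'show memu' Freeheld figure")
```

Recording the Freeheld number after each bounce would show quickly whether any combination keeps it out of the 0.1 MB range.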

Thanks for your help and insight!
Bill

Bill Kelly
Auburn University
kellywh AT mail.auburn DOT edu

On Thu, 13 Feb 2003, Alan Davenport wrote:

> I had my region size at 1280M and
> TSM was running just awful. I had a phone conversation with Mark and
> afterwards, I tried his suggestion of REDUCING the region size. Note the
> before/after output to the "show memu SHORT" (Case sensitive!) display:
>
> Region Size = 1280M
>
> MAX initial storage  1342177280 (1280.0 MB)
> Freeheld bytes  145620  (0.1 MB)
> MaxQuickFree bytes 26387005  (25.2MB)
> 56 Page buffers of 32210 : 315 buffers of 4026.
> 4 Large buffers of 2013 : 222 XLarge buffers of 251.
> 202 buffers free: 336 hiAlloc buffers: 134 current buffers.
> 50 units of 688 bytes hiAlloc: 44 units of 72 bytes hiCur.
> Region Size=512M
>
> MAX initial storage  536870912  (512.0 MB)
> Freeheld bytes 10280787  (9.8 MB)
> MaxQuickFree bytes 10280878  (9.8 MB)
> 56 Page buffers of 12549 : 4 buffers of 1568.
> 2 Large buffers of 784 : 18 XLarge buffers of 98.
> 66992 buffers free: 81083 hiAlloc buffers: 1903 current buffers.
> 28969 units of 56 bytes hiAlloc: 1532 units of 104 bytes hiCur.
>
> Look at the second line of each display. It appears that with REGION=1280M
> the "Freeheld bytes" buffer was WAY under-allocated: only 145 KB was
> allocated. With the region size set to 512M, 9.8 MB was allocated to the
> buffer and TSM is running significantly better. Whether or not this will
> help someone else I do not know. This is the first I've heard that REDUCING
> the region size will help performance; it is counter-intuitive. I had been
> increasing it slowly over a period of time based on information I had found
> on ADSM.ORG. It's hard to argue with results, however. My maintenance cycle
> is currently around 3 hours further along today than it usually is.
>
>      Take care,
>          Al
>
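Alan's before/after displays make the effect easy to quantify: plugging the two Freeheld values from his quoted output into a quick calculation shows roughly a seventy-fold jump after the region was reduced.

```python
# Alan's 'show memu SHORT' figures, taken from the quoted message above.
before = {"region_mb": 1280, "freeheld_bytes": 145620}
after = {"region_mb": 512, "freeheld_bytes": 10280787}

growth = after["freeheld_bytes"] / before["freeheld_bytes"]
print(f"Cutting REGION from {before['region_mb']}M to {after['region_mb']}M "
      f"grew Freeheld roughly {growth:.0f}-fold")
```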


