Subject: Re: OS390 TSM Performance questions.
From: Bill Kelly <kellywh AT MAIL.AUBURN DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 13 Feb 2003 13:48:37 -0600
Hi,

We seem to be experiencing symptoms similar (identical?) to Alan's.

We're at z/OS 1.2, running on a 2066-002 with 8 GB of memory and virtually
no paging; TSM is at 4.2.3.0; the database is 55% of 106 GB.  Network
connectivity is via Gigabit Ethernet. The disk pool is 190 GB on an ESS.
The nightly backup load is approximately 230 clients (a mix of desktops and
servers), averaging 130-140 GB total per night.

For some weeks now (I'm not sure when this started, but I know the problem
was there at 4.2.2.10), we've been seeing horrible performance after TSM
has been up for a few hours.  For example, I can watch 3 migration
processes that run along fine for a little while, each getting approx. 400
MB/min throughput, then suddenly CPU utilization by TSM shoots up to 95%
and throughput on the migrations drops to approx. 50 MB/min per process.
Stopping and restarting the processes does no good, but cycling the server
clears up the problem.  I'm certain this problem affects other server
activities, such as client backups, storage pool backups, etc.

Like Alan, I've been ratcheting up the region size (up to 1.5 GB) and the
db bufferpool size (up to 384 MB) in a vain attempt to help matters.

I recently resorted to cycling the server 4 times per day just to get the
performance needed to keep up with things.

Based on the comments in this thread, last night I changed our region size
to 512 MB and the db bufferpool size to 128 MB.  Until now, I wasn't aware
of the 'show memu' diagnostic command (thanks, Alan/Mark! I finally have
*something* to quantify directly); here's the output from our server:

    MAX initial storage  536870912  (512.0 MB)
    Freeheld bytes   63678  (0.1 MB)
    MaxQuickFree bytes 10390159  (9.9 MB)
    83 Page buffers of 12683 : 0 buffers of 1585.
    0 Large buffers of 792 : 1 XLarge buffers of 99.
    68 buffers free: 134 hiAlloc buffers: 66 current buffers.
    12 units of 56 bytes hiAlloc: 11 units of 88 bytes hiCur.
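Until the pools are documented, the numeric fields can at least be scraped
and compared across restarts. A minimal Python sketch, assuming the field
labels shown in the display above (other TSM levels may format 'show memu'
differently):

```python
import re

def parse_show_memu(text):
    """Pull the raw byte counts out of a 'show memu' display.

    Field labels are taken from the sample output in this thread; this
    is an assumption, not a documented format."""
    fields = {}
    for label, key in [("MAX initial storage", "max_initial"),
                       ("Freeheld bytes", "freeheld"),
                       ("MaxQuickFree bytes", "max_quickfree")]:
        m = re.search(re.escape(label) + r"\s+(\d+)", text)
        if m:
            fields[key] = int(m.group(1))
    return fields

sample = """\
MAX initial storage  536870912  (512.0 MB)
Freeheld bytes   63678  (0.1 MB)
MaxQuickFree bytes 10390159  (9.9 MB)
"""

stats = parse_show_memu(sample)
# Flag the 'tiny Freeheld' symptom: Freeheld far below MaxQuickFree.
tiny = stats["freeheld"] < 0.05 * stats["max_quickfree"]
print(tiny)  # True for the display above
```

Capturing these numbers on a schedule (say, hourly) might show whether
Freeheld collapses at the same moment throughput drops.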

So apparently I still have the 'tiny Freeheld' problem; I strongly suspect
I had the same trouble at the 1.5 GB region size. (I don't suppose the
functions of, and relationships among, these buffer pools are documented
anywhere?  I haven't found anything in the list archives or at the support
web site.)  I wonder if there's a factor other than db bufferpool size
and region size that's affecting these buffer pool allocations.

I suspect that our server performance goes south once we run out of one or
more of these buffer types and the server starts GETMAINing/FREEMAINing
itself to death.
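That hypothesis can be illustrated with a toy pool model. This is only a
sketch of the general quick-free idea, not TSM's actual internals (which
aren't documented in this thread): while the free list holds buffers,
requests are cheap; once callers hold more buffers than the pool owns,
every further request falls through to a fresh allocation, the analogue of
a GETMAIN on every call.

```python
class BufferPool:
    """Toy quick-free pool: serve requests from a fixed free list when
    possible; once the pool is exhausted, every request falls through to
    a fresh allocation (the GETMAIN analogue)."""
    def __init__(self, nbuffers, bufsize):
        self.free = [bytearray(bufsize) for _ in range(nbuffers)]
        self.fallback_allocs = 0
        self.bufsize = bufsize

    def acquire(self):
        if self.free:
            return self.free.pop()
        self.fallback_allocs += 1          # pool empty: "GETMAIN"
        return bytearray(self.bufsize)

    def release(self, buf):
        self.free.append(buf)              # returned buffers are reused

# A steady acquire/release workload never exhausts a 4-buffer pool.
pool = BufferPool(nbuffers=4, bufsize=4096)
for _ in range(100):
    b = pool.acquire()
    pool.release(b)
print(pool.fallback_allocs)   # 0

# Holding 10 buffers at once forces 6 per-request allocations.
held = [pool.acquire() for _ in range(10)]
print(pool.fallback_allocs)   # 6
```

If something similar is happening inside the server, a starved Freeheld
pool would turn every buffer request into storage-management overhead,
which would fit the sudden jump to 95% CPU.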

Lacking any further information, I plan to do some bouncing of our server
this weekend to see if I can come up with a region and db bufpool
combination that will get the 'Freeheld bytes' (and presumably the
'buffers free') numbers into a reasonable range.  Perhaps if I can do
that, I'll be able to stop this insane cycling of the server every 5-8
hours.
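For comparing combinations across restarts, 'reasonable range' can be put
in ratio terms: Freeheld as a percentage of the region. A small sketch
using the numbers quoted in this thread (freeheld_pct is just an
illustrative helper, not a TSM metric):

```python
# Freeheld bytes as a percentage of region size, for the 'show memu'
# displays quoted in this thread.
def freeheld_pct(freeheld_bytes, region_bytes):
    return round(100.0 * freeheld_bytes / region_bytes, 2)

print(freeheld_pct(145620, 1342177280))    # Alan at 1280M (slow): 0.01
print(freeheld_pct(10280787, 536870912))   # Alan at 512M (fast): 1.91
print(freeheld_pct(63678, 536870912))      # our server at 512M: 0.01
```

By this measure our 512 MB region still looks like Alan's "slow" case, so
region size alone evidently isn't the whole story.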

Thanks for your help and insight!
Bill

Bill Kelly
Auburn University
kellywh AT mail.auburn DOT edu

On Thu, 13 Feb 2003, Alan Davenport wrote:

> I had my region size at 1280M and
> TSM was running just awful. I had a phone conversation with Mark and
> afterwards, I tried his suggestion of REDUCING the region size. Note the
> before/after output to the "show memu SHORT" (Case sensitive!) display:
>
> Region Size = 1280M
>
> MAX initial storage  1342177280 (1280.0 MB)
> Freeheld bytes  145620  (0.1 MB)
> MaxQuickFree bytes 26387005  (25.2MB)
> 56 Page buffers of 32210 : 315 buffers of 4026.
> 4 Large buffers of 2013 : 222 XLarge buffers of 251.
> 202 buffers free: 336 hiAlloc buffers: 134 current buffers.
> 50 units of 688 bytes hiAlloc: 44 units of 72 bytes hiCur.
> Region Size=512M
>
> MAX initial storage  536870912  (512.0 MB)
> Freeheld bytes 10280787  (9.8 MB)
> MaxQuickFree bytes 10280878  (9.8 MB)
> 56 Page buffers of 12549 : 4 buffers of 1568.
> 2 Large buffers of 784 : 18 XLarge buffers of 98.
> 66992 buffers free: 81083 hiAlloc buffers: 1903 current buffers.
> 28969 units of 56 bytes hiAlloc: 1532 units of 104 bytes hiCur.
>
> Look at the second line of each display. It appears that with region=1280M
> the "Freeheld bytes" buffer was WAY under-allocated: only about 145 KB.
> With the region size set to 512M, 9.8 MB was allocated to the buffer, and
> TSM is running significantly better. Whether or not this will help someone
> else I do not know. This is the first I've heard that REDUCING the region
> size can help performance; it's counter-intuitive. I had been increasing it
> slowly over a period of time, based on information I had found on ADSM.ORG.
> It's hard to argue with results, however. My maintenance cycle is currently
> around 3 hours further along today than it usually is.
>
>      Take care,
>          Al
>