ADSM-L

Re: OS390 TSM Performance questions.

2003-02-14 08:19:50
From: Rodney Clark <Rodney.Clark AT INGBANK DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 14 Feb 2003 12:33:18 -0000
Post us some details: iostat and vmstat output, how much memory, disks, etc.
The big quick win on AIX is vmtune -p5 -P10.
But I guess you already know that.


-----Original Message-----
From: PAC Brion Arnaud [mailto:Arnaud.Brion AT PANALPINA DOT COM]
Sent: Friday 14 February 2003 09:44
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: OS390 TSM Performance questions.


Hi all,

I followed your discussion with much interest, as I'm suffering from a
huge performance problem too. Unfortunately I'm not on OS390, but
using AIX 4.3.3: could someone tell me if there is some trick like this
one that should be considered when using this OS?
Another thing that annoys me: using "show memu SHORT" on my server (TSM
4.2.3.1) returns: ANR2000E Unknown command - SHOW MEMU
Could it be that this command is only available in the OS390 version of TSM?
Thanks in advance.

Arnaud
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
| Arnaud Brion, Panalpina Management Ltd., IT Group     |
| Viaduktstrasse 42, P.O. Box, 4002 Basel - Switzerland |
| Phone: +41 61 226 19 78 / Fax: +41 61 226 17 01       |
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=



-----Original Message-----
From: Alan Davenport [mailto:Alan.Davenport AT SELECTIVE DOT COM]
Sent: Thursday, 13 February, 2003 21:41
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: OS390 TSM Performance questions.


Hi Bill,

          Another thing that came up in our discussion was that the DB
buffpoolsize should not exceed 131072 KB (128 MB). You might want to try
that as well. Sounds like you have little to lose, like I did when I tried
reducing the region size. Another observation. My cache hit ratio has
gone up nearly a full percentage point after I made my adjustment this
morning. I'm fairly happy at this point but I sure wish I knew WHY this
has worked! I can see me trying to explain this to management. "I solved
the TSM performance problem!" "Really! How?" "I gave it less than half
the memory to work with!" "OK Al, just stay calm, the men with the white
coats will be along shortly!" At least it would be vacation time! (:

         Al

-----Original Message-----
From: Bill Kelly [mailto:kellywh AT MAIL.AUBURN DOT EDU]
Sent: Thursday, February 13, 2003 2:49 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: OS390 TSM Performance questions.


Hi,

We seem to be experiencing symptoms similar (identical?) to Alan's.

We're at z/OS 1.2, running on a 2066-002 w/ 8GB of memory and virtually
no paging; TSM is at 4.2.3.0; database is 55% of 106 GB.  Network
connectivity is via Gb Ethernet. Disk pool is 190GB on an ESS. Nightly
backup load is approximately 230 clients (a mix of desktops and
servers), averaging in the 130-140GB range per night total.

For some weeks now (I'm not sure when this started, but I know the
problem was there at 4.2.2.10), we've been seeing horrible performance
after TSM has been up for a few hours.  For example, I can watch 3
migration processes that run along fine for a little while, each getting
approx. 400 MB/min throughput, then suddenly CPU utilization by TSM
shoots up to 95% and throughput on the migrations drops to approx. 50
MB/min per process. Stopping and restarting the processes does no good,
but cycling the server clears up the problem.  I'm certain this problem
affects other server activities, such as client backups, storage pool
backups, etc.
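
To put those rates in perspective, here is a rough back-of-the-envelope sketch. The per-process rates and process count are the ones quoted above; the 135 GB figure (the midpoint of the nightly range), the 1024 MB/GB conversion, and the idea that the whole nightly load migrates at a constant rate with all three processes running are assumptions made purely for illustration.

```python
# Figures quoted in the message above.
PROCESSES = 3
HEALTHY_MB_PER_MIN = 400   # per migration process, before the slowdown
DEGRADED_MB_PER_MIN = 50   # per migration process, after the slowdown

# Assumptions for illustration only (not from the original post).
NIGHTLY_GB = 135           # midpoint of the 130-140 GB nightly range
MB_PER_GB = 1024


def migration_hours(total_gb, mb_per_min_per_process, processes=PROCESSES):
    """Hours needed to migrate total_gb at a constant aggregate rate."""
    total_mb = total_gb * MB_PER_GB
    minutes = total_mb / (mb_per_min_per_process * processes)
    return minutes / 60


healthy = migration_hours(NIGHTLY_GB, HEALTHY_MB_PER_MIN)
degraded = migration_hours(NIGHTLY_GB, DEGRADED_MB_PER_MIN)
print(f"healthy:  {healthy:.1f} h")
print(f"degraded: {degraded:.1f} h")
```

Under these assumptions the same migration goes from roughly 2 hours to more than 15, an eightfold slowdown.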

Like Alan, I've been ratcheting up the region size (up to 1.5 GB) and
the db bufferpool size (up to 384 MB) in a vain attempt to help matters.

I recently resorted to cycling the server 4 times per day just to get
the performance needed to keep up with things.

Based on the comments in this thread, I last night changed our region
size to 512 MB and db bufferpool size to 128 MB.  Until now, I wasn't
aware of the 'show memu' diagnostic command (thanks Alan/Mark! I finally
have *something* to quantify directly); here's the output from our server:

    MAX initial storage  536870912  (512.0 MB)
    Freeheld bytes   63678  (0.1 MB)
    MaxQuickFree bytes 10390159  (9.9 MB)
    83 Page buffers of 12683 : 0 buffers of 1585.
    0 Large buffers of 792 : 1 XLarge buffers of 99.
    68 buffers free: 134 hiAlloc buffers: 66 current buffers.
    12 units of 56 bytes hiAlloc: 11 units of 88 bytes hiCur.

So apparently I still have the 'tiny Freeheld' problem; I suspect
strongly I had the same trouble at 1.5 GB region size. (I don't suppose
the functions of and relationships among these buffer pools are
documented anywhere?  I haven't found anything in the list archives or
at the support web site.)  I wonder if there's a factor other than db
bufferpool size and region size that's affecting these buffer pool
allocations?

I suspect that our server performance goes south once we run out of
one/some type(s) of these buffers and the server starts
GETMAINing/FREEMAINing itself to death?

Lacking any further information, I plan to do some bouncing of our
server this weekend to see if I can come up with a region and db bufpool
combination that will get the 'Freeheld bytes' (and presumably the
'buffers free') numbers into a reasonable range.  Perhaps if I can do
that, I'll be able to stop this insane cycling of the server every 5-8
hours.
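
When comparing several region/bufpool combinations, it may help to pull the numbers out of the 'show memu' output mechanically rather than by eye. The following is a minimal sketch, assuming the output has been captured as plain text in the layout shown in this thread; the function names `parse_memu` and `freeheld_ratio` are hypothetical helpers, not part of TSM, and the "Freeheld as a share of region" metric is only an illustrative yardstick, not a documented tuning rule.

```python
import re

# Labels of the three byte counters at the top of 'show memu SHORT' output.
LABELS = ("MAX initial storage", "Freeheld bytes", "MaxQuickFree bytes")


def parse_memu(text):
    """Return {label: byte count} for the counters found in the text."""
    fields = {}
    for label in LABELS:
        match = re.search(re.escape(label) + r"\s+(\d+)", text)
        if match:
            fields[label] = int(match.group(1))
    return fields


def freeheld_ratio(fields):
    """Freeheld bytes as a fraction of MAX initial storage (the region)."""
    return fields["Freeheld bytes"] / fields["MAX initial storage"]


# Sample text copied from the outputs posted in this thread.
bill_512m = """
    MAX initial storage  536870912  (512.0 MB)
    Freeheld bytes   63678  (0.1 MB)
    MaxQuickFree bytes 10390159  (9.9 MB)
"""
alan_512m = """
MAX initial storage  536870912  (512.0 MB)
Freeheld bytes 10280787  (9.8 MB)
MaxQuickFree bytes 10280878  (9.8 MB)
"""

for name, sample in (("Bill, 512M", bill_512m), ("Alan, 512M", alan_512m)):
    fields = parse_memu(sample)
    print(f"{name}: Freeheld = {fields['Freeheld bytes']} bytes, "
          f"{freeheld_ratio(fields):.4%} of region")
```

Run against both 512M samples, this makes the gap obvious: the same region size, but Freeheld differs by more than two orders of magnitude between the two servers.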

Thanks for your help and insight!
Bill

Bill Kelly
Auburn University
kellywh AT mail.auburn DOT edu

On Thu, 13 Feb 2003, Alan Davenport wrote:

> I had my region size at 1280M and
> TSM was running just awful. I had a phone conversation with Mark and
> afterwards, I tried his suggestion of REDUCING the region size. Note
> the before/after output to the "show memu SHORT" (Case sensitive!)
> display:
>
> Region Size = 1280M
>
> MAX initial storage  1342177280 (1280.0 MB)
> Freeheld bytes  145620  (0.1 MB)
> MaxQuickFree bytes 26387005  (25.2MB)
> 56 Page buffers of 32210 : 315 buffers of 4026.
> 4 Large buffers of 2013 : 222 XLarge buffers of 251.
> 202 buffers free: 336 hiAlloc buffers: 134 current buffers.
> 50 units of 688 bytes hiAlloc: 44 units of 72 bytes hiCur.
>
> Region Size = 512M
>
> MAX initial storage  536870912  (512.0 MB)
> Freeheld bytes 10280787  (9.8 MB)
> MaxQuickFree bytes 10280878  (9.8 MB)
> 56 Page buffers of 12549 : 4 buffers of 1568.
> 2 Large buffers of 784 : 18 XLarge buffers of 98.
> 66992 buffers free: 81083 hiAlloc buffers: 1903 current buffers.
> 28969 units of 56 bytes hiAlloc: 1532 units of 104 bytes hiCur.
>
> Look at the second line of the displays. It appears that with
> region=1280M the "Freeheld bytes" buffer was WAY under-allocated. Only
> 145K was allocated. With the region size set to 512M, 9.8MB was
> allocated to the buffer and TSM is running significantly better.
> Whether or not this will help someone else I do not know. This is the
> first I've heard that REDUCING region size will help performance. It is
> counter-intuitive. I had been increasing it slowly over a period of time
> based on information I had found on ADSM.ORG. It's hard to argue with
> results, however. My maintenance cycle is currently around 3 hours
> further along today than it usually is.
>
>      Take care,
>          Al
>