ADSM-L

Re: OS390 TSM Performance questions.

2003-02-14 17:38:59
Subject: Re: OS390 TSM Performance questions.
From: "Darby, Mark" <Mark.Darby AT HQ.DOE DOT GOV>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 14 Feb 2003 17:38:52 -0500
Hello, Bill.

I fully concur with your statement - i.e., that "As has been pointed out...,
unless you know how to interpret the numbers, they're just that - a bunch of
numbers.".

This tool (i.e. show memu SHORT) has a very limited and specific use.  It is
intended for "internal use only" by the TSM support staff and it is, simply
put, not a formally supported tool.  Also, as I indicated to Alan in a
private discussion with him: if improperly used, it can emit a huge volume
of output regarding memory allocation details in the server address space.
Use it at YOUR own risk.

With that said, however, it can be somewhat useful (at least, on the S/390
server platform) - as long as the consumer of it fully understands their
limitations in interpreting its content (and uses "SHORT").

I know only the following about this, and I only know it from a
S/390-specific viewpoint.  However, since the server is written in C, I
would assume that (except for those specific pieces of it, if any, that are
tailor-made for the S/390 platform) some or all (or none) of the information
may pertain to any of the other platforms on which the TSM server runs.

Here's what I understand (thanks to the TSM support staff with whom I have
had the distinct pleasure of dealing over the past several years)...

During initialization (at some point), the TSM server "performs" a GETMAIN
to determine the size of the region in which it is running.  For those of
you who are interested and don't understand, the "region" parameter is used
to govern/control/limit the size of addressable virtual storage for a given
"process" (in our case, it's called an address space).  This memory-related
control point allows the MVS operating system (i.e., currently that's OS/390
and z/OS) to limit the use of virtual memory "right up front" and, thus,
defines the maximum virtual storage available to the TSM server - in toto.
I do not know how (or if) this is accomplished on any other platform.

Upon determining the region available to it, the TSM server frees the
storage thus obtained and uses 2% of that amount (just freed) as the value
that "show memu SHORT" displays as "MaxQuickFree".

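As a rough check of the 2% figure, the arithmetic can be sketched in Python
using the numbers Bill quotes further down in this thread (a 512 MB region and
a MaxQuickFree of 10390159 bytes). The variable names are made up for this
sketch, and the idea that the probe GETMAIN returns somewhat less than the full
region is only an assumption offered to explain why the reported value sits a
little under 2% of 512 MB.

```python
# Figures taken from the 'show memu SHORT' output quoted later in this thread.
region_bytes = 536870912          # "MAX initial storage" (512.0 MB)
max_quick_free = 10390159         # "MaxQuickFree bytes" (9.9 MB)

# What 2% of the full region would be.
two_percent_of_region = region_bytes * 0.02

# Fraction of the full region that MaxQuickFree actually represents.
observed_fraction = max_quick_free / region_bytes

# ASSUMPTION: if MaxQuickFree is exactly 2% of the storage obtained by the
# probe GETMAIN, this is how much storage that GETMAIN would have returned.
implied_probe_bytes = max_quick_free * 50

print(f"2% of region:       {two_percent_of_region:,.0f} bytes")
print(f"observed fraction:  {observed_fraction:.2%}")
print(f"implied probe size: {implied_probe_bytes:,} bytes "
      f"({implied_probe_bytes / 2**20:.1f} MB)")
```

On those figures MaxQuickFree is a little under 2% of the nominal region,
which is at least consistent with the description above.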
...a little digression may be in order here...
During its life, the TSM server must manage its use of memory as efficiently
as possible, because inefficient memory management techniques can seriously
degrade the performance of a busy TSM server and (as I have personally
encountered with TSM on OS/390) unproductively consume CPU cycles with an
inordinate amount of GETMAIN/FREEMAIN activity.  Memory management design is
an important consideration in any application, but it can be a particularly
problematic issue for some C-language programs, and perhaps many of you
understand this critical design issue even better than I (which ain't sayin'
too much).  Generally speaking, poorly designed algorithms in code and/or
ill-configured heap and stack storage use can cause serious memory-related
performance problems.  I assume the intent of the TSM designers was to
mitigate this problem with MaxQuickFree.

MaxQuickFree is used as a threshold below which "dynamically obtained
memory" will NOT be freed, thus, preventing the inefficient memory-thrashing
that could occur if the thousands of tiny pieces of memory used by the
server were continually obtained and released, over and over again.  The
point, here, is that the server uses MaxQuickFree (at least on the S/390
platform) to maintain "internal control" over at least this amount of the
virtual storage footprint in which it runs, thus, removing any of the C
and/or operating system-specific memory-management inefficiencies from the
performance picture (at least for that portion/amount of memory applicable
to MaxQuickFree).  The server obtains the memory in question and "carves it
up" as required to fulfill its memory needs throughout its lifetime.  It
just doesn't release memory it has acquired below the amount indicated by
MaxQuickFree.
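
To make the idea concrete, here is a toy model in Python. It is not the
server's actual code. The class name QuickFreePool, its counters, and the
exact-size reuse rule are all invented for illustration; the only thing taken
from the description above is that freed memory is held for reuse, rather than
released, while the held total is below MaxQuickFree.

```python
from collections import defaultdict


class QuickFreePool:
    """Toy model: hold freed buffers for reuse while below a byte threshold."""

    def __init__(self, max_quick_free):
        self.max_quick_free = max_quick_free   # retention threshold in bytes
        self.freeheld_bytes = 0                # bytes currently held for reuse
        self.held = defaultdict(int)           # buffer size -> count held
        self.os_allocs = 0                     # stand-in for GETMAIN calls
        self.os_frees = 0                      # stand-in for FREEMAIN calls

    def alloc(self, size):
        if self.held[size] > 0:                # reuse a held buffer
            self.held[size] -= 1
            self.freeheld_bytes -= size
        else:                                  # nothing suitable is held
            self.os_allocs += 1
        return size

    def free(self, size):
        # ASSUMPTION: the check is made before adding the buffer, so the
        # held total may end up slightly above the threshold.
        if self.freeheld_bytes < self.max_quick_free:
            self.held[size] += 1               # keep it; no FREEMAIN
            self.freeheld_bytes += size
        else:
            self.os_frees += 1                 # at threshold: give it back


pool = QuickFreePool(max_quick_free=4096)
for _ in range(1000):                          # many short-lived small requests
    buf = pool.alloc(104)
    pool.free(buf)

print(pool.os_allocs, pool.os_frees, pool.freeheld_bytes)
```

In this model the thousand allocate/free pairs cost a single simulated GETMAIN
and no FREEMAIN at all, which is the kind of saving the threshold appears
intended to provide.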

It could be that, at times, the server will still engage in the thrashing to
which I have alluded above.  This will occur when the specific memory needs
of the server simply do not fit the freeheld allocations that the server is
maintaining, so dynamic memory needs may, nevertheless, still cause some
inefficient memory management to occur.  They may not "fit" because the
memory initially allocated (up to the MaxQuickFree threshold) could consist
(for example) mainly of 12683-byte, 1585-byte, and 792-byte buffers,
whereas, the subsequent needs for memory might be composed of smaller-sized
buffers.  The memory allocated up to the point at which MaxQuickFree is
reached is the memory (and the specifically allocated sizes) that will be
"held" (i.e., not released) by the server.  Not to open yet another can of
worms, but this can be increased, by the way, if you really need to do so.
I doubt you will ever need that, but if GETMAIN/FREEMAIN activity rises to a
point where it does severely degrade performance AND you have some way of
determining that it is, indeed, the culprit of your performance problem,
call TSM support and ask them how to do that.
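
The "does not fit" case can be sketched the same way. This is a hypothetical
model, not the server's logic: it assumes a held buffer can only satisfy a
request of exactly its own size, and it borrows the buffer sizes and counts
from the second 'show memu SHORT' output quoted below purely as sample data.
The function name count_os_requests is made up for this sketch.

```python
from collections import Counter


def count_os_requests(held_sizes, requests):
    """Count requests that cannot be served from the held buffers.

    ASSUMPTION: a held buffer is reused only for a request of exactly
    its own size; anything else needs a fresh (simulated) GETMAIN.
    """
    held = Counter(held_sizes)
    misses = 0
    for size in requests:
        if held[size] > 0:
            held[size] -= 1       # served from freeheld storage
        else:
            misses += 1           # would need a fresh GETMAIN
    return misses


# Held set dominated by the larger buffer sizes (sample data only).
held_sizes = [12683] * 83 + [1585] * 14 + [792] * 2

matching = [1585] * 10            # later work asking for sizes already held
mismatched = [104] * 10           # later work asking for small units instead

print(count_os_requests(held_sizes, matching))
print(count_os_requests(held_sizes, mismatched))
```

Requests that match the held sizes are all served from the pool, while every
one of the small requests misses it, even though plenty of storage is being
held.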

It is unclear at what point such memory management thrashing begins because
there are no indicators of which I am aware that one can use to determine
that, specifically, but I would say that, generally, the more stress you
place on the server (i.e., the more and more varied work you try to have the
server perform) the more likely it is to enter such a state.

I hope this helps someone, and does not frustrate too many of you who were
willing to read it but found it to be too verbose and, perhaps,
unenlightening.  Also, please forgive me if I appear to be an arrogant
know-it-all.  I do not mean to be that way - I just "come off" that way - or
so I have been told by many close and dear associates.  It's the only way I
know.

Finally, if anyone has any cause to disbelieve, or can refute anything I
have stated here, I am very much interested in any corrections, amendments,
or disputations you might wish to provide.

Kindest Regards,
Mark Darby (or, as known by my peers and close associates - Mr. Verbose - you
see why?)
(301) 903-5229

-----Original Message-----
From: Bill Kelly [mailto:kellywh AT MAIL.AUBURN DOT EDU]
Sent: Friday, February 14, 2003 8:59 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: OS390 TSM Performance questions.

Hi,

I wanted to clarify, or perhaps retract, something about the 'show memu
SHORT' command that has been mentioned in this thread.  Not surprisingly,
the numbers you get from this command vary depending on what's been going
on in the server.  Specifically, just after startup in a 512 MB region, I
get:

    MAX initial storage  536870912  (512.0 MB)
    Freeheld bytes   69409  (0.1 MB)
    MaxQuickFree bytes 10390159  (9.9 MB)
    83 Page buffers of 12683 : 0 buffers of 1585.
    0 Large buffers of 792 : 1 XLarge buffers of 99.
   61 buffers free: 148 hiAlloc buffers: 87 current buffers.
   12 units of 56 bytes hiAlloc: 12 units of 104 bytes hiCur.

A couple of hours later, after a storage pool copy has run and nightly
backups are in full swing, I get:

    MAX initial storage  536870912  (512.0 MB)
    Freeheld bytes 10397099  (9.9 MB)
    MaxQuickFree bytes 10390159  (9.9 MB)
    83 Page buffers of 12683 : 14 buffers of 1585.
    2 Large buffers of 792 : 18 XLarge buffers of 99.
   21616 buffers free: 48260 hiAlloc buffers: 13604 current buffers.
   13400 units of 104 bytes hiAlloc: 4879 units of 56 bytes hiCur.

Note that the Freeheld number, which initially looked 'bad', now looks
'good'.  As has been pointed out to me off-list, unless you know how to
interpret the numbers, they're just that - a bunch of numbers.  I
should've known better.  :-)
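
For anyone who wants to track the two values over time, a small Python sketch
along these lines could pull the byte counts out of saved output and express
Freeheld as a fraction of MaxQuickFree. The parsing pattern is an assumption
based only on the three lines shown above, the function names are made up, and
the ratio is plain arithmetic rather than an official health indicator.

```python
import re

startup = """\
    MAX initial storage  536870912  (512.0 MB)
    Freeheld bytes   69409  (0.1 MB)
    MaxQuickFree bytes 10390159  (9.9 MB)
"""
busy = """\
    MAX initial storage  536870912  (512.0 MB)
    Freeheld bytes 10397099  (9.9 MB)
    MaxQuickFree bytes 10390159  (9.9 MB)
"""


def byte_counts(text):
    """Return {label: bytes} for lines shaped like 'Label  <digits>  (x MB)'."""
    pattern = re.compile(r"^\s*(.+?)\s+(\d+)\s+\(", re.MULTILINE)
    return {label: int(value) for label, value in pattern.findall(text)}


def freeheld_ratio(text):
    """Freeheld bytes divided by MaxQuickFree bytes."""
    counts = byte_counts(text)
    return counts["Freeheld bytes"] / counts["MaxQuickFree bytes"]


print(f"startup: {freeheld_ratio(startup):.1%}")
print(f"busy:    {freeheld_ratio(busy):.1%}")
```

On the figures above, Freeheld is well under 1% of MaxQuickFree just after
startup and marginally over 100% once the server is busy.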

Regards,
Bill

Bill Kelly
Auburn University
kellywh AT mail.auburn DOT edu
