TSM Sizing Matrix

booman55

Does anyone know what the recommended guidelines are when it comes to sizing the DB and what impact that has on physical server memory?

How many instances can you run on a 32GB memory box? Is there a limit? Are there best practices? I know there is no concrete answer and there are variables involved, but where do I go to start sizing my environment?

The problem I'm running into is I have 5 large instances (yes they're large) running on a P660 box and it's getting 100% pegged at times. I don't want to keep throwing CPU and memory at it if it won't help.

Thanks.
 
Booman
Download and read the TSM Deployment Guide from the Redbooks library. Inside you will find the formulas you need to size your environment.

To partially answer your question: a P660, depending on its LPAR assignments, hardware availability, etc., should handle at least two, maybe three, TSM instances at one time without a load condition. This depends on the size of each TSM DB.
And you would be the first that I know of to have 5 running on one server.

Most of us consider standing up a new TSM server when the DB reaches 75-80% of full capacity. If you have 5 large instances, and you say they are large, then I would recommend getting another AIX server and configuring it for at least two more TSM instances, then load balancing across the two platforms.

You also need to consider whether this is a capacity planning problem or a TSM tuning problem. A simple change to a variable such as the number of buffer pages can ease the strain on a stressed TSM server. Your data retention requirements may also need review; perhaps change is good in your case.

Hope this helps
 
Steven, I am familiar with all the best practices mentioned in that guide. What I am looking for is some hard facts behind those recommendations. For instance, is it better to run 5 TSM instances with 80GB DBs or 2 instances with 200GB DBs from a hardware and software performance standpoint, and if so, why?

Right now I have a 300GB TSM DB which is running "fine" with a 98.87% cache hit rate. That's one of the 5 instances I have running on the P660 server. The others are smaller, but all are over 100GB. If I add another P series server to the environment, is that going to gain me much improvement? Or do I just add memory and CPU to the existing server?

Thanks.
 
I've been told by an ex-TSM developer, Jack McGill, that with the way the DB is written, 120GB is optimal: specifically, 12 volumes of 10GB. Beyond that point a second instance or new server should be implemented. I have no hard proof of this. As you said, you are doing fine at 300GB. I was also told that ideally your DB PCT UTIL should be around 65%.

My DB isn't near that big so I have no performance numbers to share. :(
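
To put those rules of thumb next to the numbers in this thread, here is a rough Python sketch. The 120GB target and the 65% utilization figure are the hearsay values quoted above, not documented limits. The function names are made up for illustration, and the four 100GB entries are placeholders, since the thread only says the other DBs are "over 100GB".

```python
import math

# Hearsay rules of thumb from this thread, not documented IBM limits.
TARGET_DB_GB = 120      # "optimal" DB size per instance (12 x 10GB volumes)
TARGET_PCT_UTIL = 65    # suggested DB PCT UTIL


def instances_needed(db_sizes_gb, target_gb=TARGET_DB_GB):
    """Instances required if the combined DB data were spread so that
    no instance holds more than target_gb."""
    return math.ceil(sum(db_sizes_gb) / target_gb)


def used_gb_at_target(allocated_gb=TARGET_DB_GB, pct_util=TARGET_PCT_UTIL):
    """Used space that corresponds to pct_util of allocated_gb."""
    return allocated_gb * pct_util / 100


# One 300GB DB plus four others; 100 is a lower-bound placeholder.
sizes = [300, 100, 100, 100, 100]
print("instances at 120GB each:", instances_needed(sizes))
print("used GB at 65% of 120GB:", used_gb_at_target())
```

This only restates the rule of thumb as arithmetic. It says nothing about whether the CPU, memory, or I/O on a given box can carry that many instances.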




When you say 100% pegged, are you talking CPU, memory, I/O, or all 3? Obviously, if only one is slowing you down, you have something to attack.
 
Booman
Well, I can honestly offer my congrats on keeping such a large TSM DB intact and performing on one server.
Trace does have a point: it's written that 120GB is the accepted standard, along with the 13GB maximum recovery log. Anything after that is solely up to the environment.

Since you have at least 600GB of TSM DBs, if not more, running on one TSM server platform, I believe you owe us a whitepaper on how you manage to keep this moving along :) :)
Seriously, for DR purposes, TSM DB maintenance, or even load balancing, we would all agree that standing up additional platforms, not instances, would keep your environment stable by spreading the load among servers. You have a single point of failure in your environment, since so far you have not mentioned any DR servers available to you.
Restoring a 300GB DB must take at least 18 hours of continuous data streaming from at least three pieces of media, if not more depending on your media type, all prone to bad media effects. On top of that comes the time it takes to create the DB disk volumes, say another 12 hours at least.
So in the event of a disaster, not that you will have one any time soon, I'm under the impression it will take you more than one full day to recover.
Adding multiple servers will cut that time down to roughly 6 hours if you choose to keep your TSM DBs at 75GB each.
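
The recovery arithmetic above can be sketched in a few lines of Python. The two rates are simply backed out of the 18-hour and 12-hour guesses for a 300GB DB, so they are rough figures rather than measurements, and the sketch assumes both steps scale linearly with DB size and run one after the other. The function name is hypothetical.

```python
# Rates implied by the rough estimate for a 300GB DB (not measured values).
RESTORE_GB_PER_HOUR = 300 / 18   # 18 hours to stream 300GB back from media
FORMAT_GB_PER_HOUR = 300 / 12    # 12 hours to create 300GB of DB disk volumes


def recovery_hours(db_gb):
    """Estimate hours to create the DB volumes and then restore db_gb,
    assuming linear scaling and no overlap between the two steps."""
    return db_gb / FORMAT_GB_PER_HOUR + db_gb / RESTORE_GB_PER_HOUR


for size_gb in (300, 75):
    print(f"{size_gb}GB DB: about {recovery_hours(size_gb):.1f} hours")
```

Under these assumptions the 300GB DB comes out at about 30 hours and a 75GB DB at about 7.5 hours, which is in the same ballpark as the 6-hour figure as long as the smaller servers are recovered in parallel.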
If you are looking for a white paper, perhaps some of the other senior members have one or two in their own repositories. I'll look through my history and within IBM as well.
Running at a 98% cache hit rate is good. Adding more CPU and memory may lower your numbers by only a percentage point or two, but on the other hand it lets you increase your tuning parameters.
But look at it from the DR and manageability angle: are you comfortable with your environment, or would adding more servers relieve your nerves just a little bit?
Granted, that means more platforms to patch and keep up to date firmware-wise, but in the long run it is totally worth it. Perhaps you'll choose to purchase another P660, since it has performed so well in your environment.
Chapter Two of the Implementation Redbook, in the section on multiple TSM servers, illustrates the value of multiple servers.

Keep up the good work; I'll forward you what I find. I'm sure others will chime in as well.
 
Thanks for the responses. If you hadn't noticed, I put running "fine" in quotes. On the surface the DB is running efficiently, but I realize it's way out of whack. Our environment has grown so much, so quickly, that we're really on the edge. We are breaking many records that I wish we weren't breaking, but that's the nature of where I work. We're mandated to save everything, sometimes forever x 3. Getting this environment tamed is what keeps me going. It's an uphill battle though.
 