Booman
Well, I can honestly offer you my congrats on keeping such a large TSM DB intact and performing on one server.
Trace does have a point: it's written that 120GB is the unified standard, along with the max 13GB recovery log. Anything after that is solely up to the environment.
Since you have at least 600GB of TSM DBs, if not more, running against one TSM server platform, I believe you owe us a whitepaper on how you manage to keep this moving along.
Seriously, for DR purposes, TSM DB maintenance, or even load balancing, we would all agree that standing up additional platforms - not instances - would keep your stability intact by sharing the wealth amongst servers. You have a single point of failure in your environment, since so far you have not mentioned any DR servers available to you.
Restoring a 300GB DB must take at least 18 hours of continuous data stream, using at least three pieces of media, if not more depending on your media type - all prone to potential bad media effects. Let alone all the time it's going to take to create the DB disk volumes - say another 12 hours at least.
So in the event of a disaster - not saying you will have one any time soon - I'm under the impression it will take you at least one full day to recover.
Adding multiple servers would cut that time down to around 6 hours if you elect to keep your TSM DBs at 75GB each.
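To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It assumes (my simplification, not a measured result) that both the restore time and the time to create DB volumes scale linearly with DB size, and it uses the 300GB / 18 hour / 12 hour estimates from this post as the reference point. The function and constant names are made up for illustration.

```python
# Back-of-the-envelope DR estimate. The reference figures are the rough
# estimates from this post (300GB -> ~18h restore, ~12h to create DB volumes).
# Linear scaling with DB size is an assumption, not a measured behaviour.

REFERENCE_DB_GB = 300.0
REFERENCE_RESTORE_HOURS = 18.0
REFERENCE_FORMAT_HOURS = 12.0


def estimate_recovery_hours(db_size_gb: float) -> dict:
    """Estimate hours to create DB volumes and restore a DB of the given size."""
    if db_size_gb < 0:
        raise ValueError("db_size_gb must be non-negative")
    scale = db_size_gb / REFERENCE_DB_GB
    restore = REFERENCE_RESTORE_HOURS * scale
    fmt = REFERENCE_FORMAT_HOURS * scale
    return {"restore": restore, "format": fmt, "total": restore + fmt}


if __name__ == "__main__":
    for size in (300, 120, 75):
        est = estimate_recovery_hours(size)
        print(
            f"{size:>4}GB DB: restore {est['restore']:.1f}h "
            f"+ volumes {est['format']:.1f}h = {est['total']:.1f}h"
        )
```

Under these assumptions a 75GB DB works out to roughly 7.5 hours end to end, in the same ballpark as the 6 hours mentioned above, and separate servers can be recovered in parallel rather than one after another.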
If you are looking for a white paper, perhaps some of the other senior members have one or two in their own repositories. I'll look through my history and within IBM as well.
Running at a 98% cache rate is good. Adding more CPU and memory will potentially decrease your numbers by maybe a percentage point or two, but on the other hand it allows you to increase your tuning parms.
But look at it from the DR and manageability aspect - are you comfortable with your environment, or would adding more servers relieve your nerves just a little bit?
Granted, that means more platforms to patch and keep up to date firmware-wise, but in the long run it's totally worth it. Perhaps you'll choose to purchase another P660, since it's performed so well in your environment.
The section on multiple TSM servers in Chapter Two of the Implementation Redbook illustrates the value of multiple servers.
Keep up the good work, I'll forward you what I find. I'm sure others will chime in as well.