Your postings don't relate any analysis performed at the Linux level...
Have your Linux people take a good look at that system, to get a profile
of its behavior under load, as to memory, paging, disk, network, and
processor utilization, to help narrow the problem. One sometimes
discovers anomalies in the platform, as in an undiscovered memory fault
which has resulted in half your memory DIMMs being marked offline, or a
failed processor, finally explaining why performance has seemed degraded
of late. Linux tuning knobs can also affect server performance, as can
site DNS service problems. A multi-processor Linux system may be
configured so that only one processor handles all I/O interrupts, which
results in a bottleneck. I/O interrupts are the bane of system
performance, and where rogue network activity hammers a system,
throughput goes down.
All this is to say that many unrecognized factors can be in play.
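
As one concrete check of the single-processor interrupt concern, here
is a minimal Python sketch that sums the per-CPU counters in
/proc/interrupts and reports each processor's share. The function name
interrupt_share is my own invention, and the counters are cumulative
since boot, so treat the result as a rough indicator only; a heavily
lopsided share is a hint to look closer, not a diagnosis.

```python
import os


def interrupt_share(text):
    """Return each CPU's fraction of all interrupts counted in the text.

    `text` is the content of /proc/interrupts. The first line names the
    CPUs; each following line is "<irq>: <count per CPU...> <description>".
    """
    lines = text.strip().splitlines()
    cpus = lines[0].split()
    totals = [0] * len(cpus)
    for line in lines[1:]:
        _, _, rest = line.partition(":")
        fields = rest.split()[:len(cpus)]
        # Skip lines (such as ERR or MIS) that carry a single total
        # rather than one counter per CPU.
        if len(fields) < len(cpus) or not all(f.isdigit() for f in fields):
            continue
        for i, value in enumerate(fields):
            totals[i] += int(value)
    grand_total = sum(totals) or 1  # avoid dividing by zero
    return {cpu: n / grand_total for cpu, n in zip(cpus, totals)}


if __name__ == "__main__" and os.path.exists("/proc/interrupts"):
    with open("/proc/interrupts") as f:
        for cpu, share in interrupt_share(f.read()).items():
            print(f"{cpu}: {share:.1%}")
```

Running it twice, some minutes apart under load, and comparing the two
sets of totals gives a better picture than the since-boot figures alone.
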
Having been in TSM for a while, you know that more and more client and
server processes have been made multi-threaded, but I think there are
some still to go.
So, get into the analysis. You can readily create load scenarios
through the controlled execution of dbbackups, reclamations,
expirations, pool copies, etc. and thereby paint a picture of the key
factors affecting performance.
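
A controlled run of that kind can be bracketed with system counters.
The Python sketch below is only an illustration under my own
assumptions: it samples the aggregate "cpu" line of /proc/stat before
and after one command and reports the elapsed time, the busy fraction,
and the I/O-wait fraction. The names read_cpu, cpu_delta, and profile
are hypothetical, and the command you pass in is whatever your site
uses to start a single server operation.

```python
import subprocess
import time


def read_cpu(text):
    """Return the first eight counters of the aggregate 'cpu' line.

    Order in /proc/stat: user, nice, system, idle, iowait, irq,
    softirq, steal (all in clock ticks since boot).
    """
    for line in text.splitlines():
        if line.startswith("cpu "):
            return [int(v) for v in line.split()[1:9]]
    raise ValueError("no aggregate cpu line found")


def cpu_delta(before, after):
    """Return busy and iowait fractions between two read_cpu() samples."""
    d = [a - b for a, b in zip(after, before)]
    total = sum(d) or 1  # avoid dividing by zero on identical samples
    return {"busy": 1 - (d[3] + d[4]) / total, "iowait": d[4] / total}


def profile(cmd):
    """Run cmd (a list of arguments) and report system-wide CPU figures."""
    with open("/proc/stat") as f:
        before = read_cpu(f.read())
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    elapsed = time.monotonic() - start
    with open("/proc/stat") as f:
        after = read_cpu(f.read())
    result = cpu_delta(before, after)
    result["seconds"] = elapsed
    return result


# Hypothetical usage, one scenario at a time on an otherwise quiet system:
#   print(profile(["./run_one_reclamation.sh"]))
```

The figures are system-wide, so they are only meaningful when nothing
else of consequence is running. A high iowait fraction with a low busy
fraction points toward disk or tape, not processor, as the constraint.
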