[Networker] General sun HW performance question

2005-01-03 11:41:27
Subject: [Networker] General sun HW performance question
From: Oscar Olsson <spam1 AT QBRANCH DOT SE>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Mon, 3 Jan 2005 17:40:34 +0100
In brief, our environment is as follows: a Sun V440 with 4 CPUs and 8 GB
RAM, with a SpectraLogic T950 attached via a fibre channel loop. A
dual-port Sun HBA (SG-XPCI2FC-JF2) uses both ports to reach different
drives in the library. The server also runs Sun Trunking Software v1.3 to
connect to the network, with both ce interfaces bundled as a port-channel.
Both interfaces run at 1000 Mbit/s full duplex. The load-balancing
algorithm for outbound traffic is IP source-destination pair hashing.
We're running Solaris 9 with the latest patches for the OS, HBA etc. Some
tweaks have been applied to /etc/system, such as maxphys, the number of
file descriptors, maxusers etc. The HBA is on its own PCI bus, in the
correct PCI slot type.
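
To make the load-balancing point concrete, here is a minimal sketch of
how a source-destination pair hash spreads flows over two links. The
hash function (XOR of the two addresses, modulo the link count) and the
addresses are made up for illustration; the actual function used by Sun
Trunking is not stated here and may well differ.

```python
import ipaddress


def pick_link(src_ip: str, dst_ip: str, n_links: int = 2) -> int:
    """Hypothetical pair hash: XOR both addresses, modulo the link count."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % n_links


# Made-up addresses, purely illustrative.
server = "10.0.0.10"
clients = ["10.0.0.21", "10.0.0.22", "10.0.0.23", "10.0.0.25"]

for client in clients:
    # The same pair always hashes to the same link.
    print(client, "-> link", pick_link(server, client))
```

Whatever the real hash is, a deterministic pair hash keeps each
source-destination pair on one link, so a single peer can never get more
than one link's worth of outbound bandwidth, and an unlucky mix of
addresses can leave one link busier than the other.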

The problem is that we can't get maximum performance out of all 6 SAIT-1
drives at once. When the server moves about 900 Mbit/s of data, the CPUs
have no cycles left. At that point, user space processes account for
approx 15% of the CPU time and the kernel for approx 80%.
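
To put those numbers in perspective, here is a rough back-of-the-envelope
calculation. It assumes a native SAIT-1 rate of 30 MB/s per drive (a spec
figure, not something measured here), decimal megabytes, and that the 80%
kernel figure is an average across all 4 CPUs.

```python
# Figures taken from the description above.
N_DRIVES = 6
OBSERVED_MBIT_S = 900
N_CPUS = 4
KERNEL_FRACTION = 0.80

# Assumption: native (uncompressed) SAIT-1 transfer rate per drive.
SAIT1_NATIVE_MB_S = 30


def mbit_to_mbyte(mbit_s: float) -> float:
    """Convert Mbit/s to MB/s (decimal units)."""
    return mbit_s / 8


observed_mb_s = mbit_to_mbyte(OBSERVED_MBIT_S)
per_drive_mb_s = observed_mb_s / N_DRIVES
needed_mb_s = N_DRIVES * SAIT1_NATIVE_MB_S
needed_mbit_s = needed_mb_s * 8

# Kernel CPU cost, expressed as "CPUs busy per 100 Mbit/s".
kernel_cpus_busy = N_CPUS * KERNEL_FRACTION
kernel_cpus_per_100mbit = kernel_cpus_busy / OBSERVED_MBIT_S * 100

print(f"observed: {observed_mb_s:.1f} MB/s, {per_drive_mb_s:.2f} MB/s per drive")
print(f"needed for native speed: {needed_mb_s} MB/s = {needed_mbit_s} Mbit/s")
print(f"kernel cost: {kernel_cpus_per_100mbit:.2f} CPUs per 100 Mbit/s")
```

Under those assumptions, the drives get well under their native rate each,
and streaming all six at native speed would need roughly 1.6 times the
throughput we reach before the CPUs are saturated.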

I'm wondering why the kernel consumes so much CPU, considering the
relatively low throughput? What part could be causing this? Is it the LUS
driver, or the HBA driver? Or something else? Can one find out?

I was thinking that maybe the HBA doesn't have any good CPU offloading
functions for handling I/O, but that's just a theory. Is there any other
way one can find out which part of the system causes such a high CPU load?
It's not user space daemons, so "top" isn't sufficient. ;)

And is it possible that it could indeed be the HBA? I mean, there is often
a TCP offloading engine on better NICs, and while looking at different
HBAs, it seems like those have different offloading mechanisms as well.
I've been looking mostly at Emulex cards, and they seem to differ quite a
bit when it comes to architecture.

For instance, if I have a 66MHz PCI bus, will an LP11002 still be faster
than an LP9002DC? And yes, I include lower system CPU utilization per
megabit of throughput in my definition of "faster".
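
As a sanity check on the raw bandwidth side of that question only, here is
a small sketch comparing the theoretical peak of the bus with the nominal
port speeds. It assumes a 64-bit slot (the width is not stated above),
nominal FC payload rates of 200 MB/s per 2 Gbit/s port and 400 MB/s per
4 Gbit/s port, and, as far as I understand the product line, that the
LP9002DC is a dual-port 2 Gbit/s card and the LP11002 a dual-port 4 Gbit/s
card. It says nothing about CPU utilization, which depends on the driver
and the card's offload design.

```python
def pci_peak_mb_s(clock_mhz: float, width_bits: int) -> float:
    """Theoretical peak of a parallel PCI bus in MB/s (no protocol overhead)."""
    return clock_mhz * width_bits / 8


# Nominal per-port, per-direction payload rates (assumed round figures).
FC_PORT_MB_S = {"2Gb": 200, "4Gb": 400}

# (card, port speed, number of ports); port counts and speeds are assumptions.
CARDS = [("LP9002DC", "2Gb", 2), ("LP11002", "4Gb", 2)]

bus_mb_s = pci_peak_mb_s(66, 64)  # assumed 64-bit slot at 66 MHz

ceilings = {}
for name, speed, ports in CARDS:
    ports_total = FC_PORT_MB_S[speed] * ports
    # The card cannot move more than the slower of the bus and its ports.
    ceilings[name] = min(bus_mb_s, ports_total)
    print(f"{name}: ports {ports_total} MB/s, ceiling on this bus {ceilings[name]:.0f} MB/s")
```

On those assumptions, both ceilings are far above what six tape drives
would ask for, so any real difference between the two cards on a 66MHz bus
would have to come from per-I/O CPU cost rather than raw bandwidth.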

(http://www.emulex.com/products/fc/index.html)

It would be good to hear other people's experience with this.

//Oscar
