Subject: Re: [Networker] General sun HW performance question
From: Oscar Olsson <spam1 AT QBRANCH DOT SE>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Thu, 6 Jan 2005 13:00:00 +0100
On Wed, 5 Jan 2005, Howard Martin wrote:

HM> On Wed, 5 Jan 2005 09:35:57 +0100, Oscar Olsson <spam1 AT QBRANCH DOT SE> wrote:
HM> >On Tue, 4 Jan 2005, Robert Maiello wrote:
HM> >RM> The 440 has 2 internal gigabit adapters (ce's).
HM> <SNIP>
HM> >RM> Solaris 9 was suppose to have very good TCP/IP CPU utilization though?
HM> >
HM> >I read this document, and checked if I had any problems that are mentioned
HM> >in this document. I couldn't find any signs of PCI bus congestion or any
HM> >signs of other issues that are mentioned in this document.
HM> >
HM> >I think the core essence of my problem is to find out what part of the
HM> >system causes such a high load. Is it the NIC or the FC adapter? And if it
HM> >is either one, can it be replaced with a better adapter that has a better
HM> >driver and/or hardware that creates less load on the system?
HM> >
HM> >Since its a V440 server with 4 CPUs, I can't add more CPU power. But
HM> >either way, I think its pathetic if you require more CPU power than that
HM> >just to drive a gig or two of I/O.
HM> >
HM> >//Oscar
HM> >
HM> Use bigasm on the backup server to hammer your tape drives, then use it
HM> from a fast client to hammer the network. This should help identify which
HM> is the bottleneck.

Interesting!

I created five directories in the root of the backup server, and then
created a local bigasm directive for a file in each directory, making the
NetWorker server back up 50GB in each directory. Then I created five
instances of the backup server client definition, each backing up one
directory to a different pool, thus making each saveset go to a different
drive. Then I ran these five new definitions simultaneously in five
different groups, making them write data to five different drives
simultaneously.
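
In case anyone wants to repeat this: the local directive in each test
directory is basically just a .nsr file along these lines (the directory
and file names here are only placeholders, and I'm writing the bigasm line
from memory, so check the performance tuning notes for the exact size
option before copying it):

  << /bigasm1 >>
  bigasm: testfile

The point of bigasm is that the data is generated on the fly rather than
read from disk, so the test exercises the server CPU, the SCSI/FC path and
the drives without the filesystems getting in the way.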

I now see a throughput of ~1.5 gigabit/s, which is close to the drives'
native capacity (S-AIT1). Since I'm not sure of the contents of the actual
data being written, I'm assuming that compression isn't effective on this
type of data.
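
(Rough math, if I remember the S-AIT1 spec of ~30 MB/s native per drive
correctly:

  5 drives x 30 MB/s = 150 MB/s  ~ 1.2 Gbit/s native
  observed ~1.5 Gbit/s           ~ 187 MB/s, i.e. ~37 MB/s per drive

so each drive is running at native speed or slightly above it, which fits
with the generated data compressing only marginally.)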

Top now shows this more or less continuously:

last pid:  8004;  load averages:  0.39,  0.40,  0.38 12:52:09
78 processes:  77 sleeping, 1 on cpu
CPU states: 19.3% idle,  2.5% user,  4.9% kernel, 73.3% iowait,  0.0% swap
Memory: 8192M real, 7203M free, 224M swap in use, 14G swap free

That looks much better, since it indicates that the drives are the
bottleneck in this case.
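
(A quick way to sanity-check that is to watch mpstat while the groups run:

  mpstat 5

If no single CPU is pegged in its sys or intr columns and the remaining
time is wait/idle, then the CPUs really are just waiting on the drives
rather than burning cycles somewhere.)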


Now to the next question: why does the network I/O cause such high CPU
utilization? Is it because the ce adapters lack hardware acceleration
features, or because the Solaris 9 IP stack isn't very fast?
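
One way to narrow that down is to see where the kernel time lands while a
network-heavy save is running, for example:

  mpstat 5        (per-CPU usr/sys/intr breakdown)
  kstat -m ce     (per-instance counters from the ce driver)

If one CPU shows far more sys and intr time than the others, that points
at the interrupt/driver side of the ce cards rather than the IP stack as
a whole.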

I have read the previously suggested documents on IP stack and OS tuning,
and I have applied the changes that seemed appropriate according to the
various sources. This has boosted performance by roughly 5-10%, but I'm
looking for more like 50-100%, since that's what the drives should be able
to handle on average with live data coming from the clients.
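
For the record, the kind of changes I mean are ndd settings along these
lines (the values are just examples, not recommendations, and they need to
go into a boot-time script since ndd settings don't survive a reboot):

  ndd -set /dev/tcp tcp_max_buf    4194304
  ndd -set /dev/tcp tcp_cwnd_max   2097152
  ndd -set /dev/tcp tcp_xmit_hiwat 262144
  ndd -set /dev/tcp tcp_recv_hiwat 262144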

So, is this high load by design? Are there network adapters that offload
the CPU better than the ce ones, or is Solaris 10 perhaps an alternative,
since Sun boasts about how much they have improved the IP stack? I really
don't know where to look next. I can't add more CPUs, since my V440 already
has four, and upgrading the CPUs will probably be too expensive compared to
the expected performance boost.

//Oscar
