Subject: Re: [Networker] Parallelism???
From: Eric Wagar <eric AT DEADHOOKERS DOT ORG>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Fri, 24 Jun 2005 09:55:16 -0700

Oscar Olsson wrote:
On Thu, 23 Jun 2005, Eric Wagar wrote:

EW> Which version of NetWorker you have determines how many parallel save
EW> streams you can run. For us, we have the Network edition, so we get 64.
EW> With a library with 10 drives, we use all 64 streams on a save. We, like
EW> others, can only use one stream on a tape-based recover.
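
For anyone following the arithmetic in the quoted paragraph, here is a rough sketch of how those 64 streams land on 10 drives, assuming the sessions are multiplexed roughly evenly across the drives (the actual spread depends on how each device's target sessions is set):

    # Rough arithmetic behind the figures quoted above: 64 parallel save
    # streams spread across 10 tape drives. Values are illustrative only.
    parallelism = 64      # Network edition parallelism quoted above
    drives = 10           # tape drives in the library

    sessions_per_drive = parallelism / drives
    print(f"~{sessions_per_drive:.1f} save sessions multiplexed per drive")
    # -> ~6.4 save sessions multiplexed per drive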

What server do you have that can manage that? What aggregate throughput do you get?

We have a Sun V440 with four 1281 MHz UltraSPARC IIIi CPUs (1 MB cache each), and that server can handle just about 1 Gbit/s of throughput before the kernel just can't get more CPU time. We're using the built-in ce (Cassini) NICs. It seems like the excessive CPU usage is caused by network processing. We can't use jumbo frames, since some of the network equipment that the clients are attached to doesn't support it.

To me, the excessive amount of CPU used to produce 1 Gbit/s of network throughput seems just plain wrong, since I seem to remember that the ce chipset has TCP checksum acceleration? Or are there faster NICs in this regard; how about the bge chipset? Or would a CPU with a larger cache (same speed, and yes, I know I'd need a new server then :) ) do the trick, since we see a lot of context switching going on?

The funny thing is that if I stream data to the drives alone, without reading the data from the network, it takes very little kernel CPU.
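
A rough sanity check on the jumbo-frame point above: the per-packet work the kernel does scales with the frame rate, and the back-of-the-envelope numbers below (illustrative only, ignoring protocol overhead and interrupt coalescing) show why a 1500-byte MTU at gigabit speed is so much more expensive than a 9000-byte one:

    # Approximate frame rates needed to sustain ~1 Gbit/s of payload.
    link_bits_per_s = 1e9
    for mtu_bytes, label in [(1500, "standard frames"), (9000, "jumbo frames")]:
        frames_per_s = link_bits_per_s / (mtu_bytes * 8)
        print(f"{label}: ~{frames_per_s:,.0f} frames/s")
    # standard frames: ~83,333 frames/s
    # jumbo frames:    ~13,889 frames/s (roughly 6x less per-packet work)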

I cheat. I have a single-module, 8-CPU (400 MHz), 8 GB RAM SGI Origin 2000. I have 10 STK FC LTO-2 drives through a SAN. Most drives are zoned to one 1G HBA apiece (I think two or three HBAs are doubled up). My /nsr filesystem is zoned through a 2G HBA. I have three gigE cards and a few 100BT cards. I have five non-routed backup networks in addition to the system's main interface. There is a gigE on the main interface, and one each on the next two most-used backup nets. The remaining three are on 100BT connections and are not that heavily used.
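
A quick bandwidth budget for that drive/HBA layout, using approximate LTO-2 and Fibre Channel numbers (these are assumptions, not measured figures from this setup):

    # Approximate streaming demand of the drives vs. HBA capacity.
    drives = 10
    lto2_native_mb_s = 30     # ~30-35 MB/s native per LTO-2 drive, more with compression
    fc_1g_mb_s = 100          # ~100 MB/s usable on a 1 Gbit FC HBA

    print(f"10 drives streaming natively: ~{drives * lto2_native_mb_s} MB/s total")
    print(f"native-speed drives per 1G HBA before saturation: ~{fc_1g_mb_s // lto2_native_mb_s}")
    # With compression pushing a drive toward 60+ MB/s, one or two drives
    # per 1G HBA is about the limit, which is why most drives get their own.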

With the current sizing guides, we have one too many gigE cards for a single module. Since I work for SGI, I can easily come by another O2K module, and then (hopefully!) easily increase my performance.

Watching last night's pcp output, the main interface is still the one that gets hit the hardest. That data mostly comes from Windows clients, and the interface is mostly pegged at max throughput. The highest CPU utilization is 90-95%, and that is only for about 20 minutes.

Our aggregate throughput looks to be about 110 MB/s with all interfaces in use. The majority of my Unix clients are using a backup network, whereas most of the Windows clients are not. Our migration to Windows 2003 will help change that, since I can then force those clients onto specific backup networks.
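
To put the 110 MB/s aggregate in perspective, here is a back-of-the-envelope comparison (the 8-hour window below is an assumption, not the actual schedule):

    # How the aggregate rate relates to a single gigE link and to the
    # amount of data moved in a backup window. Illustrative figures.
    aggregate_mb_s = 110
    gige_payload_mb_s = 1e9 / 8 / 1e6        # ~125 MB/s line rate, before overhead

    window_h = 8                             # assumed backup window
    moved_tb = aggregate_mb_s * 1e6 * window_h * 3600 / 1e12
    print(f"single gigE line rate: ~{gige_payload_mb_s:.0f} MB/s")
    print(f"~{moved_tb:.1f} TB moved in {window_h} h at {aggregate_mb_s} MB/s aggregate")
    # -> ~3.2 TB per 8-hour window; the whole aggregate is not far below
    #    what a single gigE interface can carry at line rate.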

Our concern has never been system performance. It has been backup window time, then recovery time. After that, it would be our Exchange backups and recoveries. Recovery speed will be even more important when the Exchange 2003 servers are implemented. The recovers are serial, so we will then start using DBO with about 4-6 TB of space (we haven't gotten too far into sizing that yet).
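
Some illustrative timings behind the serial-recover concern and the 4-6 TB figure above (the drive speed, store size, and fill rate below are assumptions, not sizing data from this environment):

    # Why serial, single-stream recovers hurt, and roughly how long a
    # disk pool of the size mentioned takes to fill. All values assumed.
    lto2_native_mb_s = 30          # single tape stream, approximate
    exchange_store_gb = 200        # hypothetical Exchange store size

    recover_h = exchange_store_gb * 1e3 / lto2_native_mb_s / 3600
    print(f"recovering {exchange_store_gb} GB over one tape stream: ~{recover_h:.1f} h")

    dbo_pool_tb = 5                # middle of the 4-6 TB range mentioned
    fill_rate_mb_s = 110           # aggregate rate from the paragraph above
    fill_h = dbo_pool_tb * 1e6 / fill_rate_mb_s / 3600
    print(f"filling a {dbo_pool_tb} TB disk pool at {fill_rate_mb_s} MB/s: ~{fill_h:.1f} h")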

Regards
eric

