Subject: Re: [Veritas-bu] Some info on my experiences with 10GbE
From: "Peters, Devon C" <Peters.Devon AT con-way DOT com>
To: <veritas-bu AT mailman.eng.auburn DOT edu>
Date: Tue, 8 Jan 2008 12:30:16 -0800

I don't think it's the T2000's that are so special, I think it may have more to do with Sun's cards…

I recently did some 10G testing with a Sun X4100 w/ 2 Opteron 2216's (dual-core, 2.4GHz) running RHEL 4, and found that it drastically outperformed the 4-core 1GHz T2000's.  I swapped one of the cards from a T2000 into the X4100, then tested sending data from the X4100->T2000, and from the T2000->X4100.  When sending data from the X4100->T2000, the throughput was the same as T2000->T2000, but when sending data from T2000->X4100, the maximum throughput achieved was 9.8Gb/s with 16 threads.  An interesting observation is that all 4 cores on the X4100 were 50% idle when running at this rate.

Also worth mentioning: I did some tests between a couple of 8-core 1.2GHz T2000's and got them to 9.3Gb/s with 16 threads, so the 8-core 1.2GHz T2000's definitely outperform the 4-core 1GHz ones.  Big surprise. :)

Similar to the previous poster, iperf seemed to perform the best with a 512k buffer and 512k tcp window size, and I also saw some large fluctuations in total throughput results.  I'm guessing this is due to where the Solaris scheduler is running threads, since with mpstat I'd occasionally see 1-2 cores (8 cpu's in mpstat) at 99-100% sys, and then the other 2 cores would be 100% idle.  If the scheduler would spread the load across the cores more evenly then perhaps more throughput could be achieved.  To help smooth the results, all the numbers I've reported on the list are the average of 3 separate 5-minute long iperf runs.
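That averaging is plain arithmetic, but for the record here's a minimal sketch of it (avg_runs is a hypothetical helper; whole Mbit/sec totals are assumed):

```shell
# Hypothetical helper: average the Mbit/sec totals of several iperf runs.
# Integer arithmetic is fine at this scale (values are whole Mbit/sec).
avg_runs() {
  total=0
  count=0
  for mbps in "$@"; do
    total=$((total + mbps))
    count=$((count + 1))
  done
  echo $((total / count))
}

# e.g. three 5-minute runs that reported 6400, 6550, and 6466 Mbit/sec:
avg_runs 6400 6550 6466   # -> 6472
```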

Btw, I'm also finding that single-threaded performance is crap with these cards on the 1GHz T2000's - though with the 1.2GHz T2000's or the X4100, single-threaded performance was slightly better (interestingly, the 8-core T2000's consistently stumbled w/ 3 threads):


                        ________________Mbit/sec________________
No. Threads     X4100->T2000    T2000->X4100    T2000->T2000 (8-core/1.2GHz)
1                944             2143            1686
2               1867             3988            1937
3               2558             4772            1897
4               3146             5096            3704
6               4368             8071            5934
8               5468             8282            6908
16              6472             9842            9311
32              6513             9893            9283


-devon


------------------------------------
Date: Mon, 7 Jan 2008 15:33:08 -0500
From: "Curtis Preston" <cpreston AT glasshouse DOT com>
Subject: Re: [Veritas-bu] Some info on my experiences with 10GbE
To: <VERITAS-BU AT mailman.eng.auburn DOT edu>
Message-ID:
        <4FBA0941CF3D9347889AA5FF23A809BE012C8BD9 AT ghmail02.glasshousetech DOT com>
Content-Type: text/plain;       charset="us-ascii"

This is another pro-T2000 report.  What makes them special?

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies

-----Original Message-----
From: veritas-bu-bounces AT mailman.eng.auburn DOT edu
[mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of pancamo
Sent: Friday, January 04, 2008 11:43 PM
To: VERITAS-BU AT mailman.eng.auburn DOT edu
Subject: [Veritas-bu] Some info on my experiences with 10GbE


I just started testing 2 T2000's with dual 10Gbps SUN Nics directly
connected to each other... 

I'm somewhat pissed that I'm only able to get about 658Mbps from one
thread on the 10Gbps NIC, while I'm able to get 938Mbps using the onboard
1Gbps NIC when using the iperf default values.  That means in some
cases the 10Gbit NIC is actually slower than the onboard 1Gbit NIC.

However, I was able to get 7.2 Gbps using 10 threads.


Here are some of my max results with different thread (-P) values

./iperf -c 192.168.1.2 -f m -w 512K -l 512 -P x


TCP Win   Buffer   Threads   Gbps
512K      512      1         1.4
512K      512      2         2.5
512K      512      4         4.3
512K      512      6         6.4
512K      512      8         6.1
512K      512      10        7.2
512K      512      15        4.6
512K      512      18        3.6
512K      512      20        3.0
512K      512      30        2.5
512K      512      60        2.3
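A sweep like the one above can be scripted; a sketch follows (the host and options mirror the command above; build_cmd is a hypothetical helper that only prints each command line - pipe the output to sh to actually run the tests):

```shell
# Hypothetical helper: assemble the iperf command line for one thread count.
build_cmd() {
  echo "./iperf -c 192.168.1.2 -f m -w 512K -l 512 -P $1"
}

# Print one command per thread count tested above; pipe to sh to execute.
for threads in 1 2 4 6 8 10 15 18 20 30 60; do
  build_cmd "$threads"
done
```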


Another annoying deal was that the results from iperf were not the same
each time I ran the test.  The results differed by as much as 3Gbps from
run to run, when they should be roughly the same for each run.





My /etc/system settings, which I added as suggested by Sun:

set ddi_msix_alloc_limit=8
set ip:ip_soft_rings_cnt=8
set ip:ip_squeue_fanout=1
set ip:tcp_squeue_wput=1
set ip:ip_squeue_bind=0
set ipge:ipge_tx_syncq=1
set ipge:ipge_bcopy_thresh = 512
set ipge:ipge_dvma_thresh = 1
set consistent_coloring=2
set pcie:pcie_aer_ce_mask=0x1




Here are the NDD settings that I found here:
http://www.sun.com/servers/coolthreads/tnb/parameters.jsp#2

ndd -set /dev/tcp tcp_conn_req_max_q 16384
ndd -set /dev/tcp tcp_conn_req_max_q0 16384
ndd -set /dev/tcp tcp_max_buf 10485760
ndd -set /dev/tcp tcp_cwnd_max 10485760
ndd -set /dev/tcp tcp_xmit_hiwat 131072
ndd -set /dev/tcp tcp_recv_hiwat 131072
ndd -set /dev/nxge0 accept_jumbo 1
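Since ndd settings don't persist across a reboot, the list can be reapplied from one loop; a sketch assuming the same devices and values as above, with echo left in as a dry run:

```shell
# Print (dry run) the ndd commands from the list above;
# drop the leading 'echo' to actually apply them (Solaris ndd assumed).
while read dev param value; do
  echo ndd -set "$dev" "$param" "$value"
done <<'EOF'
/dev/tcp tcp_conn_req_max_q 16384
/dev/tcp tcp_conn_req_max_q0 16384
/dev/tcp tcp_max_buf 10485760
/dev/tcp tcp_cwnd_max 10485760
/dev/tcp tcp_xmit_hiwat 131072
/dev/tcp tcp_recv_hiwat 131072
/dev/nxge0 accept_jumbo 1
EOF
```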

I also found information here:
http://blogs.sun.com/sunay/entry/the_solaris_networking_the_magic



cpreston wrote:
> 7500 Mbit/s! That's the most impressive number I've ever seen by FAR. I
> may have to take back my "10 GbE is a Lie!" blog post, and I'd be happy
> to do so.
>
> Can you share things besides the T2000? For example, 
>
> what OS and patch levels are you running?
> Any IP patches?
> Any IP-specific patches?
> What ndd settings are you using?
> Is rss enabled?
>
> "Input, I need Input!"
>
> ---
> W. Curtis Preston
> Backup Blog @ www.backupcentral.com (http://www.backupcentral.com)
> VP Data Protection, GlassHouse Technologies
>
>
> From: veritas-bu-bounces AT mailman.eng.auburn DOT edu
> [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of
> Peters, Devon C
> Sent: Wednesday, October 17, 2007 12:12 PM
> To: VERITAS-BU AT mailman.eng.auburn DOT edu
> Subject: [Veritas-bu] Some info on my experiences with 10GbE
>
>
> Since I've seen a little bit of talk about 10GbE on here in the past I
> figured I'd share some of my experiences...
>
> I've recently been testing some of Sun's dual-port 10GbE NICs on some
> small T2000's (1GHz, 4-core). I'm only using a single port on each card,
> and the servers are currently directly connected to each other (waiting
> for my network team to get switches and fibre in place).
>
> So far, I've been able to drive throughput between these two systems to
> about 7500Mbit/sec using iperf. When the throughput gets this high, all
> the cores/threads on the receiving T2000 become saturated and TCP
> retransmits start climbing, but both systems remain quite responsive.
> Since these are only 4-core T2000's, I would guess that the 6 or 8-core
> T2000's (especially with 1.2GHz or 1.4GHz processors) should be capable
> of more throughput, possibly near line speed.
>
> The downside of achieving throughput this high is that it requires lots
> of data streams. When transmitting with a single data stream, the most
> throughput I've gotten is about 1500Mbit/sec. I only got up to
> 7500Mbit/s when using 64 data streams... Also, the biggest gains seem to
> be in the jump from 1 to 8 data streams; with 8 streams I was able to
> get throughput up to 6500Mbit/sec.
>
> Our goal for 10GbE is to be able to restore data from tape at a speed
> of at least 2400Mbit/sec (300MB/sec). We have large daily backups
> (3-4TB) that we would like to be able to restore (not backup) in a
> reasonable amount of time. These restores are used to refresh our test
> and development environments with current data. The actual backups are
> done with array-based snapshots (HDS ShadowCopy), which then get mounted
> and backed up by a dedicated media server (6-core T2000). We're
> currently getting about 650MB/sec of throughput with the backups (9
> streams on 3 LTO3 tape drives - MPX=3 and it's very compressible data).
>
> Going off my iperf results, restoring this data using 9 streams should
> get us well over 2400Mbit/sec. But - we haven't installed the cards on
> our media servers yet, so I have yet to see what the actual performance
> of NetBackup and LTO3 over 10GbE is. I'm hopeful it'll be close to the
> iperf results, but if it doesn't meet the goal then we'll be looking at
> other options.
> --
> Devon Peters

_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu