Subject: Re: [Networker] Compare Networker to other backup products
From: Oscar Olsson <spam1 AT QBRANCH DOT SE>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Mon, 31 Oct 2005 09:37:35 +0100
On Sun, 30 Oct 2005, Matthew Huff wrote:

MH> This is a pet peeve of mine. 100MB auto-negotiation isn't a "protocol"
MH> like PPP. In PPP, both sides negotiate capabilities. 100MB
MH> auto-negotiation isn't like that at all. What happens is when the link
MH> becomes active, both sides "listen in" to the traffic and make an
MH> educated guess as to what the other side is speaking. One of the
MH> problems with this is that Cisco and other switches have spanning-tree
MH> protocols that block outgoing traffic until the spanning-tree BPDU fails
MH> to show up on another port. 

Spanning tree is a Good Thing[tm]. It prevents a clueless user who connects 
a hub or switch that isn't spanning-tree aware from creating a loop, which 
in turn can cause network downtime. Of course, junk switches from D-Link 
have a problem handling this anyway, but that's another story. 
I haven't seen this problem ever, so I guess you're wrong, although I 
can't prove it. ;) Let's just say that it's VERY uncommon.
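
If you want the best of both worlds on Cisco gear, you can keep spanning 
tree but make access ports forward immediately and shut down if a rogue 
switch shows up. A rough sketch in IOS syntax (the interface name is just 
an example, adjust it to your own setup):

interface FastEthernet0/1
 switchport mode access
 spanning-tree portfast
 spanning-tree bpduguard enable

Portfast skips the listening/learning delay on edge ports, and BPDU guard 
errdisables the port if somebody plugs in a switch that speaks spanning 
tree, so the loop protection stays intact. On CatOS the portfast part is 
the "set spantree portfast" command Matthew mentions below.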

MH> This blocking prevents the client card from seeing any traffic at all
MH> during the window and it really is making a "guess". Even disabling the
MH> spanning-tree blocking (set spantree portfast enable) isn't enough. You
MH> should disable etherchannel and trunk negotiation as well (Catalyst has
MH> a macro - "set port host"). Even given all that, if a random broadcast
MH> packet such as an ARP doesn't go over the wire during the short
MH> initialization window, both sides will have no choice but to
MH> make a blind guess. This is why it will sometimes work and fail other
MH> times. Later IOS and Catalyst versions and newer NIC firmware are better
MH> at the guess, but it will probably be someone who puts an old ethernet
MH> card into a new server that gets you.

No, spanning tree, trunk auto-negotiation, PoE negotiation etc. do not 
block the negotiation in any way. And no, auto-negotiation is not about 
listening for traffic; both parties present a list of the modes 
they can operate in.
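
You can actually see that exchange. On a Linux client with ethtool 
installed (eth0 is just a placeholder name, and the exact output depends 
on the driver), something like this shows what the NIC advertises, what 
the switch advertised back, and what they agreed on:

# advertised modes, the link partner's advertised modes, and the result
ethtool eth0 | egrep -i 'advertised link modes|speed|duplex|auto-negotiation'

On Solaris you can dig the same information out of the driver with ndd or 
kstat, but the parameter names differ from NIC driver to NIC driver.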

MH> Why take the risk? 

Because the risk of broken configurations if you disable it (say, one side 
hard-coded to full duplex while the other still auto-negotiates and falls 
back to half duplex) is much larger than the risk of incompatible NICs and 
switches.

MH> hardware flowcontrol as well). BTW, everyone has made sure that their
MH> gigabit ethernet flow control is set on their Legato server, right? :)

Of course.
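
For anyone who hasn't: on a Linux storage node, a quick example with 
ethtool (eth0 again being a placeholder) to check and, if needed, force 
pause frames:

ethtool -a eth0              # show current rx/tx pause settings
ethtool -A eth0 rx on tx on  # turn flow control on in both directions

The switch port has to be set to honour pause frames as well, otherwise 
this buys you nothing.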

MH> Network performance can be tricky and non-obvious. Sending terabytes of
MH> data over the wire for backups can be tricky and sometimes requires
MH> tuning even with gigabit ethernet. For example, we have a dedicated VLAN
MH> set up for our NDMP NAS backups with jumbo frames turned on, and TCP
MH> tweaks such as the size of the TCP buffers and window scaling forced on.
MH> Legato should be setting window scaling when it opens its TCP
MH> sockets, but doesn't appear to; however, you can tune settings to make
MH> sure it is set for all sockets. This means bigger TCP flow control
MH> windows, which is important in high-volume, high-speed data streams such
MH> as backups.
MH> 
MH> Here is my startup script on my Sun V490 running Solaris 9:
MH> 
MH> #!/bin/sh
MH> 
MH> PATH=/bin:/usr/bin:/usr/sbin; export PATH
MH> 
MH> ndd -set /dev/tcp tcp_max_buf    4194304
MH> ndd -set /dev/tcp tcp_cwnd_max   2097152
MH> ndd -set /dev/tcp tcp_xmit_hiwat 1048576
MH> ndd -set /dev/tcp tcp_recv_hiwat 1048576
MH> ndd -set /dev/tcp tcp_tstamp_always 1
MH> ndd -set /dev/tcp tcp_wscale_always 1
MH> 
MH> 
MH> I haven't verified for sure that the TCP timestamp (tcp_tstamp_always)
MH> is necessary, i.e., for protection against sequence number wrapping, but
MH> it's usually recommended for this type of data stream.
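
Whatever values you end up with, it's worth verifying after a reboot that 
they actually took, for example:

ndd -get /dev/tcp tcp_wscale_always
ndd -get /dev/tcp tcp_xmit_hiwat
ndd -get /dev/tcp tcp_recv_hiwat

A snoop of the SYN packets from a client will also tell you whether the 
window scale option is really being negotiated on the backup connections.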

On SPARC Solaris, on Sun V440 servers with the Cassini chip, I've noticed 
that TCP tuning isn't the biggest performance problem. Instead, the driver 
or the NIC seems to eat up all of the CPU when there is high network 
throughput, probably due to TCP checksum calculations, but that hasn't been 
confirmed. My suggestion is to run the storage nodes on Linux x64 servers 
from, for instance, HP. That way you eliminate both Slowlaris and the Sun 
V40z servers, which both have their own set of quality problems. The 
performance of the Cassini NIC has been covered in detail in a previous 
thread.
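
If you do move the storage nodes to Linux, the rough equivalents of those 
ndd tunables are sysctl settings. A sketch (the values are just examples, 
adjust them to your own bandwidth-delay product):

# /etc/sysctl.conf -- bigger socket buffers plus window scaling and timestamps
net.core.rmem_max = 4194304
net.core.wmem_max = 4194304
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 65536 4194304
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1

Load them with "sysctl -p" and they survive reboots, unlike an rc script 
you have to remember to keep around.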

//Oscar

