Re: [Networker] ANALYSIS: Networker server price/performance

2005-12-08 14:48:31
Subject: Re: [Networker] ANALYSIS: Networker server price/performance
From: Oscar Olsson <spam1 AT QBRANCH DOT SE>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 8 Dec 2005 20:42:58 +0100
On Thu, 8 Dec 2005, Robert Maiello wrote:

RM> Does your Opteron box have 2 gigabit NICs? Can Linux and the box
RM> drive 2 of them close to the max? Your utilization implies that
RM> it can. Very interesting. Can the PCI bus and backplane on this
RM> box handle more traffic?

Linux does support EtherChannel (via the bonding driver). With one interface 
running at full speed the box is about half busy, which suggests it should 
be able to feed the drives with data coming from two fully loaded gigabit 
NICs. Both NICs and the HBA are on a PCI-X 133 MHz bus, so I believe the bus 
should also be able to cope with that data volume.
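A rough bus budget is consistent with this. The sketch below is illustrative, not measured: it assumes a 64-bit PCI-X bus at 133 MHz (about 1064 MB/s theoretical peak, shared by everything on that bus segment), gigabit Ethernet at its full 125 MB/s line rate, and that every byte received on the NICs is written out once more through the HBA. Protocol overhead and real-world bus efficiency are ignored, so actual headroom will be smaller.

```python
# Rough PCI-X bandwidth budget for a backup server (illustrative sketch).
# Assumptions (not measured): 64-bit PCI-X at 133 MHz, gigabit NICs at full
# line rate, and every byte received over the network written once to tape
# through an HBA on the same bus segment.

PCIX_WIDTH_BYTES = 8          # 64-bit bus
PCIX_CLOCK_MHZ = 133          # PCI-X 133 (nominally 133.33 MHz)
GIGE_MB_PER_S = 1000 / 8      # 1 Gbit/s line rate = 125 MB/s (decimal units)


def pcix_peak_mb_per_s(width_bytes: int = PCIX_WIDTH_BYTES,
                       clock_mhz: float = PCIX_CLOCK_MHZ) -> float:
    """Theoretical peak PCI-X throughput in MB/s (one transfer per clock)."""
    return width_bytes * clock_mhz


def bus_load(num_nics: int) -> float:
    """MB/s the bus must carry: data in via the NICs plus the same data out via the HBA."""
    inbound = num_nics * GIGE_MB_PER_S
    outbound = inbound            # everything received is written to tape
    return inbound + outbound


if __name__ == "__main__":
    peak = pcix_peak_mb_per_s()
    for nics in (1, 2):
        load = bus_load(nics)
        print(f"{nics} NIC(s): {load:.0f} MB/s of {peak:.0f} MB/s peak "
              f"({100 * load / peak:.0f}% of the bus)")
```

With two NICs the estimate comes to 500 MB/s, roughly 47% of the theoretical peak, which leaves margin even after real-world losses.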

However, we've been experiencing intermittent SCSI subsystem crashes. 
We're running the standard 2.6-based SuSE Enterprise Linux (SLES 9) kernel, 
now with the default drivers/modules shipped by SuSE, so that we can at 
least get support from Novell if this happens again (we reinstalled the box 
earlier today).

Apart from the oopses, we also sometimes see the following in the log. Keep 
in mind that this is very infrequent and may not be the cause, but it's 
still interesting. Any feedback on the cause would be highly appreciated:

st3: Failed to read 65536 byte block with 32768 byte transfer.

All drives are set to 64KB block size.
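As far as I can tell from the Linux st driver source, this notice is printed when a read request is shorter than the block actually found on tape: the first number is the size of the block on the media, the second the size of the transfer that was requested, and the read fails (the driver returns ENOMEM) instead of silently truncating the block. With the drives set to 64 KB blocks, that would point at something issuing 32 KB reads. The helper below (`find_short_reads` and `explain` are hypothetical names, not part of any tool) only extracts both numbers from such log lines so they can be compared with the configured block size; it does not query the drive.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Matches the Linux st driver notice quoted above, e.g.
# "st3: Failed to read 65536 byte block with 32768 byte transfer."
ST_SHORT_READ = re.compile(
    r"(?P<dev>st\d+): Failed to read (?P<block>\d+) byte block "
    r"with (?P<transfer>\d+) byte transfer"
)


class ShortRead(NamedTuple):
    device: str          # e.g. "st3"
    block_bytes: int     # size of the block found on tape
    transfer_bytes: int  # size of the read that was requested


def find_short_reads(lines: Iterable[str]) -> Iterator[ShortRead]:
    """Yield one ShortRead per log line containing the st notice."""
    for line in lines:
        m = ST_SHORT_READ.search(line)
        if m:
            yield ShortRead(m["dev"], int(m["block"]), int(m["transfer"]))


def explain(event: ShortRead, configured_block: int = 64 * 1024) -> str:
    """Describe how the failed read compares with the configured block size."""
    notes = [f"{event.device}: {event.transfer_bytes}-byte read hit a "
             f"{event.block_bytes}-byte block"]
    if event.block_bytes == configured_block:
        notes.append("block matches the configured size; "
                     "the read request was smaller than the block")
    else:
        notes.append(f"block differs from the configured {configured_block} bytes")
    return "; ".join(notes)


log = ["st3: Failed to read 65536 byte block with 32768 byte transfer."]
for ev in find_short_reads(log):
    print(explain(ev))
```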

RM> I just went through this with Sun. Solaris 8 is bottlenecked at 1 Gbps,
RM> period, due to the STREAMS queue. Upgrading to Solaris 9 and using a V880,
RM> I can get about 700-800 Mbps out of each NIC while they run together.
RM> Finally, with Solaris 9, we get past the 1 Gbps barrier.
RM> 
RM> Of course the CPU utilization when driving the 1 or 2 NICs at full speed 
RM> is quite high.  I can see how the V440 reaches a limit here.  
RM> 
RM> Please post more Linux numbers for us.
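For scale, the per-NIC figures quoted above for Solaris 9 (700-800 Mbps per NIC, two NICs together) can be summed and converted to bytes per second. This is plain unit arithmetic in decimal units (1 Gbps = 1000 Mbps, 8 bits per byte) and ignores protocol overhead; `aggregate` is just an illustrative helper.

```python
from typing import Tuple

# Aggregate throughput of several NICs, from per-NIC rates in Mbit/s.
# Decimal units assumed: 1 Gbit/s = 1000 Mbit/s, 8 bits per byte.


def aggregate(per_nic_mbps: float, nics: int) -> Tuple[float, float]:
    """Return (total Mbit/s, total MB/s) for `nics` links at `per_nic_mbps` each."""
    total_mbps = per_nic_mbps * nics
    return total_mbps, total_mbps / 8


# Per-NIC range reported for two NICs running together on Solaris 9 / V880.
for rate in (700, 800):
    mbps, mb_s = aggregate(rate, 2)
    print(f"2 x {rate} Mbit/s = {mbps:.0f} Mbit/s ~ {mb_s:.0f} MB/s "
          f"({mbps / 1000:.1f}x the 1 Gbit/s ceiling)")
```

That works out to 1400-1600 Mbit/s, or about 175-200 MB/s in total, i.e. 1.4 to 1.6 times the 1 Gbps limit described for Solaris 8.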

I'll keep you and the list posted as we learn more. The one problem so far 
is this little bug, which will probably be rather difficult to solve, since 
Networker only supports ancient distributions, platforms and kernel 
versions. But I suspect it is kernel related anyway, and not directly 
related to Networker.

I'll paste the last crash below as well, in case anyone has specifics on 
what's going on and what causes it:

Unable to handle kernel NULL pointer dereference at 00000000000001d0 RIP: 
<ffffffffa0000467>{:scsi_mod:scsi_finish_command+167} 
PML4 bcbf3067 PGD bb6df067 PMD 0 
Oops: 0000 [1] SMP 
CPU 1 
Pid: 2047, comm: scsi_eh_0 Tainted: G   U   (2.6.5-7.201-smp 
SLES9_SP2_BRANCH-200508250620450000) 
RIP: 0010:[<ffffffffa0000467>] 
<ffffffffa0000467>{:scsi_mod:scsi_finish_command+167} 
RSP: 0018:00000100f49b5e38  EFLAGS: 00010206 
RAX: 0000000000000000 RBX: 00000100bf768000 RCX: 0000000000000000 
RDX: 00000100f4d64e00 RSI: 00000100bb579470 RDI: 00000100f4fcc000 
RBP: 00000100bb579380 R08: 00000000000003e7 R09: 00000100bb5794a0 
R10: 00000000000493e0 R11: 0000000000002710 R12: 00000100f4fcc000 
R13: 00000100bf768000 R14: 00000100f56c3800 R15: 00000100f49b5ec8 
FS:  0000002a9588e6e0(0000) GS:ffffffff80562f00(0000) 
knlGS:00000000556952a0 
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
CR2: 00000000000001d0 CR3: 00000000bff82000 CR4: 00000000000006e0 
Process scsi_eh_0 (pid: 2047, threadinfo 00000100f49b4000, task 
00000100bfd12a00) 
Stack: 00000100bb579380 00000100f49b5eb8 00000100f49b5eb8 ffffffffa00038db 
       00000100f49b5ec8 00000100bb579380 0000000000004628 ffffffffa000505c 
       00000001bfd12a00 ffffffff803ddf50 
Call Trace:<ffffffffa00038db>{:scsi_mod:scsi_eh_flush_done_q+219} 
       <ffffffffa000505c>{:scsi_mod:scsi_error_handler+1884} 
       <ffffffff80140f10>{do_exit+3440} <ffffffff801112f7>{child_rip+8} 
       <ffffffffa0004900>{:scsi_mod:scsi_error_handler+0} 
       <ffffffff801112ef>{child_rip+0} 

Code: 8b 81 d0 01 00 00 85 c0 89 42 04 74 1c 48 8d 7a 08 48 8d b1 
RIP <ffffffffa0000467>{:scsi_mod:scsi_finish_command+167} RSP 
<00000100f49b5e38> 
CR2: 00000000000001d0 
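One detail can be read straight off the oops. On x86_64 kernels of this vintage the "Code:" line dumps 20 bytes starting at the faulting RIP, and its first instruction, 8b 81 d0 01 00 00, decodes as mov eax, [rcx+0x1d0]. RCX is 0 in the register dump, and 0 + 0x1d0 is exactly the faulting address in CR2, so scsi_finish_command loaded a field at offset 0x1d0 through a NULL pointer held in RCX. Which structure that field belongs to depends on the SLES 9 kernel source and isn't established here. The sketch below only checks this arithmetic; `decode_mov_load` is a hypothetical helper that handles the single MOV r32, [reg+disp32] form (opcode 0x8B, ModRM mod=10, no prefixes) and is not a general x86 disassembler.

```python
# Check that the oops' faulting address (CR2) equals base register + displacement
# of the first instruction in the "Code:" dump. Only MOV r32, [reg+disp32]
# (opcode 0x8B, ModRM mod=10, no prefixes) is handled.

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load(code: bytes):
    """Decode 8B /r with mod=10; return (dest_reg, base_reg, disp32)."""
    if code[0] != 0x8B:
        raise ValueError("not MOV r32, r/m32 (opcode 0x8B)")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0b10 or rm == 0b100:   # rm=100 would need a SIB byte
        raise ValueError("only [reg + disp32] addressing is handled")
    disp = int.from_bytes(code[2:6], "little", signed=True)
    return REG32[reg], REG64[rm], disp


# Bytes and register values copied from the oops above.
code = bytes.fromhex("8b 81 d0 01 00 00 85 c0 89 42 04 74 1c 48 8d 7a 08 48 8d b1")
regs = {"rax": 0x0, "rcx": 0x0}
cr2 = 0x1D0

dest, base, disp = decode_mov_load(code)
addr = regs[base] + disp
print(f"mov {dest}, [{base}+{disp:#x}] -> address {addr:#x}; CR2 = {cr2:#x}")
print("matches CR2: NULL base pointer" if addr == cr2 and regs[base] == 0 else "no match")
```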

//Oscar

To sign off this list, send email to listserv AT listserv.temple DOT edu and 
type "signoff networker" in the body of the email. Please write to 
networker-request AT listserv.temple DOT edu if you have any problems 
with this list. You can access the archives at 
http://listserv.temple.edu/archives/networker.html or 
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER