Re: [Networker] ANALYSIS: Networker server price/performance

I get that error regularly on an HP DL380 G3 storage node running
Fedora Core 1 with kernels 2.4.22 and 2.4.32 and the LSI Ultra320 MPT
SCSI controller. My backups and restores work fine, and the error occurs
on all 4 drives. Per your last round of emails, I tried stinit, but
without tapes in the drives it didn't actually do anything. My library is
a Qualstar SAIT and I do not get SCSI bus crashes. It runs quite nicely
except for one drive that goes into service mode a lot.

Chuck

On Thu, 2005-12-08 at 20:42 +0100, Oscar Olsson wrote:
> On Thu, 8 Dec 2005, Robert Maiello wrote:
> 
> RM> Does your Opteron box have 2 gigabit NICs? Can Linux and the box
> RM> drive 2 of them close to the max? Your utilization implies that
> RM> it can. Very interesting. Can the PCI bus and backplane on this
> RM> box handle more traffic?
> 
> Linux does support EtherChannel, and when running one interface at full 
> speed the box is about half busy, which suggests that it should be able to 
> feed the drives with data coming from two saturated gigabit NICs without 
> any problem. Both NICs and the HBA are on a PCI-X 133 MHz bus, so I believe 
> it should be able to cope with the data volume.
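A rough bus budget behind that estimate, as a sketch only: it assumes a 64-bit PCI-X slot at a nominal 133 MHz, assumes every byte received from the network crosses the same bus twice (NIC in, HBA out), and ignores protocol and arbitration overhead, so real headroom is smaller. The function names are illustrative.

```python
# Rough PCI-X bandwidth budget for a backup server that receives data on
# gigabit NICs and writes it to tape through an HBA on the same bus.
# Assumptions (not stated in the thread): 64-bit bus, each byte crosses the
# bus once inbound and once outbound, no protocol/arbitration overhead.

PCIX_CLOCK_HZ = 133_333_333            # nominal "133 MHz" PCI-X clock
PCIX_WIDTH_BYTES = 8                   # assumed 64-bit slot
GIGE_BYTES_PER_S = 1_000_000_000 // 8  # 1 Gbps line rate = 125 MB/s


def pcix_peak_bytes_per_s(clock_hz: int = PCIX_CLOCK_HZ,
                          width_bytes: int = PCIX_WIDTH_BYTES) -> int:
    """Theoretical peak transfer rate of one PCI-X bus segment."""
    return clock_hz * width_bytes


def backup_bus_load(nics: int) -> int:
    """Bytes/s the bus carries when `nics` gigabit links run flat out:
    once in from the NICs, once out to the tape HBA."""
    return 2 * nics * GIGE_BYTES_PER_S


if __name__ == "__main__":
    peak = pcix_peak_bytes_per_s()
    for n in (1, 2):
        load = backup_bus_load(n)
        print(f"{n} NIC(s): {load / 1e6:.0f} MB/s of {peak / 1e6:.0f} MB/s "
              f"({100 * load / peak:.0f}% of theoretical peak)")
```

With two saturated links this comes to about 500 MB/s against a theoretical peak of roughly 1,067 MB/s, i.e. under half the bus; if the NICs and HBA actually sit on separate PCI-X segments, the per-segment load is lower still.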
> 
> However, we've been experiencing intermittent SCSI subsystem crashes. 
> We're running the standard 2.6-based SUSE Linux Enterprise Server kernel, 
> now with the default drivers/modules that ship with SUSE, so that we can 
> at least get support from Novell if this happens again (we reinstalled the 
> box earlier today).
> 
> Apart from oopses, we also sometimes see the following in the log. Keep in 
> mind that this is very infrequent and may not be the cause, but it's still 
> interesting. Any feedback on the cause would be highly appreciated:
> 
> st3: Failed to read 65536 byte block with 32768 byte transfer.
> 
> All drives are set to a 64 KB block size.
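The two numbers in that message fit a variable-block tape READ that asked for 32768 bytes while the next block on tape was 65536 bytes long, so one likely place to look is whichever process read the tape with a 32 KB buffer. The sketch below shows how the real block length can be recovered from standard SCSI fixed-format sense data (ILI bit plus the signed INFORMATION residue). It illustrates the mechanism under those assumptions; it is not the Linux st driver's code, and make_sense is a hypothetical test helper.

```python
import struct
from typing import Optional

# Fixed-format SCSI sense data (response code 0x70): byte 2 bit 5 is ILI
# (Incorrect Length Indicator); bytes 3..6 are the INFORMATION field. For a
# variable-block READ that hit a block of the wrong length, INFORMATION holds
# the signed, big-endian residue: requested length - actual block length.
ILI_BIT = 0x20


def actual_block_size(requested: int, sense: bytes) -> Optional[int]:
    """On-tape block length implied by a variable-block READ, or None if ILI is clear."""
    if not sense[2] & ILI_BIT:
        return None
    (residue,) = struct.unpack(">i", sense[3:7])
    return requested - residue


def make_sense(residue: int) -> bytes:
    """Hypothetical helper: an 18-byte fixed-format sense buffer with ILI set."""
    return bytes([0x70, 0x00, ILI_BIT]) + struct.pack(">i", residue) + bytes(11)


if __name__ == "__main__":
    requested = 32768                      # read buffer size from the log line
    sense = make_sense(requested - 65536)  # drive reports a residue of -32768
    actual = actual_block_size(requested, sense)
    # A negative residue means the block on tape was larger than the buffer,
    # so only the first `requested` bytes of that block were delivered.
    print(f"Failed to read {actual} byte block with {requested} byte transfer.")
```

The residue is signed: a positive value means the block was shorter than the buffer (the normal case for variable-block reads), whereas a negative one means part of the block was not returned to the reader.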
> 
> RM> I just went through this with Sun. Solaris 8 is bottlenecked at 1 Gbps,
> RM> period, due to the STREAMS queue. Upgrading to Solaris 9 and using a V880,
> RM> I can get about 700-800 Mbps out of each NIC while they run together.
> RM> Finally, with Solaris 9, we get past the 1 Gbps barrier.
> RM> 
> RM> Of course the CPU utilization when driving the 1 or 2 NICs at full speed 
> RM> is quite high.  I can see how the V440 reaches a limit here.  
> RM> 
> RM> Please post more Linux numbers for us.
> 
> I'll keep you and the list posted as we learn more. The one problem left is 
> this little bug, which will probably be rather difficult to solve, since 
> Networker only supports ancient distributions, platforms and kernel 
> versions. But I think this might be kernel related anyway, and not directly 
> related to Networker.
> 
> I'll paste the last crash below as well in order to check if anyone has 
> any specifics on what's going on and the cause:
> 
> Unable to handle kernel NULL pointer dereference at 00000000000001d0 RIP: 
> <ffffffffa0000467>{:scsi_mod:scsi_finish_command+167} 
> PML4 bcbf3067 PGD bb6df067 PMD 0 
> Oops: 0000 [1] SMP 
> CPU 1 
> Pid: 2047, comm: scsi_eh_0 Tainted: G   U   (2.6.5-7.201-smp 
> SLES9_SP2_BRANCH-200508250620450000) 
> RIP: 0010:[<ffffffffa0000467>] 
> <ffffffffa0000467>{:scsi_mod:scsi_finish_command+167} 
> RSP: 0018:00000100f49b5e38  EFLAGS: 00010206 
> RAX: 0000000000000000 RBX: 00000100bf768000 RCX: 0000000000000000 
> RDX: 00000100f4d64e00 RSI: 00000100bb579470 RDI: 00000100f4fcc000 
> RBP: 00000100bb579380 R08: 00000000000003e7 R09: 00000100bb5794a0 
> R10: 00000000000493e0 R11: 0000000000002710 R12: 00000100f4fcc000 
> R13: 00000100bf768000 R14: 00000100f56c3800 R15: 00000100f49b5ec8 
> FS:  0000002a9588e6e0(0000) GS:ffffffff80562f00(0000) 
> knlGS:00000000556952a0 
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
> CR2: 00000000000001d0 CR3: 00000000bff82000 CR4: 00000000000006e0 
> Process scsi_eh_0 (pid: 2047, threadinfo 00000100f49b4000, task 
> 00000100bfd12a00) 
> Stack: 00000100bb579380 00000100f49b5eb8 00000100f49b5eb8 ffffffffa00038db 
>        00000100f49b5ec8 00000100bb579380 0000000000004628 ffffffffa000505c 
>        00000001bfd12a00 ffffffff803ddf50 
> Call Trace:<ffffffffa00038db>{:scsi_mod:scsi_eh_flush_done_q+219} 
>        <ffffffffa000505c>{:scsi_mod:scsi_error_handler+1884} 
>        <ffffffff80140f10>{do_exit+3440} <ffffffff801112f7>{child_rip+8} 
>        <ffffffffa0004900>{:scsi_mod:scsi_error_handler+0} 
>        <ffffffff801112ef>{child_rip+0} 
> 
> Code: 8b 81 d0 01 00 00 85 c0 89 42 04 74 1c 48 8d 7a 08 48 8d b1 
> RIP <ffffffffa0000467>{:scsi_mod:scsi_finish_command+167} RSP 
> <00000100f49b5e38> 
> CR2: 00000000000001d0 
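The oops can be decoded a little further by hand. Assuming the first bytes on the Code: line are the instruction at RIP (the match with CR2 suggests they are), 8b 81 d0 01 00 00 is a 32-bit MOV load from [rcx + 0x1d0]; with RCX = 0 in the register dump, that gives exactly the faulting address in CR2. In other words, scsi_finish_command read a field at offset 0x1d0 through a NULL pointer while the scsi_eh_0 error handler was flushing completed commands; which structure that field belongs to would need the matching SLES9 kernel source or debug info. A minimal decoder for just this one instruction form (function names are illustrative, not a real disassembler API):

```python
import struct
from typing import Dict, Tuple

# Register names indexed by the 3-bit ModRM fields (no REX prefix present).
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

# Bytes copied from the "Code:" line of the oops.
CODE = bytes.fromhex("8b 81 d0 01 00 00 85 c0 89 42 04 74 1c 48 8d 7a 08 48 8d b1")


def decode_mov_load(code: bytes) -> Tuple[str, str, int]:
    """Decode MOV r32, [reg64 + disp32] (opcode 8B, ModRM mod=10, no SIB).

    Returns (destination register, base register, displacement).
    Any other encoding is rejected rather than guessed at.
    """
    if code[0] != 0x8B:
        raise ValueError("not opcode 8B (MOV r32, r/m32)")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only [base + disp32] without a SIB byte is handled")
    (disp,) = struct.unpack("<i", code[2:6])  # disp32 is signed, little-endian
    return REGS32[reg], REGS64[rm], disp


def fault_address(code: bytes, regs: Dict[str, int]) -> int:
    """Effective address the load touches, given the base register's value."""
    _, base, disp = decode_mov_load(code)
    return regs[base] + disp


if __name__ == "__main__":
    dest, base, disp = decode_mov_load(CODE)
    addr = fault_address(CODE, {"rcx": 0x0})  # RCX: 0000000000000000 in the dump
    print(f"mov {dest}, [{base} + {disp:#x}] -> {addr:#018x}")
```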
> 
> //Oscar
> 
> To sign off this list, send email to listserv AT listserv.temple DOT edu and 
> type "signoff networker" in the
> body of the email. Please write to networker-request AT listserv.temple DOT 
> edu if you have any problems
> wit this list. You can access the archives at 
> http://listserv.temple.edu/archives/networker.html or
> via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
