I get that error regularly on an HP DL380 Gen 3 storage node running
Fedora Core 1 with kernels 2.4.22 and 2.4.32 and the LSI Ultra320 MPT
SCSI controller. My backups and restores work fine, and the error occurs
on all 4 drives. I tried stinit, but without tapes in the drives, so it
didn't actually do anything per your last round of emails. My library is
a Qualstar SAIT and I do not get SCSI bus crashes. It runs quite nicely
except for one drive that goes into service mode a lot.
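
If anyone wants to count how often this shows up per drive, here is a
minimal Python sketch that pulls the device name and the two sizes out of
the st message quoted below. The regular expression and the function names
are my own, written against the single example line in this thread, so
other kernel versions may word the message differently.

```python
import re

# Pattern built from the one example line in this thread (an assumption,
# not taken from the kernel source). search() is used so that a syslog
# prefix such as "Dec  8 20:00:00 host kernel: " is tolerated.
ST_READ_RE = re.compile(
    r"(?P<dev>st\d+): Failed to read (?P<block>\d+) byte block "
    r"with (?P<xfer>\d+) byte transfer\."
)


def parse_st_read_failure(line):
    """Return (device, block_bytes, transfer_bytes), or None if no match."""
    m = ST_READ_RE.search(line)
    if m is None:
        return None
    return m.group("dev"), int(m.group("block")), int(m.group("xfer"))


def describe(line):
    """Return a one-line summary of the mismatch, or None if no match."""
    parsed = parse_st_read_failure(line)
    if parsed is None:
        return None
    dev, block, xfer = parsed
    return "%s: block size %d KB, transfer size %d KB" % (
        dev, block // 1024, xfer // 1024)


if __name__ == "__main__":
    print(describe(
        "st3: Failed to read 65536 byte block with 32768 byte transfer."))
```

Feeding it the kernel log one line at a time and tallying the returned
device names would show whether all drives are affected equally.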
Chuck
On Thu, 2005-12-08 at 20:42 +0100, Oscar Olsson wrote:
> On Thu, 8 Dec 2005, Robert Maiello wrote:
>
> RM> Does your Opteron box have 2 gigabit NICs? Can Linux and the box
> RM> drive 2 of them close to the max? Your utilization implies that
> RM> it can. Very interesting. Can the PCI bus and backplane on this
> RM> box handle more traffic?
>
> Linux does support EtherChannel, and when running with one full interface,
> the box is about half-busy, which suggests that it should be able to
> feed the drives with data coming from two full gig NICs without any
> problem. Both NICs and the HBA are on a PCI-X 133MHz bus, so I believe it
> should be able to cope with the data volume without any problem.
>
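
As a rough sanity check on the bus headroom described above, here is a
back-of-the-envelope sketch in Python. It assumes a 64-bit PCI-X slot (the
width is not stated in the thread), theoretical peak rates only, and that
every byte crosses the bus twice: once in from a NIC and once out to the
HBA. The function names are hypothetical.

```python
# Theoretical peaks only; real-world throughput will be lower.
GIGABIT_NIC_MB_PER_S = 1000 / 8  # 1 Gbit/s = 125 MB/s (decimal megabytes)


def pcix_peak_mb_per_s(width_bits=64, clock_mhz=133):
    """Peak PCI-X bandwidth in MB/s; the 64-bit width is an assumption."""
    return width_bits / 8 * clock_mhz


def bus_load_fraction(nics=2, width_bits=64, clock_mhz=133):
    """Fraction of peak bus bandwidth used when all NICs run at line rate."""
    inbound = nics * GIGABIT_NIC_MB_PER_S
    # Assumption: data crosses the shared bus twice (NIC -> RAM -> HBA).
    total = 2 * inbound
    return total / pcix_peak_mb_per_s(width_bits, clock_mhz)


if __name__ == "__main__":
    print("PCI-X peak: %.0f MB/s" % pcix_peak_mb_per_s())
    print("Bus load with 2 NICs: %.0f%%" % (100 * bus_load_fraction()))
```

Under those assumptions two saturated gigabit NICs plus the outbound HBA
traffic use a little under half of the bus's theoretical peak, which is
consistent with the expectation that the bus is not the bottleneck.
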
> However, we've been experiencing intermittent SCSI subsystem crashes.
> We're running the standard 2.6 based Suse EL kernel, now with the default
> drivers/modules that are shipped with SuSE, so we can at least get support
> from Novell if this happens again (we reinstalled the box earlier today).
>
> Apart from oopses, we also sometimes see the following in the log. Keep in
> mind that this is very infrequent, and may not be the cause, but it's still
> interesting. Any feedback on the cause would be highly appreciated:
>
> st3: Failed to read 65536 byte block with 32768 byte transfer.
>
> All drives are set to 64KB block size.
>
> RM> I just went through this with Sun. Solaris 8 is bottlenecked at 1Gbps,
> RM> period, due to the streams queue. Upgrading to Solaris 9 and using a V880,
> RM> I can get about 700-800Mbps out of each NIC while they run together.
> RM> Finally with Solaris 9, we get past the 1Gbps barrier.
> RM>
> RM> Of course the CPU utilization when driving the 1 or 2 NICs at full speed
> RM> is quite high. I can see how the V440 reaches a limit here.
> RM>
> RM> Please post more Linux numbers for us.
>
> I'll keep you and the list posted as we learn more. This little bug will
> probably be rather difficult to solve, though, since Networker only
> supports ancient distributions, platforms and kernel versions. But I
> think this might be kernel related anyway, and not directly related to
> Networker.
>
> I'll paste the last crash below as well in order to check if anyone has
> any specifics on what's going on and the cause:
>
> Unable to handle kernel NULL pointer dereference at 00000000000001d0 RIP:
> <ffffffffa0000467>{:scsi_mod:scsi_finish_command+167}
> PML4 bcbf3067 PGD bb6df067 PMD 0
> Oops: 0000 [1] SMP
> CPU 1
> Pid: 2047, comm: scsi_eh_0 Tainted: G U (2.6.5-7.201-smp
> SLES9_SP2_BRANCH-200508250620450000)
> RIP: 0010:[<ffffffffa0000467>]
> <ffffffffa0000467>{:scsi_mod:scsi_finish_command+167}
> RSP: 0018:00000100f49b5e38 EFLAGS: 00010206
> RAX: 0000000000000000 RBX: 00000100bf768000 RCX: 0000000000000000
> RDX: 00000100f4d64e00 RSI: 00000100bb579470 RDI: 00000100f4fcc000
> RBP: 00000100bb579380 R08: 00000000000003e7 R09: 00000100bb5794a0
> R10: 00000000000493e0 R11: 0000000000002710 R12: 00000100f4fcc000
> R13: 00000100bf768000 R14: 00000100f56c3800 R15: 00000100f49b5ec8
> FS: 0000002a9588e6e0(0000) GS:ffffffff80562f00(0000)
> knlGS:00000000556952a0
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00000000000001d0 CR3: 00000000bff82000 CR4: 00000000000006e0
> Process scsi_eh_0 (pid: 2047, threadinfo 00000100f49b4000, task
> 00000100bfd12a00)
> Stack: 00000100bb579380 00000100f49b5eb8 00000100f49b5eb8 ffffffffa00038db
> 00000100f49b5ec8 00000100bb579380 0000000000004628 ffffffffa000505c
> 00000001bfd12a00 ffffffff803ddf50
> Call Trace:<ffffffffa00038db>{:scsi_mod:scsi_eh_flush_done_q+219}
> <ffffffffa000505c>{:scsi_mod:scsi_error_handler+1884}
> <ffffffff80140f10>{do_exit+3440} <ffffffff801112f7>{child_rip+8}
> <ffffffffa0004900>{:scsi_mod:scsi_error_handler+0}
> <ffffffff801112ef>{child_rip+0}
>
> Code: 8b 81 d0 01 00 00 85 c0 89 42 04 74 1c 48 8d 7a 08 48 8d b1
> RIP <ffffffffa0000467>{:scsi_mod:scsi_finish_command+167} RSP
> <00000100f49b5e38>
> CR2: 00000000000001d0
>
> //Oscar
>
> To sign off this list, send email to listserv AT listserv.temple DOT edu and
> type "signoff networker" in the body of the email. Please write to
> networker-request AT listserv.temple DOT edu if you have any problems
> with this list. You can access the archives at
> http://listserv.temple.edu/archives/networker.html or
> via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER