You don't see any errors on the switch ports involved? Was anything new added
to the switch, or any rezoning done, that could have caused things to go bad?
No major changes you can think of that could have impacted your SSO environment?
Steve Mickeler <steve AT warning DOT ca> wrote:
Firmware is up to date on the switches. Drivers are up to date for Solaris
and the JNI HBAs.
Media server uptime is 4 days.
The only thing out of date is NBU which is at 4.5 FP5. I suppose I could
try going to FP6 but this setup used to be very trouble-free from the time
we installed it in October 2003.
/var/adm/messages shows
May 19 01:21:15 harbor scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/JNI,FCR@2,1/st@b,0 (st31):
May 19 01:21:15 harbor SCSI transport failed: reason 'timeout': giving up
May 19 06:49:57 harbor scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/JNI,FCR@2,1/st@a,0 (st32):
May 19 06:49:57 harbor SCSI transport failed: reason 'timeout': giving up
May 19 10:54:39 harbor scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/JNI,FCR@4,1/st@c,0 (st30):
May 19 10:54:39 harbor SCSI transport failed: reason 'timeout': giving up
May 19 11:31:47 harbor scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/JNI,FCR@2,1/st@b,0 (st31):
May 19 11:31:47 harbor SCSI transport failed: reason 'timeout': giving up
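
To see at a glance which st devices are timing out, the excerpt above can be
tallied with a short script. This is only a sketch in Python: the function
name count_transport_failures and the SAMPLE text are illustrative, and it
assumes the messages look like the lines pasted here (a device path ending in
"(stNN):" followed by a "SCSI transport failed" line).

```python
import re
from collections import Counter

# Matches the instance name at the end of the device path, e.g. "(st31):".
DEVICE_RE = re.compile(r"\((st\d+)\):")
# Matches the failure line and captures the reason, e.g. 'timeout'.
FAILURE_RE = re.compile(r"SCSI transport failed: reason '([^']+)'")


def count_transport_failures(lines):
    """Count (device, reason) pairs, pairing each failure with the last device seen."""
    counts = Counter()
    current_device = None
    for line in lines:
        device = DEVICE_RE.search(line)
        if device:
            current_device = device.group(1)
        failure = FAILURE_RE.search(line)
        if failure and current_device is not None:
            counts[(current_device, failure.group(1))] += 1
            current_device = None  # wait for the next device line
    return counts


# Lines copied from the /var/adm/messages excerpt in this thread.
SAMPLE = """\
May 19 01:21:15 harbor scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/JNI,FCR@2,1/st@b,0 (st31):
May 19 01:21:15 harbor SCSI transport failed: reason 'timeout': giving up
May 19 06:49:57 harbor scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/JNI,FCR@2,1/st@a,0 (st32):
May 19 06:49:57 harbor SCSI transport failed: reason 'timeout': giving up
May 19 10:54:39 harbor scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/JNI,FCR@4,1/st@c,0 (st30):
May 19 10:54:39 harbor SCSI transport failed: reason 'timeout': giving up
May 19 11:31:47 harbor scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/JNI,FCR@2,1/st@b,0 (st31):
May 19 11:31:47 harbor SCSI transport failed: reason 'timeout': giving up
"""

failure_counts = count_transport_failures(SAMPLE.splitlines())
for (device, reason), n in sorted(failure_counts.items()):
    print(f"{device}: {n} x {reason}")
```

To run it against the live log, pass the open file instead of SAMPLE, for
example count_transport_failures(open("/var/adm/messages")).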
On Wed, 19 May 2004, K Chapman wrote:
> Is your fibre switch showing many errors on your HBA and tape drive
> ports? Are drivers/firmware up to date for the HBAs, switches, etc.?
> What's the uptime on this media server? Seems like the drives are fine,
> as you have a grand total of 4 hard/soft errors across your drives.
> All four errors are soft errors, which are 'recoverable'.
>
>
> Our 84s were due to hard tape errors... syslog should show some type
> of error related to the transport error... You can probably check with
> the HBA and tape vendors about the error codes returned.
>
> Steve Mickeler wrote:
>
> netstat -k shows:
>
> st32,err:
> Soft Errors 0 Hard Errors 0 Transport Errors 5 Vendor IBM
> Product ULTRIUM-TD2 Revision Revision 38D0 Serial No
>
> st31,err:
> Soft Errors 2 Hard Errors 0 Transport Errors 25 Vendor IBM
> Product ULTRIUM-TD2 Revision Revision 38D0 Serial No
>
> st30,err:
> Soft Errors 2 Hard Errors 0 Transport Errors 21 Vendor IBM
> Product ULTRIUM-TD2 Revision Revision 38D0 Serial No
>
> st33,err:
> Soft Errors 0 Hard Errors 0 Transport Errors 10 Vendor IBM
> Product ULTRIUM-TD2 Revision Revision 38D0 Serial No
>
> # modinfo | grep tape
> 51 78234000 12d04 33 1 st (SCSI tape Driver 1.216)
>
> I'm also seeing Transport Errors on all the drives on all the SSO boxes.
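
A rough Python sketch for totalling the netstat -k counters quoted above, so
the soft/hard/transport split per drive is easy to compare. The helper name
parse_st_errors and the SAMPLE text are illustrative, and the parsing assumes
the output is laid out as pasted in this message (an "stNN,err:" header
followed by a line of counters); real output may be spaced differently.

```python
import re

# Header line for each tape instance, e.g. "st31,err:".
HEADER_RE = re.compile(r"^(st\d+),err:")
# Counter line as pasted in the thread; \s+ tolerates extra padding.
COUNTS_RE = re.compile(
    r"Soft Errors\s+(\d+)\s+Hard Errors\s+(\d+)\s+Transport Errors\s+(\d+)"
)


def parse_st_errors(lines):
    """Return {device: {"soft": n, "hard": n, "transport": n}}."""
    stats = {}
    device = None
    for line in lines:
        line = line.strip()
        header = HEADER_RE.match(line)
        if header:
            device = header.group(1)
            continue
        counts = COUNTS_RE.search(line)
        if counts and device is not None:
            soft, hard, transport = (int(n) for n in counts.groups())
            stats[device] = {"soft": soft, "hard": hard, "transport": transport}
            device = None
    return stats


# Counter lines copied from the netstat -k output in this thread.
SAMPLE = """\
st32,err:
Soft Errors 0 Hard Errors 0 Transport Errors 5 Vendor IBM
st31,err:
Soft Errors 2 Hard Errors 0 Transport Errors 25 Vendor IBM
st30,err:
Soft Errors 2 Hard Errors 0 Transport Errors 21 Vendor IBM
st33,err:
Soft Errors 0 Hard Errors 0 Transport Errors 10 Vendor IBM
"""

st_stats = parse_st_errors(SAMPLE.splitlines())
totals = {
    kind: sum(entry[kind] for entry in st_stats.values())
    for kind in ("soft", "hard", "transport")
}
for device in sorted(st_stats):
    print(device, st_stats[device])
print("totals", totals)
```

On these numbers the transport errors far outnumber the soft and hard errors
combined, which points at the path to the drives rather than at the media.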
>
>
> On Wed, 19 May 2004, K Chapman wrote:
>
> > It's trying to write the end-of-file mark and fails... If this is
> > Solaris, you can run netstat -k and look for st;
> > it will show you error counts on the drive... You can also look in the
> > syslog; it should show the driver returning the write error along with
> > the other info. You're getting errors across all your drives and with
> > different tapes... We had something similar and it turned out to be really
> > bad drives (and drive tech, damn Exabyte).
> >
> > Steve Mickeler wrote:
> > I've been experiencing quite a few error 84s (media write error) lately.
> >
> > A job will start running and then I'll notice in the activity monitor that
> > the "KB per Second" number for some of the streams shows the same number
> > (e.g. 17804), at which point I know that those jobs are going to fail and I'll
> > end up with an error 84 for those streams. The job will then start again,
> > sometimes using the same media, sometimes a new media, but it will
> > generally succeed the second time.
> >
> > Any ideas as to what is causing the first job to fail ?
> >
> >
> >
> > from the bptm log:
> >
> >
> > 00:03:49.218 [7267] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
> > on drive index 2
> >
> > 00:21:49.411 [7267] <16> io_ioctl: ioctl (MTWEOF) failed on media id
> > 000041, drive index 2, I/O error (bptm.c.15845)
> >
> > 00:21:49.412 [7267] <2> log_media_error: successfully wrote to error file
> > - 05/19/04 00:21:49 000041 2 WRITE_ERROR
> >
> > 00:21:49.412 [7267] <2> check_error_history: called from bptm line 15869,
> > EXIT_Status = 84
> >
> > 00:21:49.979 [7267] <2> check_error_history: drive index = 2, media id =
> > 000041, time = 05/19/04 00:21:49, both_match = 0, media_match = 1,
> > drive_match = 0
> >
> > ---------------------------------------------------------------------------------
> >
> > 00:31:45.025 [11412] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
> > on drive index 0
> >
> > 00:49:45.221 [11412] <16> io_ioctl: ioctl (MTWEOF) failed on media id
> > 000041, drive index 0, I/O error (bptm.c.15845)
> >
> > 00:49:45.222 [11412] <2> log_media_error: successfully wrote to error file
> > - 05/19/04 00:49:45 000041 0 WRITE_ERROR
> >
> > 00:49:45.222 [11412] <2> check_error_history: called from bptm line 15869,
> > EXIT_Status = 84
> >
> > 00:49:45.786 [11412] <2> check_error_history: drive index = 0, media id =
> > 000041, time = 05/19/04 00:49:45, both_match = 0, media_match = 2,
> > drive_match = 0
> >
> > 00:49:45.992 [11412] <8> check_error_history: FREEZING media id 000041, it
> > has had at least 3 errors in the last 12 hour(s)
> >
> > ---------------------------------------------------------------------------------
> >
> > 00:36:44.063 [15640] <2> io_ioctl: command (2)MTBSF 1 from (bptm.c.17369)
> > on drive index 2
> >
> > 00:36:44.071 [15640] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.17395)
> > on drive index 2
> >
> > 00:54:44.272 [15640] <16> io_ioctl: ioctl (MTWEOF) failed on media id
> > 000079, drive index 2, I/O error (bptm.c.17395)
> >
> > 00:54:44.274 [15640] <2> log_media_error: successfully wrote to error file
> > - 05/19/04 00:54:44 000079 2 WRITE_ERROR
> >
> > ---------------------------------------------------------------------------------
> >
> > 01:03:15.049 [19343] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
> > on drive index 1
> >
> > 01:21:15.243 [19343] <16> io_ioctl: ioctl (MTWEOF) failed on media id
> > 000017, drive index 1, I/O error (bptm.c.15845)
> >
> > 01:21:15.244 [19343] <2> log_media_error: successfully wrote to error file
> > - 05/19/04 01:21:15 000017 1 WRITE_ERROR
> >
> > 01:21:15.244 [19343] <2> check_error_history: called from bptm line 15869,
> > EXIT_Status = 84
> >
> > 01:21:15.813 [19343] <2> check_error_history: drive index = 1, media id =
> > 000017, time = 05/19/04 01:21:15, both_match = 0, media_match = 0,
> > drive_match = 2
> >
> > 01:22:20.672 [19343] <8> check_error_history: DOWN'ing drive index 1, it
> > has had at least 3 errors in last 12 hour(s)
> >
> >
> > ---------------------------------------------------------------------------------
> >
> > 06:31:57.233 [13507] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
> > on drive index 0
> >
> > 06:49:57.432 [13507] <16> io_ioctl: ioctl (MTWEOF) failed on media id
> > 000033, drive index 0, I/O error (bptm.c.15845)
> >
> > 06:49:57.433 [13507] <2> log_media_error: successfully wrote to error file
> > - 05/19/04 06:49:57 000033 0 WRITE_ERROR
> >
> > 06:49:57.433 [13507] <2> check_error_history: called from bptm line 15869,
> > EXIT_Status = 84
> >
> > 06:49:58.006 [13507] <2> check_error_history: drive index = 0, media id =
> > 000033, time = 05/19/04 06:49:57, both_match = 0, media_match = 0,
> > drive_match = 1
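
One thing worth measuring in the bptm excerpts above is how long each MTWEOF
takes to fail. The Python sketch below pairs each "command (0)MTWEOF" line
with the later "ioctl (MTWEOF) failed" line from the same process ID and
prints the elapsed time. The function name mtweof_durations and the SAMPLE
text are illustrative; the sketch assumes the log layout shown here and does
not handle a command that spans midnight.

```python
import re
from datetime import timedelta

# Timestamp, [pid], <level>, then the io_ioctl message text.
LINE_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})\.(\d{3}) \[(\d+)\] <\d+> io_ioctl: (.*)"
)


def mtweof_durations(lines):
    """Return [(pid, seconds)] from each MTWEOF command to its failure."""
    started = {}
    results = []
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # wrapped continuation lines and other log entries
        hh, mm, ss, ms, pid, rest = match.groups()
        stamp = timedelta(
            hours=int(hh), minutes=int(mm), seconds=int(ss), milliseconds=int(ms)
        )
        if rest.startswith("command (0)MTWEOF"):
            started[pid] = stamp
        elif rest.startswith("ioctl (MTWEOF) failed") and pid in started:
            elapsed = stamp - started.pop(pid)
            results.append((pid, elapsed.total_seconds()))
    return results


# Lines copied from the bptm log excerpts in this thread (wrapping kept).
SAMPLE = """\
00:03:49.218 [7267] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
on drive index 2
00:21:49.411 [7267] <16> io_ioctl: ioctl (MTWEOF) failed on media id
000041, drive index 2, I/O error (bptm.c.15845)
00:36:44.063 [15640] <2> io_ioctl: command (2)MTBSF 1 from (bptm.c.17369)
on drive index 2
00:36:44.071 [15640] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.17395)
on drive index 2
00:54:44.272 [15640] <16> io_ioctl: ioctl (MTWEOF) failed on media id
000079, drive index 2, I/O error (bptm.c.17395)
06:31:57.233 [13507] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
on drive index 0
06:49:57.432 [13507] <16> io_ioctl: ioctl (MTWEOF) failed on media id
000033, drive index 0, I/O error (bptm.c.15845)
"""

durations = mtweof_durations(SAMPLE.splitlines())
for pid, seconds in durations:
    print(f"pid {pid}: MTWEOF failed after {seconds / 60:.1f} minutes")
```

Every pair in the excerpts is about 18 minutes apart, regardless of drive or
media. That would be consistent with a fixed command timeout expiring rather
than a problem with a particular tape, though nothing in the logs shown here
says where such a timeout is configured.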
> >
> >
> > _______________________________________________
> > Veritas-bu maillist - Veritas-bu AT mailman.eng.auburn DOT edu
> > http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
> >
> >
> > aaarrrggghhh!!!!
> > FreeBSD rocks
> >
> > ---------------------------------
> > Do you Yahoo!?
> > SBC Yahoo! - Internet access at a great low price.