[Veritas-bu] error 84 - ioctl (MTWEOF) failed on media id

2004-05-19 17:04:53
Subject: [Veritas-bu] error 84 - ioctl (MTWEOF) failed on media id
From: tech2187 AT yahoo DOT com (K Chapman)
Date: Wed, 19 May 2004 14:04:53 -0700 (PDT)

You don't see any errors on the switch ports involved? Was anything new added
to the switch, or any rezoning done, that could have caused things to go bad?
No major changes you can think of that could have impacted your SSO environment?

Steve Mickeler <steve AT warning DOT ca> wrote:
Firmware is up to date on the switches. Drivers are up to date for Solaris
and the JNI HBAs.

Media server uptime is 4 days.

The only thing out of date is NBU which is at 4.5 FP5. I suppose I could
try going to FP6 but this setup used to be very trouble-free from the time
we installed it in October 2003.

/var/adm/messages shows

May 19 01:21:15 harbor scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/JNI,FCR@2,1/st@b,0 (st31):
May 19 01:21:15 harbor SCSI transport failed: reason 'timeout': giving up

May 19 06:49:57 harbor scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/JNI,FCR@2,1/st@a,0 (st32):
May 19 06:49:57 harbor SCSI transport failed: reason 'timeout': giving up

May 19 10:54:39 harbor scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/JNI,FCR@4,1/st@c,0 (st30):
May 19 10:54:39 harbor SCSI transport failed: reason 'timeout': giving up

May 19 11:31:47 harbor scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/JNI,FCR@2,1/st@b,0 (st31):
May 19 11:31:47 harbor SCSI transport failed: reason 'timeout': giving up
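
To see which drives the timeouts cluster on, the warnings above can be tallied per st instance. The following is only a rough sketch: the function name `count_transport_failures` and the `SAMPLE` variable are made up for illustration, and the sample text is simply the excerpt above as it was wrapped in this mail (a real /var/adm/messages file may put the WARNING and the device path on one line, which the sketch also tolerates).

```python
import re
from collections import Counter

# The four warnings quoted above, wrapped the same way as in the mail.
SAMPLE = """\
May 19 01:21:15 harbor scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/JNI,FCR@2,1/st@b,0 (st31):
May 19 01:21:15 harbor SCSI transport failed: reason 'timeout': giving up

May 19 06:49:57 harbor scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/JNI,FCR@2,1/st@a,0 (st32):
May 19 06:49:57 harbor SCSI transport failed: reason 'timeout': giving up

May 19 10:54:39 harbor scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/JNI,FCR@4,1/st@c,0 (st30):
May 19 10:54:39 harbor SCSI transport failed: reason 'timeout': giving up

May 19 11:31:47 harbor scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/JNI,FCR@2,1/st@b,0 (st31):
May 19 11:31:47 harbor SCSI transport failed: reason 'timeout': giving up
"""

# A line ending in "(stNN):" names the device the next failure belongs to.
DEVICE_RE = re.compile(r"\((st\d+)\):\s*$")
FAIL_RE = re.compile(r"SCSI transport failed: reason '([^']+)'")


def count_transport_failures(text):
    """Count (device, reason) pairs for 'SCSI transport failed' messages."""
    counts = Counter()
    device = None
    for line in text.splitlines():
        m = DEVICE_RE.search(line)
        if m:
            device = m.group(1)
            continue
        m = FAIL_RE.search(line)
        if m and device is not None:
            counts[(device, m.group(1))] += 1
            device = None  # wait for the next device line
    return counts


if __name__ == "__main__":
    for (dev, reason), n in sorted(count_transport_failures(SAMPLE).items()):
        print(dev, reason, n)
```

On this excerpt every failure has reason 'timeout', and st31 is the only instance that appears twice.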


On Wed, 19 May 2004, K Chapman wrote:

> Is your fibre switch showing many errors on your HBA and tape drive
> ports? Are drivers/firmware up to date for the HBAs, switches, etc.?
> What's the uptime on this media server? Seems like the drives are fine,
> as you have a grand total of 4 hard/soft errors across your drives,
> and all four are soft errors, which are 'recoverable'.
>
>
> Our 84s were due to hard tape errors... syslog should show some type
> of error related to the transport error... you can probably check with
> the HBA and tape vendors about the error codes returned.
>
> Steve Mickeler wrote:
>
> netstat -k shows:
>
> st32,err:
> Soft Errors 0 Hard Errors 0 Transport Errors 5 Vendor IBM
> Product ULTRIUM-TD2 Revision Revision 38D0 Serial No
>
> st31,err:
> Soft Errors 2 Hard Errors 0 Transport Errors 25 Vendor IBM
> Product ULTRIUM-TD2 Revision Revision 38D0 Serial No
>
> st30,err:
> Soft Errors 2 Hard Errors 0 Transport Errors 21 Vendor IBM
> Product ULTRIUM-TD2 Revision Revision 38D0 Serial No
>
> st33,err:
> Soft Errors 0 Hard Errors 0 Transport Errors 10 Vendor IBM
> Product ULTRIUM-TD2 Revision Revision 38D0 Serial No
>
> # modinfo | grep tape
> 51 78234000 12d04 33 1 st (SCSI tape Driver 1.216)
>
> I'm also seeing Transport Errors on all the drives on all the SSO boxes.
>
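
The counters above are easier to compare once they are pulled into one structure. A minimal sketch, assuming the output is saved as text in exactly the layout quoted here; `parse_st_errors` and `SAMPLE` are hypothetical names used only for this illustration.

```python
import re

# The st error counters quoted above, without the mail quote prefix.
SAMPLE = """\
st32,err:
Soft Errors 0 Hard Errors 0 Transport Errors 5 Vendor IBM
Product ULTRIUM-TD2 Revision Revision 38D0 Serial No

st31,err:
Soft Errors 2 Hard Errors 0 Transport Errors 25 Vendor IBM
Product ULTRIUM-TD2 Revision Revision 38D0 Serial No

st30,err:
Soft Errors 2 Hard Errors 0 Transport Errors 21 Vendor IBM
Product ULTRIUM-TD2 Revision Revision 38D0 Serial No

st33,err:
Soft Errors 0 Hard Errors 0 Transport Errors 10 Vendor IBM
Product ULTRIUM-TD2 Revision Revision 38D0 Serial No
"""

NAME_RE = re.compile(r"^(st\d+),err:")
COUNTS_RE = re.compile(
    r"Soft Errors (\d+) Hard Errors (\d+) Transport Errors (\d+)"
)


def parse_st_errors(text):
    """Return {device: {'soft': n, 'hard': n, 'transport': n}}."""
    result = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        m = NAME_RE.match(line)
        if m:
            current = m.group(1)
            continue
        m = COUNTS_RE.search(line)
        if m and current is not None:
            soft, hard, transport = (int(x) for x in m.groups())
            result[current] = {
                "soft": soft, "hard": hard, "transport": transport
            }
            current = None
    return result


if __name__ == "__main__":
    errors = parse_st_errors(SAMPLE)
    for key in ("soft", "hard", "transport"):
        print(key, sum(dev[key] for dev in errors.values()))
```

Summed over the four drives this gives 4 soft errors, 0 hard errors and 61 transport errors, so the transport counter is the one that stands out.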
>
> On Wed, 19 May 2004, K Chapman wrote:
>
> > It's trying to write the end-of-file mark and fails... If this is
> > Solaris, you can run netstat -k and look for st;
> > it will show you error counts on the drive... You can also look in the
> > syslog; it should show the driver returning the write error along with
> > the other info. You're getting errors across all your drives and with
> > different tapes... We had something similar and it turned out to be really
> > bad drives (and drive tech, damn Exabyte).
> >
> > Steve Mickeler wrote:
> > I've been experiencing quite a few error 84s (media write error) lately.
> >
> > A job will start running and then I'll notice in the activity monitor that
> > the "KB per Second" number for some of the streams shows the same number
> > (e.g. 17804), at which point I know that those jobs are going to fail and
> > I'll end up with an error 84 for those streams. The job will then start
> > again, sometimes using the same media, sometimes a new media, but it will
> > generally succeed the second time.
> >
> > Any ideas as to what is causing the first job to fail?
> >
> >
> >
> > from the bptm log:
> >
> >
> > 00:03:49.218 [7267] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
> > on drive index 2
> >
> > 00:21:49.411 [7267] <16> io_ioctl: ioctl (MTWEOF) failed on media id
> > 000041, drive index 2, I/O error (bptm.c.15845)
> >
> > 00:21:49.412 [7267] <2> log_media_error: successfully wrote to error file
> > - 05/19/04 00:21:49 000041 2 WRITE_ERROR
> >
> > 00:21:49.412 [7267] <2> check_error_history: called from bptm line 15869,
> > EXIT_Status = 84
> >
> > 00:21:49.979 [7267] <2> check_error_history: drive index = 2, media id =
> > 000041, time = 05/19/04 00:21:49, both_match = 0, media_match = 1,
> > drive_match = 0
> >
> > ---------------------------------------------------------------------------------
> >
> > 00:31:45.025 [11412] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
> > on drive index 0
> >
> > 00:49:45.221 [11412] <16> io_ioctl: ioctl (MTWEOF) failed on media id
> > 000041, drive index 0, I/O error (bptm.c.15845)
> >
> > 00:49:45.222 [11412] <2> log_media_error: successfully wrote to error file
> > - 05/19/04 00:49:45 000041 0 WRITE_ERROR
> >
> > 00:49:45.222 [11412] <2> check_error_history: called from bptm line 15869,
> > EXIT_Status = 84
> >
> > 00:49:45.786 [11412] <2> check_error_history: drive index = 0, media id =
> > 000041, time = 05/19/04 00:49:45, both_match = 0, media_match = 2,
> > drive_match = 0
> >
> > 00:49:45.992 [11412] <8> check_error_history: FREEZING media id 000041, it
> > has had at least 3 errors in the last 12 hour(s)
> >
> > ---------------------------------------------------------------------------------
> >
> > 00:36:44.063 [15640] <2> io_ioctl: command (2)MTBSF 1 from (bptm.c.17369)
> > on drive index 2
> >
> > 00:36:44.071 [15640] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.17395)
> > on drive index 2
> >
> > 00:54:44.272 [15640] <16> io_ioctl: ioctl (MTWEOF) failed on media id
> > 000079, drive index 2, I/O error (bptm.c.17395)
> >
> > 00:54:44.274 [15640] <2> log_media_error: successfully wrote to error file
> > - 05/19/04 00:54:44 000079 2 WRITE_ERROR
> >
> > ---------------------------------------------------------------------------------
> >
> > 01:03:15.049 [19343] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
> > on drive index 1
> >
> > 01:21:15.243 [19343] <16> io_ioctl: ioctl (MTWEOF) failed on media id
> > 000017, drive index 1, I/O error (bptm.c.15845)
> >
> > 01:21:15.244 [19343] <2> log_media_error: successfully wrote to error file
> > - 05/19/04 01:21:15 000017 1 WRITE_ERROR
> >
> > 01:21:15.244 [19343] <2> check_error_history: called from bptm line 15869,
> > EXIT_Status = 84
> >
> > 01:21:15.813 [19343] <2> check_error_history: drive index = 1, media id =
> > 000017, time = 05/19/04 01:21:15, both_match = 0, media_match = 0,
> > drive_match = 2
> >
> > 01:22:20.672 [19343] <8> check_error_history: DOWN'ing drive index 1, it
> > has had at least 3 errors in last 12 hour(s)
> >
> >
> > ---------------------------------------------------------------------------------
> >
> > 06:31:57.233 [13507] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
> > on drive index 0
> >
> > 06:49:57.432 [13507] <16> io_ioctl: ioctl (MTWEOF) failed on media id
> > 000033, drive index 0, I/O error (bptm.c.15845)
> >
> > 06:49:57.433 [13507] <2> log_media_error: successfully wrote to error file
> > - 05/19/04 06:49:57 000033 0 WRITE_ERROR
> >
> > 06:49:57.433 [13507] <2> check_error_history: called from bptm line 15869,
> > EXIT_Status = 84
> >
> > 06:49:58.006 [13507] <2> check_error_history: drive index = 0, media id =
> > 000033, time = 05/19/04 06:49:57, both_match = 0, media_match = 0,
> > drive_match = 1
> >
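
One thing the bptm excerpt allows is measuring how long each MTWEOF ioctl was outstanding before it failed. The sketch below pairs the "command (0)MTWEOF" line with the later "ioctl (MTWEOF) failed" line from the same process id. The names `mtweof_durations_ms` and `SAMPLE` are invented for this example, the sample holds only the first line of each wrapped log entry from above, and the code assumes an attempt does not span midnight.

```python
import re

# First line of each relevant bptm entry quoted above.
SAMPLE = """\
00:03:49.218 [7267] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
00:21:49.411 [7267] <16> io_ioctl: ioctl (MTWEOF) failed on media id
00:31:45.025 [11412] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
00:49:45.221 [11412] <16> io_ioctl: ioctl (MTWEOF) failed on media id
00:36:44.063 [15640] <2> io_ioctl: command (2)MTBSF 1 from (bptm.c.17369)
00:36:44.071 [15640] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.17395)
00:54:44.272 [15640] <16> io_ioctl: ioctl (MTWEOF) failed on media id
01:03:15.049 [19343] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
01:21:15.243 [19343] <16> io_ioctl: ioctl (MTWEOF) failed on media id
06:31:57.233 [13507] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
06:49:57.432 [13507] <16> io_ioctl: ioctl (MTWEOF) failed on media id
"""

LINE_RE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2})\.(\d{3}) \[(\d+)\] <\d+> io_ioctl: "
    r"(command \(0\)MTWEOF|ioctl \(MTWEOF\) failed)"
)


def mtweof_durations_ms(text):
    """Map pid -> milliseconds between issuing MTWEOF and its failure."""
    issued = {}
    durations = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # other commands (e.g. MTBSF) are ignored
        h, mi, s, ms, pid, what = m.groups()
        t = ((int(h) * 60 + int(mi)) * 60 + int(s)) * 1000 + int(ms)
        if what.startswith("command"):
            issued[pid] = t
        elif pid in issued:
            durations[pid] = t - issued.pop(pid)
    return durations


if __name__ == "__main__":
    for pid, ms in mtweof_durations_ms(SAMPLE).items():
        print(pid, round(ms / 1000.0, 3), "s")
```

For all five attempts in the excerpt the gap comes out at roughly 1080 seconds, i.e. 18 minutes. That would fit a fixed timeout expiring rather than the tape rejecting the write, and it lines up with the 'timeout' reason in the syslog warnings, but that reading is an inference from the timestamps and not something the logs state.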
> >
> > _______________________________________________
> > Veritas-bu maillist - Veritas-bu AT mailman.eng.auburn DOT edu
> > http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
> >
> >
> > aaarrrggghhh!!!!
> > FreeBSD rocks