[Veritas-bu] error 84 - ioctl (MTWEOF) failed on media id

2004-05-19 13:45:52
Subject: [Veritas-bu] error 84 - ioctl (MTWEOF) failed on media id
From: steve AT warning DOT ca (Steve Mickeler)
Date: Wed, 19 May 2004 13:45:52 -0400 (EDT)
Firmware is up to date on the switches. Drivers are up to date for Solaris
and the JNI HBAs.

Media server uptime is 4 days.

The only thing out of date is NBU, which is at 4.5 FP5. I suppose I could
try going to FP6, but this setup had been very trouble-free ever since
we installed it in October 2003.

/var/adm/messages shows:

May 19 01:21:15 harbor scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/JNI,FCR@2,1/st@b,0 (st31):
May 19 01:21:15 harbor  SCSI transport failed: reason 'timeout': giving up

May 19 06:49:57 harbor scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/JNI,FCR@2,1/st@a,0 (st32):
May 19 06:49:57 harbor  SCSI transport failed: reason 'timeout': giving up

May 19 10:54:39 harbor scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/JNI,FCR@4,1/st@c,0 (st30):
May 19 10:54:39 harbor  SCSI transport failed: reason 'timeout': giving up

May 19 11:31:47 harbor scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/JNI,FCR@2,1/st@b,0 (st31):
May 19 11:31:47 harbor  SCSI transport failed: reason 'timeout': giving up
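
To see at a glance which drives and which HBA paths the timeouts land on, the entries above can be tallied with a short script. This is only a sketch: the function name `tally_transport_failures` and the `SAMPLE` text are illustrative, and it assumes each WARNING line is followed directly by its "SCSI transport failed" line, as in the excerpt (the pattern also tolerates the line wrap introduced by the mail client).

```python
import re
from collections import Counter

# The four entries quoted above, with the mail-wrapped lines rejoined.
SAMPLE = """\
May 19 01:21:15 harbor scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/JNI,FCR@2,1/st@b,0 (st31):
May 19 01:21:15 harbor  SCSI transport failed: reason 'timeout': giving up
May 19 06:49:57 harbor scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/JNI,FCR@2,1/st@a,0 (st32):
May 19 06:49:57 harbor  SCSI transport failed: reason 'timeout': giving up
May 19 10:54:39 harbor scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/JNI,FCR@4,1/st@c,0 (st30):
May 19 10:54:39 harbor  SCSI transport failed: reason 'timeout': giving up
May 19 11:31:47 harbor scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/JNI,FCR@2,1/st@b,0 (st31):
May 19 11:31:47 harbor  SCSI transport failed: reason 'timeout': giving up
"""

# Device path and st instance from the WARNING line, then the failure
# reason from the line that follows it.
PATTERN = re.compile(
    r"WARNING:\s*(?P<path>\S+)\s+\((?P<inst>st\d+)\):\s*\n"
    r"[^\n]*SCSI transport failed: reason '(?P<reason>[^']+)'"
)

def tally_transport_failures(text):
    """Count transport failures per st instance and per parent HBA path."""
    per_drive = Counter()
    per_hba = Counter()
    for match in PATTERN.finditer(text):
        per_drive[match.group("inst")] += 1
        # Drop the trailing /st@x,y component to get the HBA node.
        per_hba[match.group("path").rsplit("/", 1)[0]] += 1
    return per_drive, per_hba

if __name__ == "__main__":
    drives, hbas = tally_transport_failures(SAMPLE)
    for name, count in sorted(drives.items()):
        print(name, count)
    for path, count in sorted(hbas.items()):
        print(path, count)
```

On this excerpt the timeouts are spread over three drives and two HBA paths, so the sample does not point at a single drive.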


On Wed, 19 May 2004, K Chapman wrote:

> Is your fibre switch showing many errors on your HBA and tape drive
> ports? Are drivers/firmware up to date for the HBAs, switches, etc.?
> What's the uptime on this media server? Seems like the drives are fine,
> as you have a grand total of 4 hard/soft errors across your drives.
> All four errors are soft errors, which are 'recoverable'.
>
>
> Our 84s were due to hard tape errors. Syslog should show some type
> of error related to the transport error. You can probably check with
> the HBA and tape vendors about the error codes returned.
>
> Steve Mickeler <steve AT warning DOT ca> wrote:
>
> netstat -k shows:
>
> st32,err:
> Soft Errors 0 Hard Errors 0 Transport Errors 5 Vendor IBM
> Product ULTRIUM-TD2 Revision Revision 38D0 Serial No
>
> st31,err:
> Soft Errors 2 Hard Errors 0 Transport Errors 25 Vendor IBM
> Product ULTRIUM-TD2 Revision Revision 38D0 Serial No
>
> st30,err:
> Soft Errors 2 Hard Errors 0 Transport Errors 21 Vendor IBM
> Product ULTRIUM-TD2 Revision Revision 38D0 Serial No
>
> st33,err:
> Soft Errors 0 Hard Errors 0 Transport Errors 10 Vendor IBM
> Product ULTRIUM-TD2 Revision Revision 38D0 Serial No
>
> # modinfo | grep tape
> 51 78234000 12d04 33 1 st (SCSI tape Driver 1.216)
>
> I'm also seeing Transport Errors on all the drives on all the SSO boxes.
>
>
> On Wed, 19 May 2004, K Chapman wrote:
>
> > It's trying to write the end-of-file mark and fails. If this is
> > Solaris, you can run netstat -k and look for st;
> > it will show you error counts on the drive. You can also look in the
> > syslog; it should show the driver returning the write error along with
> > the other info. You're getting errors across all your drives and with
> > different tapes. We had something similar and it turned out to be really
> > bad drives (and drive tech, damn Exabyte).
> >
> > Steve Mickeler wrote:
> > I've been experiencing quite a few error 84s (media write error) lately.
> >
> > A job will start running and then I'll notice in the activity monitor that
> > the "KB per Second" number for some of the streams shows the same number,
> > e.g. 17804, at which point I know that those jobs are going to fail and I'll
> > end up with an error 84 for those streams. The job will then start again,
> > sometimes using the same media, sometimes a new media, but it will
> > generally succeed the second time.
> >
> > Any ideas as to what is causing the first job to fail?
> >
> >
> >
> > from the bptm log:
> >
> >
> > 00:03:49.218 [7267] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
> > on drive index 2
> >
> > 00:21:49.411 [7267] <16> io_ioctl: ioctl (MTWEOF) failed on media id
> > 000041, drive index 2, I/O error (bptm.c.15845)
> >
> > 00:21:49.412 [7267] <2> log_media_error: successfully wrote to error file
> > - 05/19/04 00:21:49 000041 2 WRITE_ERROR
> >
> > 00:21:49.412 [7267] <2> check_error_history: called from bptm line 15869,
> > EXIT_Status = 84
> >
> > 00:21:49.979 [7267] <2> check_error_history: drive index = 2, media id =
> > 000041, time = 05/19/04 00:21:49, both_match = 0, media_match = 1,
> > drive_match = 0
> >
> > ---------------------------------------------------------------------------------
> >
> > 00:31:45.025 [11412] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
> > on drive index 0
> >
> > 00:49:45.221 [11412] <16> io_ioctl: ioctl (MTWEOF) failed on media id
> > 000041, drive index 0, I/O error (bptm.c.15845)
> >
> > 00:49:45.222 [11412] <2> log_media_error: successfully wrote to error file
> > - 05/19/04 00:49:45 000041 0 WRITE_ERROR
> >
> > 00:49:45.222 [11412] <2> check_error_history: called from bptm line 15869,
> > EXIT_Status = 84
> >
> > 00:49:45.786 [11412] <2> check_error_history: drive index = 0, media id =
> > 000041, time = 05/19/04 00:49:45, both_match = 0, media_match = 2,
> > drive_match = 0
> >
> > 00:49:45.992 [11412] <8> check_error_history: FREEZING media id 000041, it
> > has had at least 3 errors in the last 12 hour(s)
> >
> > ---------------------------------------------------------------------------------
> >
> > 00:36:44.063 [15640] <2> io_ioctl: command (2)MTBSF 1 from (bptm.c.17369)
> > on drive index 2
> >
> > 00:36:44.071 [15640] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.17395)
> > on drive index 2
> >
> > 00:54:44.272 [15640] <16> io_ioctl: ioctl (MTWEOF) failed on media id
> > 000079, drive index 2, I/O error (bptm.c.17395)
> >
> > 00:54:44.274 [15640] <2> log_media_error: successfully wrote to error file
> > - 05/19/04 00:54:44 000079 2 WRITE_ERROR
> >
> > ---------------------------------------------------------------------------------
> >
> > 01:03:15.049 [19343] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
> > on drive index 1
> >
> > 01:21:15.243 [19343] <16> io_ioctl: ioctl (MTWEOF) failed on media id
> > 000017, drive index 1, I/O error (bptm.c.15845)
> >
> > 01:21:15.244 [19343] <2> log_media_error: successfully wrote to error file
> > - 05/19/04 01:21:15 000017 1 WRITE_ERROR
> >
> > 01:21:15.244 [19343] <2> check_error_history: called from bptm line 15869,
> > EXIT_Status = 84
> >
> > 01:21:15.813 [19343] <2> check_error_history: drive index = 1, media id =
> > 000017, time = 05/19/04 01:21:15, both_match = 0, media_match = 0,
> > drive_match = 2
> >
> > 01:22:20.672 [19343] <8> check_error_history: DOWN'ing drive index 1, it
> > has had at least 3 errors in last 12 hour(s)
> >
> >
> > ---------------------------------------------------------------------------------
> >
> > 06:31:57.233 [13507] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
> > on drive index 0
> >
> > 06:49:57.432 [13507] <16> io_ioctl: ioctl (MTWEOF) failed on media id
> > 000033, drive index 0, I/O error (bptm.c.15845)
> >
> > 06:49:57.433 [13507] <2> log_media_error: successfully wrote to error file
> > - 05/19/04 06:49:57 000033 0 WRITE_ERROR
> >
> > 06:49:57.433 [13507] <2> check_error_history: called from bptm line 15869,
> > EXIT_Status = 84
> >
> > 06:49:58.006 [13507] <2> check_error_history: drive index = 0, media id =
> > 000033, time = 05/19/04 06:49:57, both_match = 0, media_match = 0,
> > drive_match = 1
> >
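
One detail in the bptm excerpts above is easier to see with some arithmetic: every failed MTWEOF is reported almost exactly 18 minutes after the command was issued, and two of the failure times (01:21:15 and 06:49:57) also appear in /var/adm/messages as SCSI transport timeouts. The sketch below pairs each MTWEOF command with its failure by process ID and prints the elapsed seconds. The function name `mtweof_durations` and the `SAMPLE` text are illustrative; the wrapped log lines have been rejoined, and the code assumes a command and its failure fall on the same day.

```python
import re
from datetime import datetime

# MTWEOF command/failure pairs from the bptm log above, wrapped lines rejoined.
SAMPLE = """\
00:03:49.218 [7267] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845) on drive index 2
00:21:49.411 [7267] <16> io_ioctl: ioctl (MTWEOF) failed on media id 000041, drive index 2, I/O error (bptm.c.15845)
00:31:45.025 [11412] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845) on drive index 0
00:49:45.221 [11412] <16> io_ioctl: ioctl (MTWEOF) failed on media id 000041, drive index 0, I/O error (bptm.c.15845)
00:36:44.063 [15640] <2> io_ioctl: command (2)MTBSF 1 from (bptm.c.17369) on drive index 2
00:36:44.071 [15640] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.17395) on drive index 2
00:54:44.272 [15640] <16> io_ioctl: ioctl (MTWEOF) failed on media id 000079, drive index 2, I/O error (bptm.c.17395)
01:03:15.049 [19343] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845) on drive index 1
01:21:15.243 [19343] <16> io_ioctl: ioctl (MTWEOF) failed on media id 000017, drive index 1, I/O error (bptm.c.15845)
06:31:57.233 [13507] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845) on drive index 0
06:49:57.432 [13507] <16> io_ioctl: ioctl (MTWEOF) failed on media id 000033, drive index 0, I/O error (bptm.c.15845)
"""

# Timestamp, process ID, and the message text of each io_ioctl line.
LINE = re.compile(
    r"^(\d\d:\d\d:\d\d\.\d{3}) \[(\d+)\] <\d+> io_ioctl: (.*)$", re.M
)

def mtweof_durations(text):
    """Return (pid, seconds) for each MTWEOF command that later failed."""
    started = {}
    durations = []
    for stamp, pid, rest in LINE.findall(text):
        when = datetime.strptime(stamp, "%H:%M:%S.%f")
        if rest.startswith("command (0)MTWEOF"):
            started[pid] = when
        elif rest.startswith("ioctl (MTWEOF) failed") and pid in started:
            elapsed = (when - started.pop(pid)).total_seconds()
            durations.append((pid, elapsed))
    return durations

if __name__ == "__main__":
    for pid, seconds in mtweof_durations(SAMPLE):
        print(pid, round(seconds, 3), "s =", round(seconds / 60, 2), "min")
```

All five pairs come out at roughly 1080 seconds. That is consistent with a fixed timeout expiring somewhere in the I/O path rather than a failure that depends on the tape or the drive, although the log alone does not say which layer holds that timeout.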
> >
> >
> > aaarrrggghhh!!!!
> > FreeBSD rocks