[Veritas-bu] error 84 - ioctl (MTWEOF) failed on media id

Subject: [Veritas-bu] error 84 - ioctl (MTWEOF) failed on media id
From: steve AT warning DOT ca (Steve Mickeler)
Date: Thu, 20 May 2004 14:22:11 -0400 (EDT)
Turns out that another Windows master server that can see these drives via
the SAN had them set up as standalone drives, and even though they weren't
being used, the AVR daemon on that server was still trying to open them.

A paste from the windows event log:

Event Type: Information
Event Source: NetBackup AVR Daemon
Event Category: None
Event ID: 4101
Date: 5/19/2004
Time: 9:08:34 PM
User: N/A
Computer: GMTOMS01
Description: Open error on IBMULTRIUM-TD20 (device 1, \\.\Tape0): The
requested resource is in use.  Drive may be busy

I've changed the fabric zone so that server no longer sees these drives.

Thanks to everyone for the troubleshooting suggestions.


On Wed, 19 May 2004, K Chapman wrote:

> You don't see any errors on the switch ports involved? Was anything new
> added to the switch, or any rezoning done, that could have caused things to
> go bad? No major changes you can think of that could have impacted your
> SSO environment?
>
> Steve Mickeler <steve AT warning DOT ca> wrote:
> Firmware is up to date on the switches. Drivers are up to date for Solaris
> and the JNI HBAs.
>
> Media server uptime is 4 days.
>
> The only thing out of date is NBU which is at 4.5 FP5. I suppose I could
> try going to FP6 but this setup used to be very trouble-free from the time
> we installed it in October 2003.
>
> /var/adm/messages shows:
>
> May 19 01:21:15 harbor scsi: [ID 107833 kern.warning] WARNING:
> /pci@1f,4000/JNI,FCR@2,1/st@b,0 (st31):
> May 19 01:21:15 harbor SCSI transport failed: reason 'timeout': giving up
>
> May 19 06:49:57 harbor scsi: [ID 107833 kern.warning] WARNING:
> /pci@1f,4000/JNI,FCR@2,1/st@a,0 (st32):
> May 19 06:49:57 harbor SCSI transport failed: reason 'timeout': giving up
>
> May 19 10:54:39 harbor scsi: [ID 107833 kern.warning] WARNING:
> /pci@1f,4000/JNI,FCR@4,1/st@c,0 (st30):
> May 19 10:54:39 harbor SCSI transport failed: reason 'timeout': giving up
>
> May 19 11:31:47 harbor scsi: [ID 107833 kern.warning] WARNING:
> /pci@1f,4000/JNI,FCR@2,1/st@b,0 (st31):
> May 19 11:31:47 harbor SCSI transport failed: reason 'timeout': giving up
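Two of the syslog timeouts above fall in the same second as MTWEOF failures in the bptm log quoted further down this thread. The sketch below is one hedged way to check that kind of alignment. The sample text is copied from this thread with the mail client's line wraps joined. The regular expressions and the `correlate` helper are illustrative assumptions, not part of NetBackup or Solaris, and the exact-second match is deliberately simplistic.

```python
import re

# Excerpts copied from this thread, with wrapped lines joined back together.
SYSLOG = """\
May 19 01:21:15 harbor scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/JNI,FCR@2,1/st@b,0 (st31):
May 19 06:49:57 harbor scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/JNI,FCR@2,1/st@a,0 (st32):
May 19 10:54:39 harbor scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/JNI,FCR@4,1/st@c,0 (st30):
May 19 11:31:47 harbor scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/JNI,FCR@2,1/st@b,0 (st31):
"""

BPTM = """\
00:21:49.411 [7267] <16> io_ioctl: ioctl (MTWEOF) failed on media id 000041, drive index 2, I/O error (bptm.c.15845)
00:49:45.221 [11412] <16> io_ioctl: ioctl (MTWEOF) failed on media id 000041, drive index 0, I/O error (bptm.c.15845)
00:54:44.272 [15640] <16> io_ioctl: ioctl (MTWEOF) failed on media id 000079, drive index 2, I/O error (bptm.c.17395)
01:21:15.243 [19343] <16> io_ioctl: ioctl (MTWEOF) failed on media id 000017, drive index 1, I/O error (bptm.c.15845)
06:49:57.432 [13507] <16> io_ioctl: ioctl (MTWEOF) failed on media id 000033, drive index 0, I/O error (bptm.c.15845)
"""

# Hypothetical patterns written for the two line shapes seen in this thread.
SYSLOG_RE = re.compile(r"^\w{3} +\d+ (\d\d:\d\d:\d\d) \S+ scsi: .*\((st\d+)\):")
BPTM_RE = re.compile(
    r"^(\d\d:\d\d:\d\d)\.\d+ \[\d+\] <16> io_ioctl: ioctl \(MTWEOF\) failed "
    r"on media id (\w+), drive index (\d+)"
)


def correlate(syslog_text, bptm_text):
    """Return (time, st device, media id, drive index) for same-second events."""
    timeouts = {}
    for line in syslog_text.splitlines():
        m = SYSLOG_RE.match(line)
        if m:
            timeouts[m.group(1)] = m.group(2)

    matches = []
    for line in bptm_text.splitlines():
        m = BPTM_RE.match(line)
        if m and m.group(1) in timeouts:
            when = m.group(1)
            matches.append((when, timeouts[when], m.group(2), int(m.group(3))))
    return matches


matches = correlate(SYSLOG, BPTM)
for when, device, media, drive in matches:
    print(f"{when}  {device}  media {media}  drive index {drive}")
```

On the excerpts in this thread, the 01:21:15 and 06:49:57 failures pair up with st31 and st32 respectively. The earlier bptm failures have no counterpart in the syslog lines that were posted.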
>
>
> On Wed, 19 May 2004, K Chapman wrote:
>
> > Is your fibre switch showing many errors on your HBA and tape drive
> > ports? Are drivers/firmware up to date for the HBAs, switches, etc.?
> > What's the uptime on this media server? Seems like the drives are fine,
> > as you have a grand total of 4 hard/soft errors across your drives.
> > All four errors are soft errors, which are 'recoverable'.
> >
> >
> > Our 84s were due to hard tape errors. Syslog should show some type
> > of error related to the transport error. You can probably check with
> > the HBA and tape vendors about the error codes returned.
> >
> > Steve Mickeler wrote:
> >
> > netstat -k shows:
> >
> > st32,err:
> > Soft Errors 0 Hard Errors 0 Transport Errors 5 Vendor IBM
> > Product ULTRIUM-TD2 Revision Revision 38D0 Serial No
> >
> > st31,err:
> > Soft Errors 2 Hard Errors 0 Transport Errors 25 Vendor IBM
> > Product ULTRIUM-TD2 Revision Revision 38D0 Serial No
> >
> > st30,err:
> > Soft Errors 2 Hard Errors 0 Transport Errors 21 Vendor IBM
> > Product ULTRIUM-TD2 Revision Revision 38D0 Serial No
> >
> > st33,err:
> > Soft Errors 0 Hard Errors 0 Transport Errors 10 Vendor IBM
> > Product ULTRIUM-TD2 Revision Revision 38D0 Serial No
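The per-drive counters above can be totalled with a short script. This is a minimal sketch that assumes the `netstat -k` text has been saved with the quote markers stripped. The `parse_st_errors` helper and its regular expressions are illustrative, not a Solaris or NetBackup tool.

```python
import re

# netstat -k excerpt copied from this thread, quote markers removed.
SAMPLE = """\
st32,err:
Soft Errors 0 Hard Errors 0 Transport Errors 5 Vendor IBM
Product ULTRIUM-TD2 Revision Revision 38D0 Serial No

st31,err:
Soft Errors 2 Hard Errors 0 Transport Errors 25 Vendor IBM
Product ULTRIUM-TD2 Revision Revision 38D0 Serial No

st30,err:
Soft Errors 2 Hard Errors 0 Transport Errors 21 Vendor IBM
Product ULTRIUM-TD2 Revision Revision 38D0 Serial No

st33,err:
Soft Errors 0 Hard Errors 0 Transport Errors 10 Vendor IBM
Product ULTRIUM-TD2 Revision Revision 38D0 Serial No
"""


def parse_st_errors(text):
    """Map each st device to its soft/hard/transport error counts."""
    counters = {}
    device = None
    for line in text.splitlines():
        header = re.match(r"\s*(st\d+),err:", line)
        if header:
            device = header.group(1)
            counters[device] = {}
            continue
        if device is None:
            continue
        for kind, count in re.findall(r"(Soft|Hard|Transport) Errors (\d+)", line):
            counters[device][kind.lower()] = int(count)
    return counters


errors = parse_st_errors(SAMPLE)
totals = {
    kind: sum(dev.get(kind, 0) for dev in errors.values())
    for kind in ("soft", "hard", "transport")
}
print(totals)
```

For the four drives listed, this gives 4 soft errors, 0 hard errors and 61 transport errors, so the transport errors are the counters that stand out.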
> >
> > # modinfo | grep tape
> > 51 78234000 12d04 33 1 st (SCSI tape Driver 1.216)
> >
> > I'm also seeing Transport Errors on all the drives on all the SSO boxes.
> >
> >
> > On Wed, 19 May 2004, K Chapman wrote:
> >
> > > It's trying to write the end-of-file mark and fails. If this is
> > > Solaris, you can run netstat -k and look for the st entries;
> > > they will show you error counts on the drive. You can also look in the
> > > syslog; it should show the driver returning the write error along with
> > > the other info. You're getting errors across all your drives and with
> > > different tapes. We had something similar and it turned out to be really
> > > bad drives (and drive tech, damn Exabyte).
> > >
> > > Steve Mickeler wrote:
> > > I've been experiencing quite a few error 84s (media write error) lately.
> > >
> > > A job will start running and then I'll notice in the Activity Monitor that
> > > the "KB per Second" number for some of the streams shows the same number
> > > (e.g. 17804), at which point I know that those jobs are going to fail and
> > > I'll end up with an error 84 for those streams. The job will then start
> > > again, sometimes using the same media, sometimes new media, but it will
> > > generally succeed the second time.
> > >
> > > Any ideas as to what is causing the first job to fail?
> > >
> > >
> > >
> > > from the bptm log:
> > >
> > >
> > > 00:03:49.218 [7267] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
> > > on drive index 2
> > >
> > > 00:21:49.411 [7267] <16> io_ioctl: ioctl (MTWEOF) failed on media id
> > > 000041, drive index 2, I/O error (bptm.c.15845)
> > >
> > > 00:21:49.412 [7267] <2> log_media_error: successfully wrote to error file
> > > - 05/19/04 00:21:49 000041 2 WRITE_ERROR
> > >
> > > 00:21:49.412 [7267] <2> check_error_history: called from bptm line 15869,
> > > EXIT_Status = 84
> > >
> > > 00:21:49.979 [7267] <2> check_error_history: drive index = 2, media id =
> > > 000041, time = 05/19/04 00:21:49, both_match = 0, media_match = 1,
> > > drive_match = 0
> > >
> > > ---------------------------------------------------------------------------------
> > >
> > > 00:31:45.025 [11412] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
> > > on drive index 0
> > >
> > > 00:49:45.221 [11412] <16> io_ioctl: ioctl (MTWEOF) failed on media id
> > > 000041, drive index 0, I/O error (bptm.c.15845)
> > >
> > > 00:49:45.222 [11412] <2> log_media_error: successfully wrote to error file
> > > - 05/19/04 00:49:45 000041 0 WRITE_ERROR
> > >
> > > 00:49:45.222 [11412] <2> check_error_history: called from bptm line 15869,
> > > EXIT_Status = 84
> > >
> > > 00:49:45.786 [11412] <2> check_error_history: drive index = 0, media id =
> > > 000041, time = 05/19/04 00:49:45, both_match = 0, media_match = 2,
> > > drive_match = 0
> > >
> > > 00:49:45.992 [11412] <8> check_error_history: FREEZING media id 000041, it
> > > has had at least 3 errors in the last 12 hour(s)
> > >
> > > ---------------------------------------------------------------------------------
> > >
> > > 00:36:44.063 [15640] <2> io_ioctl: command (2)MTBSF 1 from (bptm.c.17369)
> > > on drive index 2
> > >
> > > 00:36:44.071 [15640] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.17395)
> > > on drive index 2
> > >
> > > 00:54:44.272 [15640] <16> io_ioctl: ioctl (MTWEOF) failed on media id
> > > 000079, drive index 2, I/O error (bptm.c.17395)
> > >
> > > 00:54:44.274 [15640] <2> log_media_error: successfully wrote to error file
> > > - 05/19/04 00:54:44 000079 2 WRITE_ERROR
> > >
> > > ---------------------------------------------------------------------------------
> > >
> > > 01:03:15.049 [19343] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
> > > on drive index 1
> > >
> > > 01:21:15.243 [19343] <16> io_ioctl: ioctl (MTWEOF) failed on media id
> > > 000017, drive index 1, I/O error (bptm.c.15845)
> > >
> > > 01:21:15.244 [19343] <2> log_media_error: successfully wrote to error file
> > > - 05/19/04 01:21:15 000017 1 WRITE_ERROR
> > >
> > > 01:21:15.244 [19343] <2> check_error_history: called from bptm line 15869,
> > > EXIT_Status = 84
> > >
> > > 01:21:15.813 [19343] <2> check_error_history: drive index = 1, media id =
> > > 000017, time = 05/19/04 01:21:15, both_match = 0, media_match = 0,
> > > drive_match = 2
> > >
> > > 01:22:20.672 [19343] <8> check_error_history: DOWN'ing drive index 1, it
> > > has had at least 3 errors in last 12 hour(s)
> > >
> > >
> > > ---------------------------------------------------------------------------------
> > >
> > > 06:31:57.233 [13507] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
> > > on drive index 0
> > >
> > > 06:49:57.432 [13507] <16> io_ioctl: ioctl (MTWEOF) failed on media id
> > > 000033, drive index 0, I/O error (bptm.c.15845)
> > >
> > > 06:49:57.433 [13507] <2> log_media_error: successfully wrote to error file
> > > - 05/19/04 06:49:57 000033 0 WRITE_ERROR
> > >
> > > 06:49:57.433 [13507] <2> check_error_history: called from bptm line 15869,
> > > EXIT_Status = 84
> > >
> > > 06:49:58.006 [13507] <2> check_error_history: drive index = 0, media id =
> > > 000033, time = 05/19/04 06:49:57, both_match = 0, media_match = 0,
> > > drive_match = 1
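In every failure in the bptm excerpt above, the error is logged almost exactly 18 minutes after the MTWEOF was issued. That looks more like a timeout expiring than an immediate write failure. The sketch below measures the gap for each bptm process ID. It uses only the first line of each wrapped log record, assumes the timestamps do not cross midnight, and the `mtweof_durations` helper is a hypothetical name used for illustration.

```python
import re
from datetime import datetime

# First lines of the io_ioctl records quoted in this thread.
SAMPLE = """\
00:03:49.218 [7267] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
00:21:49.411 [7267] <16> io_ioctl: ioctl (MTWEOF) failed on media id
00:31:45.025 [11412] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
00:49:45.221 [11412] <16> io_ioctl: ioctl (MTWEOF) failed on media id
00:36:44.063 [15640] <2> io_ioctl: command (2)MTBSF 1 from (bptm.c.17369)
00:36:44.071 [15640] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.17395)
00:54:44.272 [15640] <16> io_ioctl: ioctl (MTWEOF) failed on media id
01:03:15.049 [19343] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
01:21:15.243 [19343] <16> io_ioctl: ioctl (MTWEOF) failed on media id
06:31:57.233 [13507] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.15845)
06:49:57.432 [13507] <16> io_ioctl: ioctl (MTWEOF) failed on media id
"""

LINE_RE = re.compile(r"^(\d\d:\d\d:\d\d\.\d+) \[(\d+)\] <\d+> io_ioctl: (.*)$")


def mtweof_durations(text):
    """Seconds between issuing MTWEOF and its failure, keyed by bptm PID."""
    started = {}
    durations = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        stamp = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        pid, message = m.group(2), m.group(3)
        if message.startswith("command") and "MTWEOF" in message:
            started[pid] = stamp
        elif message.startswith("ioctl (MTWEOF) failed") and pid in started:
            durations[pid] = (stamp - started.pop(pid)).total_seconds()
    return durations


durations = mtweof_durations(SAMPLE)
for pid, seconds in durations.items():
    print(f"pid {pid}: {seconds / 60:.2f} minutes")
```

All five processes in the excerpt come out at roughly 1080 seconds, regardless of drive index or media id.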
> > >
> > >
> > > _______________________________________________
> > > Veritas-bu maillist - Veritas-bu AT mailman.eng.auburn DOT edu
> > > http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
> > >
> > >
> > > aaarrrggghhh!!!!
> > > FreeBSD rocks