Networker

Re: [Networker] tape errors - save set terminations

2002-11-22 09:21:29
Subject: Re: [Networker] tape errors - save set terminations
From: Robert Maiello <robert.maiello AT MEDEC DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Fri, 22 Nov 2002 09:21:58 -0500
On Thu, 21 Nov 2002 20:48:36 -0600, Ty Young <Phillip_Young AT I2 DOT COM> 
wrote:

>This is a known issue (LGTpa43159) and I've had a support case (3015656)
>open on this exact issue for approximately 7 months (no exaggeration).
>Though they're still trying to
>confirm this, Legato support *thinks* that the problem occurs when we hit
>the EOM and attempt to buffer the currently-being-written data while
>backing up 2 file markers so that we can then write the tape as full. What
>they think is happening is nsrmmd is not sensing the EOM properly and that
>we continue to write past the physical end of tape -- thereby creating an
>unrecoverable I/O error condition.
>
>Though I'm on the whole very pleased with NetWorker 6.1.1 (see my Case
>Study on the Legato website) I only started to get this problem after
>moving off 6.0.1.
>
>-ty
>
>Phillip T. ("Ty") Young, DMA
>Backup/Recovery Systems Mgr.
>Network Services Group
>i2 Technologies, Inc.
>
>
>
>
>Dona Ashcroft <Dona.Ashcroft AT ENBRIDGE DOT COM>
>Sent by: Legato NetWorker discussion <NETWORKER AT LISTMAIL.TEMPLE DOT EDU>
>11/21/2002 04:17 PM
>Please respond to Legato NetWorker discussion; Please respond to Dona
>Ashcroft
>
>
>        To:     NETWORKER AT LISTMAIL.TEMPLE DOT EDU
>        cc:
>        Subject:        [Networker] tape errors - save set terminations
>
>
>Environment:
>NetWorker Server - Sun V880 Solaris 8
>NetWorker Server version 6.0.2
>STK9710 tape silo - 8 DLT 7000 tape drives
>clients:  HP-UX, solaris 8, NetWare, W2K
>
>We are getting the following messages:
>
>Nov 20 16:22:08 mastertr0 root: [ID 702911 daemon.notice] NetWorker media:
>(notice) Save set (3690336257) marset:/prod/OP01/ora41 volume 001086 on
>/dev/rmt/4hbn is being terminated because: Media verification failed
>Nov 20 16:22:08 mastertr0 root: [ID 702911 daemon.notice] NetWorker media:
>(notice) Save set (3681801729) oregano:/devl/OAD1/dump volume 001086 on
>/dev/rmt/4hbn is being terminated because: Media verification failed
>Nov 20 16:22:08 mastertr0 root: [ID 702911 daemon.notice] NetWorker media:
>(notice) Save set (3680443905) oregano:/devl/OAD6/dump volume 001086 on
>/dev/rmt/4hbn is being terminated because: Media verification failed
>Nov 21 04:15:10 mastertr0 root: [ID 702911 daemon.notice] NetWorker media:
>(notice) Save set (3696207617) hdq-nt73:E:\ volume D00443 on /dev/rmt/6hbn
>is being terminated because: Media verification failed
>Nov 21 04:15:10 mastertr0 root: [ID 702911 daemon.notice] NetWorker media:
>(notice) Save set (3695733249) cwl-nt03et:E:\ volume D00443 on
>/dev/rmt/6hbn is being terminated because: Media verification failed
>Nov 21 04:15:10 mastertr0 root: [ID 702911 daemon.notice] NetWorker media:
>(notice) Save set (3696199682) hdq-nt76:E:\ volume D00443 on /dev/rmt/6hbn
>is being terminated because: Media verification failed
>
>Nov 21 04:15:09 mastertr0 scsi: [ID 107833 kern.warning] WARNING: /pci@8,700000/scsi@3/st@4,0 (st25):
>Nov 21 04:15:09 mastertr0       Error for Command: read    Error Level: Fatal
>Nov 21 04:15:09 mastertr0 scsi: [ID 107833 kern.notice]         Requested Block: 65    Error Block: 65
>Nov 21 04:15:09 mastertr0 scsi: [ID 107833 kern.notice]         Vendor: QUANTUM    Serial Number: qj  6 i  O
>Nov 21 04:15:09 mastertr0 scsi: [ID 107833 kern.notice]         Sense Key: Media Error
>Nov 21 04:15:09 mastertr0 scsi: [ID 107833 kern.notice]         ASC: 0x14 (recorded entity not found), ASCQ: 0x0, FRU: 0x0
>
>It would seem that we have bad media, but these errors are occurring
>frequently on a large number and range of tapes.  Initially, it was
>happening on only one tape drive.  That tape drive has been disabled.  For
>a few days, things were okay, but we've started getting the messages (and
>save set failures) again.
>
>Could this be related to the fact that we have tape drives daisy-chained?
>6 of the tape drives are daisy chained into sets of 2 each.  One of the
>tape drives is on its own, and the other tape drive is sharing a scsi card
>with the tape silo.
>
>These errors began after we replaced our NetWorker server with the V880
>and upgraded the OS to Solaris 8.  This also involved a reconfiguration of
>the jukebox.
>
>The problems are occurring on random tapes.  Some of the tapes are
>relatively new, some we've been using for a few months (without problem),
>and some have been around for several months.
>
>Comments please?
>
>
>--
>Note: To sign off this list, send a "signoff" command via email
>to listserv AT listmail.temple DOT edu or visit the list's Web site at
>http://listmail.temple.edu/archives/networker.html where you can
>also view and post messages to the list.
>=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
>Thanks, Dona

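One way to tell a failing drive from failing media is to tally the
"Media verification failed" terminations by device and by volume. Below is
a minimal Python sketch, not anything supplied by Legato. The regular
expression is inferred only from the syslog lines quoted above, the names
tally_terminations and SAMPLE are made up for illustration, and it assumes
each message sits on one line (as in the raw log, before the mail client
wrapped it).

```python
import re
from collections import Counter

# Pattern inferred from the quoted notices only; other NetWorker releases
# may word these messages differently (assumption).
TERMINATION = re.compile(
    r"Save set \((?P<ssid>\d+)\) (?P<saveset>.+?) "
    r"volume (?P<volume>\S+) on (?P<device>\S+) "
    r"is being terminated because: (?P<reason>.+)$"
)


def tally_terminations(lines):
    """Count terminated save sets per tape device and per volume label."""
    by_device = Counter()
    by_volume = Counter()
    for line in lines:
        match = TERMINATION.search(line)
        if match is None:
            continue  # not a save set termination notice
        by_device[match.group("device")] += 1
        by_volume[match.group("volume")] += 1
    return by_device, by_volume


# Three notices and one unrelated SCSI line from the post, re-joined onto
# single lines for the demonstration.
PREFIX = "mastertr0 root: [ID 702911 daemon.notice] NetWorker media: (notice) "
SUFFIX = " is being terminated because: Media verification failed"
SAMPLE = [
    "Nov 20 16:22:08 " + PREFIX
    + "Save set (3690336257) marset:/prod/OP01/ora41 volume 001086 on /dev/rmt/4hbn"
    + SUFFIX,
    "Nov 21 04:15:10 " + PREFIX
    + "Save set (3696207617) hdq-nt73:E:\\ volume D00443 on /dev/rmt/6hbn"
    + SUFFIX,
    "Nov 21 04:15:10 " + PREFIX
    + "Save set (3695733249) cwl-nt03et:E:\\ volume D00443 on /dev/rmt/6hbn"
    + SUFFIX,
    "Nov 21 04:15:09 mastertr0 scsi: [ID 107833 kern.notice] Sense Key: Media Error",
]

if __name__ == "__main__":
    devices, volumes = tally_terminations(SAMPLE)
    for name, count in devices.most_common():
        print("device", name, count)
    for name, count in volumes.most_common():
        print("volume", name, count)
```

To run it against a real log, pass an open file object instead of SAMPLE.
If the failures pile up on one device path, that drive or its SCSI chain is
the more likely suspect. If they cluster on a few volume labels, the media
is.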