Re: [Networker] tape errors - save set terminations

This is a known issue (LGTpa43159), and I've had a support case (3015656)
open on this exact issue for approximately 7 months (no exaggeration).
Though they're still trying to confirm this, Legato support *thinks* that
the problem occurs when we hit the EOM and attempt to buffer the data
currently being written while backing up 2 file markers so that we can
then mark the tape as full. What they think is happening is that nsrmmd is
not sensing the EOM properly and that we continue to write past the
physical end of tape, thereby creating an unrecoverable I/O error
condition.

Though I'm on the whole very pleased with NetWorker 6.1.1 (see my Case
Study on the Legato website), I only started to get this problem after
moving off 6.0.1.

-ty

Phillip T. ("Ty") Young, DMA
Backup/Recovery Systems Mgr.
Network Services Group
i2 Technologies, Inc.




Dona Ashcroft <Dona.Ashcroft AT ENBRIDGE DOT COM>
Sent by: Legato NetWorker discussion <NETWORKER AT LISTMAIL.TEMPLE DOT EDU>
11/21/2002 04:17 PM
Please respond to Legato NetWorker discussion; Please respond to Dona
Ashcroft


        To:     NETWORKER AT LISTMAIL.TEMPLE DOT EDU
        cc:
        Subject:        [Networker] tape errors - save set terminations


Environment:
NetWorker Server - Sun V880 Solaris 8
NetWorker Server version 6.0.2
STK9710 tape silo - 8 DLT 7000 tape drives
clients:  HP-UX, solaris 8, NetWare, W2K

We are getting the following messages:

Nov 20 16:22:08 mastertr0 root: [ID 702911 daemon.notice] NetWorker media:
(notice) Save set (3690336257) marset:/prod/OP01/ora41 volume 001086 on
/dev/rmt/4hbn is being terminated because: Media verification failed
Nov 20 16:22:08 mastertr0 root: [ID 702911 daemon.notice] NetWorker media:
(notice) Save set (3681801729) oregano:/devl/OAD1/dump volume 001086 on
/dev/rmt/4hbn is being terminated because: Media verification failed
Nov 20 16:22:08 mastertr0 root: [ID 702911 daemon.notice] NetWorker media:
(notice) Save set (3680443905) oregano:/devl/OAD6/dump volume 001086 on
/dev/rmt/4hbn is being terminated because: Media verification failed
Nov 21 04:15:10 mastertr0 root: [ID 702911 daemon.notice] NetWorker media:
(notice) Save set (3696207617) hdq-nt73:E:\ volume D00443 on /dev/rmt/6hbn
is being terminated because: Media verification failed
Nov 21 04:15:10 mastertr0 root: [ID 702911 daemon.notice] NetWorker media:
(notice) Save set (3695733249) cwl-nt03et:E:\ volume D00443 on
/dev/rmt/6hbn is being terminated because: Media verification failed
Nov 21 04:15:10 mastertr0 root: [ID 702911 daemon.notice] NetWorker media:
(notice) Save set (3696199682) hdq-nt76:E:\ volume D00443 on /dev/rmt/6hbn
is being terminated because: Media verification failed

Nov 21 04:15:09 mastertr0 scsi: [ID 107833 kern.warning] WARNING: /pci@8,700000/scsi@3/st@4,0 (st25):
Nov 21 04:15:09 mastertr0       Error for Command: read    Error Level: Fatal
Nov 21 04:15:09 mastertr0 scsi: [ID 107833 kern.notice]         Requested Block: 65    Error Block: 65
Nov 21 04:15:09 mastertr0 scsi: [ID 107833 kern.notice]         Vendor: QUANTUM    Serial Number: qj  6 i  O
Nov 21 04:15:09 mastertr0 scsi: [ID 107833 kern.notice]         Sense Key: Media Error
Nov 21 04:15:09 mastertr0 scsi: [ID 107833 kern.notice]         ASC: 0x14 (recorded entity not found), ASCQ: 0x0, FRU: 0x0

It would seem that we have bad media, but these errors are occurring
frequently on a large number and wide range of tapes.  Initially, it was
happening on only one tape drive.  That tape drive has been disabled.  For
a few days, things were okay, but we've started getting the messages (and
save set failures) again.
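
To see whether the terminations cluster on one drive or one tape, the
daemon.notice messages above could be tallied by device and by volume. The
following is only a rough sketch in Python, not anything shipped with
NetWorker: the tally() helper and the regular expression are my own
assumptions based on the message layout shown above, and it expects each
wrapped message to be joined back into a single string first. Save set
names that themselves contain the word " volume " would confuse it.

```python
import re
from collections import Counter

# Pattern inferred from the "being terminated" notices quoted above;
# it is an assumption about the layout, not a documented format.
TERMINATED = re.compile(
    r"Save set \((?P<ssid>\d+)\) (?P<saveset>.+?) "
    r"volume (?P<volume>\S+) on (?P<device>\S+) "
    r"is being terminated because: (?P<reason>.+)"
)

def tally(messages):
    """Count terminated save sets per device and per volume."""
    by_device, by_volume = Counter(), Counter()
    for msg in messages:
        # Collapse any leftover line wrapping into single spaces.
        m = TERMINATED.search(" ".join(msg.split()))
        if m:
            by_device[m["device"]] += 1
            by_volume[m["volume"]] += 1
    return by_device, by_volume

# The six notices from this post, with the syslog prefix dropped.
REASON = "is being terminated because: Media verification failed"
sample = [
    "Save set (3690336257) marset:/prod/OP01/ora41 volume 001086 on /dev/rmt/4hbn " + REASON,
    "Save set (3681801729) oregano:/devl/OAD1/dump volume 001086 on /dev/rmt/4hbn " + REASON,
    "Save set (3680443905) oregano:/devl/OAD6/dump volume 001086 on /dev/rmt/4hbn " + REASON,
    r"Save set (3696207617) hdq-nt73:E:\ volume D00443 on /dev/rmt/6hbn " + REASON,
    r"Save set (3695733249) cwl-nt03et:E:\ volume D00443 on /dev/rmt/6hbn " + REASON,
    r"Save set (3696199682) hdq-nt76:E:\ volume D00443 on /dev/rmt/6hbn " + REASON,
]

by_device, by_volume = tally(sample)
for device, count in sorted(by_device.items()):
    print(device, count)
for volume, count in sorted(by_volume.items()):
    print(volume, count)
```

Run over a longer stretch of the messages file, counts that pile up on a
single device would point at a drive or its SCSI chain, while counts spread
evenly across devices and volumes would fit a software cause better.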

Could this be related to the fact that we have tape drives daisy-chained?
6 of the tape drives are daisy chained into sets of 2 each.  One of the
tape drives is on its own, and the other tape drive is sharing a scsi card
with the tape silo.

These errors began after we replaced our NetWorker server with the V880
and upgraded the OS to Solaris 8.  This also involved a reconfiguration of
the jukebox.

The problems are occurring on random tapes.  Some of the tapes are
relatively new, some we've been using for a few months (without problem),
and some have been around for several months.

Comments please?


--
Note: To sign off this list, send a "signoff" command via email
to listserv AT listmail.temple DOT edu or visit the list's Web site at
http://listmail.temple.edu/archives/networker.html where you can
also view and post messages to the list.
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
Thanks, Dona


