[Veritas-bu] FW: media write errors - 84

2001-01-11 05:59:24
Subject: [Veritas-bu] FW: media write errors - 84
From: Thorpe, Alastair at78748 AT GlaxoWellcome.co DOT uk
Date: Thu, 11 Jan 2001 10:59:24 -0000
        We have four similar installations (although each is independent, a
unique NB cluster) using the Sun L11000 (a.k.a. ATL P3000), although we are
still using NetBackup 3.1.1 with Solaris 2.6 on E3500s.  All four
installations are identically configured, even down to the firmware
revisions on the devices.  For only one of these installations (physically
co-located with one of the others), we have been plagued by 'media write
errors (84)' and are still conducting our own root cause analysis.  We
followed the same path of swapping out the drives that were giving the most
of these errors, but the problem was not eradicated.

        For what it is worth, we involved our media supplier, who brought in
media specialists from Imation (who currently do not sell DLT Media so are
reasonably impartial).  We were told that: "it is possible for a dodgy DLT
drive (perhaps heads are misaligned) to be capable of writing data to a DLT
tape 'successfully', then if this tape is loaded into another perfectly good
DLT drive, then this can have problems reading the tape, and perhaps the
cleaning LED comes on.  This can lead you to see errors in other perfectly
good drives when this 'poorly written' media is loaded and make you believe
there is a problem with them, when in fact it was a different drive that
originally wrote the data".

        One pragmatic step we have taken on the media supplier's advice,
which has resulted in a significant (6x) reduction in these errors, was to
switch DLT tape brand from Fuji- to Maxell-manufactured media.  Some brands
(e.g. Quantum) co-source, so our supplier checks the batch numbers.
Apparently, there are currently only two manufacturers world-wide licensed
by Quantum to make DLT media (Fuji and Maxell).  Both manufacture to strict
standards defined by Quantum and employ rigorous quality control.  If there
had been problems with a certain batch then this would be known to them.  So
there is no suggestion that the original Fuji media we were using in this
one installation is anything but perfect.  In fact, we have re-deployed it
elsewhere and it works fine.  I do not know how to explain the improvement,
but just know that for this particular installation (a complex interaction
of hardware and software components, plus environmental conditions) it has
made a positive difference.  However, we still see 1-2 media write errors
per week on a hard-worked L11000, and would ideally like to eradicate them
altogether if possible.

        So this discussion thread is very interesting to us also.  As
further actions, we are planning to upgrade NB to 3.2 or 3.4 and are also
looking at upgrading the firmware in the L11000.

        Regards
        Alastair
        GlaxoSmithKline plc
> 
> -----Original Message-----
> From: Collins, Kathy [SMTP:KCollins AT coral-energy DOT com]
> Sent: Wednesday, January 10, 2001 4:02 PM
> To:   'veritas-bu AT mailman.eng.auburn DOT edu'
> Subject:      [Veritas-bu] FW: media write errors - 84
> 
> This is a status update to a message I posted back in November.  I have
> been attempting to resolve our media write errors in NetBackup.  Our
> problem of getting these errors once or twice a night eventually only
> occurred on one drive, which we had already replaced once.  We had the
> drive replaced a second time, and still saw the errors just on that drive.
> Then we switched this drive with another to see if the problem followed
> the drive or stayed with the location.  It followed the drive.  We had the
> drive replaced a third time about a week ago and haven't seen the write
> errors on the drive since.
>
> The next day we got two write errors on two other drives, drives that have
> never had this error before.  We have also seen a few ioctl (MTWEOF) and
> (MTWFSF) errors on the drive that we replaced.  These errors freeze the
> tape immediately.  I'm not convinced that these tapes are bad.
>
> I had a few replies from people on the list with similar problems, both
> stating that the problem has never gone away, no matter what they tried.
> Both replaced drives several times.
>
> We have the same version of NetBackup installed on an Ultra 2 connected to
> an L3500 with none of the above problems.
>
> Does anyone have any other suggestions on what I can try to stop these
> errors from occurring?  Are there lots of you having the same problem?  Or
> is it just the three of us?
> 
> Thanks,
> Kathy
> 
> >  -----Original Message-----
> > From:       Collins, Kathy  
> > Sent:       Monday, November 20, 2000 4:05 PM
> > To: 'veritas-bu AT mailman.eng.auburn DOT edu'
> > Subject:    media write errors - 84
> > 
> > Hi,
> > 
> > I'm using NetBackup 3.2 with Solaris 2.6 on an E450 connected to an
> > L11000.  I upgraded from NetBackup 3.2 patch 328 to patch 363 on
> > November 3rd.  About a week later, our media errors came along much more
> > frequently.  Although the errors appear the same as previous media
> > errors in the messages log, the wording is different in the Problems
> > report of NetBackup.  Here is a log I've been keeping with the stats on
> > the errors.  Sometimes it looks like a drive problem, sometimes like a
> > tape problem.  These tapes all have only 15 to 20 mounts.  The actual
> > error reads "cannot write image to media id DOA763, drive index 2, I/O
> > error", whereas previous media errors read "write error on media id...".
> > Both show up with a status code of 84.
> > 
> > 11/11 20:33 DOA763  drive index 5  cannot write image to media id...
> > 11/12 09:33 DOA871  drive index 5  cannot write image to media id...
> > 11/13 07:16 DOA885  drive index 5  cannot write image to media id...
> > -
> > replaced drive index 5 (/dev/rmt/5)
> > -
> > 11/13 20:57 DOA656  drive index 5  cannot write image to media id...
> > 11/14 07:19 DOA885  drive index 5  cannot write image to media id...
> > 11/14 20:35 DOA860  drive index 2  cannot write image to media id...
> > 11/16 06:47 DOA651  drive index 2  cannot write image to media id...
> > 11/16 07:37 DOA651  drive index 5  cannot write image to media id...
> > 11/16 18:35 DOA651  drive index 5  cannot write image to media id...
> > 11/16 23:22 DOA878  drive index 2  cannot write image to media id...
> > 11/17 18:37 DOA860  drive index 2  cannot write image to media id...
> > 11/19 19:45 DOA773  drive index 2  cannot write image to media id...
> > 11/20 02:34 DOA773  drive index 2  cannot write image to media id...
> > 11/20 06:58 DOA763  drive index 2  cannot write image to media id...
> > 
> > I'm having drive index 2 replaced tomorrow, although it didn't stop
> > the errors in drive index 5.  Does anyone know of this problem and if
> > it could be related to patch 363?  Any recommendations on what patch
> > I should jump to assuming that it may be related to the patch?  If I
> > go directly to 3.4, will I be able to restore my data that was backed
> > up with 3.2?
> > 
> > Thanks for any feedback.
> > Regards,
> > Kathy
> > 
> > 
> > Kathy Collins
> > Coral Energy, L.L.P
> > Phone: 713.230.3426
> > kcollins AT coral-energy DOT com
> > 
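
The hand-kept error log in the November 20 message can be tallied by drive
and by tape, which may help show whether the failures track a drive or a
piece of media.  What follows is only a rough sketch in Python: the function
name tally and the regular expression are illustrative assumptions, not
anything NetBackup provides, and the entries are copied from the log as
posted rather than read from a real NetBackup report.

```python
import re
from collections import Counter

# Entries copied verbatim from the log in the November 20 message.
LOG = """\
11/11 20:33 DOA763  drive index 5  cannot write image to media id...
11/12 09:33 DOA871  drive index 5  cannot write image to media id...
11/13 07:16 DOA885  drive index 5  cannot write image to media id...
-
replaced drive index 5 (/dev/rmt/5)
-
11/13 20:57 DOA656  drive index 5  cannot write image to media id...
11/14 07:19 DOA885  drive index 5  cannot write image to media id...
11/14 20:35 DOA860  drive index 2  cannot write image to media id...
11/16 06:47 DOA651  drive index 2  cannot write image to media id...
11/16 07:37 DOA651  drive index 5  cannot write image to media id...
11/16 18:35 DOA651  drive index 5  cannot write image to media id...
11/16 23:22 DOA878  drive index 2  cannot write image to media id...
11/17 18:37 DOA860  drive index 2  cannot write image to media id...
11/19 19:45 DOA773  drive index 2  cannot write image to media id...
11/20 02:34 DOA773  drive index 2  cannot write image to media id...
11/20 06:58 DOA763  drive index 2  cannot write image to media id...
"""

# Hypothetical pattern for this hand-kept format:
# date, time, media id, drive index; the trailing message text is ignored.
ENTRY = re.compile(r"^(\d\d/\d\d)\s+(\d\d:\d\d)\s+(\S+)\s+drive index (\d+)\b")


def tally(log_text):
    """Count errors per drive and per tape, and note which drives each tape failed in."""
    by_drive = Counter()
    by_media = Counter()
    drives_per_media = {}
    for line in log_text.splitlines():
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # skip non-entry lines such as the drive-replacement note
        _date, _time, media, drive = match.groups()
        by_drive[int(drive)] += 1
        by_media[media] += 1
        drives_per_media.setdefault(media, set()).add(int(drive))
    return by_drive, by_media, drives_per_media


if __name__ == "__main__":
    by_drive, by_media, drives_per_media = tally(LOG)
    for drive, count in sorted(by_drive.items()):
        print(f"drive index {drive}: {count} errors")
    for media, drives in sorted(drives_per_media.items()):
        if len(drives) > 1:
            print(f"{media} failed in drives {sorted(drives)}")
```

Run against the fourteen entries above, this counts seven errors each on
drive index 2 and drive index 5, and shows DOA651 and DOA763 failing in both
drives, which matches the "sometimes a drive problem, sometimes a tape
problem" picture described in the message.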


