Veritas-bu

[Veritas-bu] more on drives being downed

2002-05-22 12:49:17
Subject: [Veritas-bu] more on drives being downed
From: Mark.Donaldson AT experianems DOT com (Donaldson, Mark)
Date: Wed, 22 May 2002 10:49:17 -0600

If you've recently patched, installed, or upgraded either Solaris or NBU,
then I'd suggest also looking at the /kernel/drv/st.conf file.  It's the
conf file for the SCSI tape driver.  Mine was overwritten by a Sun patch kit
and I got read/write/position errors galore.
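
One way to check whether a patch replaced the file is to diff the live st.conf against a copy saved before patching. This is only a minimal sketch: the function name conf_diff, the label "st.conf.saved", and the sample lines are all made up for illustration, and it assumes you kept a known-good copy somewhere.

```python
import difflib

def conf_diff(saved_lines, current_lines):
    """Return unified-diff lines between a saved st.conf and the live one."""
    return list(difflib.unified_diff(
        saved_lines, current_lines,
        fromfile="st.conf.saved",   # hypothetical name for the backup copy
        tofile="st.conf",
        lineterm=""))

# Placeholder contents only; real st.conf entries are site- and drive-specific.
saved = [
    "# site tape entries (placeholder)",
    "tape-config-list = <site-specific entries>;",
]
current = [
    "# stock file restored by a patch (placeholder)",
]

changes = conf_diff(saved, current)
for line in changes:
    print(line)
```

In practice you would read both files with open(...).read().splitlines() instead of using the inline lists. An empty result means the two copies match.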

Make sure you add "VERBOSE" to the bp.conf file on your master/media servers.

-Mark

-----Original Message-----
From: danix AT cloud9 DOT net [mailto:danix AT cloud9 DOT net]
Sent: Wednesday, May 22, 2002 9:56 AM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: [Veritas-bu] more on drives being downed


I'm learning.

I looked in /opt/openv/netbackup/db/media/errors and found a bunch of read
errors.

I parsed the file with grep/cut/sort/uniq and came up with around 22
different tapes that present read errors, all since May 14th, a few days
before we made our original system changes.
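
The same tally can be sketched in Python. This is a rough illustration, not the exact pipeline used above: it assumes each line of the errors file is whitespace-separated as date (MM/DD/YY), time, media ID, drive index, error type, and the sample lines and media IDs below are invented. Check the layout of your own file before relying on the field positions.

```python
from collections import Counter
from datetime import datetime

def tapes_with_read_errors(lines, since):
    """Count read errors per media ID on or after the date `since`."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Assumed layout: date, time, media ID, drive index, error type.
        if len(fields) < 5:
            continue
        try:
            when = datetime.strptime(fields[0], "%m/%d/%y")
        except ValueError:
            continue  # skip lines that do not start with a date
        if when >= since and "READ" in fields[4].upper():
            counts[fields[2]] += 1
    return counts

# Invented sample data in the assumed layout.
sample = [
    "05/13/02 22:10:05 A00001 0 WRITE_ERROR",
    "05/14/02 01:15:42 A00002 1 READ_ERROR",
    "05/15/02 03:20:11 A00002 1 READ_ERROR",
    "05/16/02 04:00:00 A00003 0 READ_ERROR",
]

result = tapes_with_read_errors(sample, datetime(2002, 5, 14))
print(len(result), "tapes with read errors")
for media_id, n in sorted(result.items()):
    print(media_id, n)
```

To run it against the real file, pass open("/opt/openv/netbackup/db/media/errors") as the lines argument. Counting per tape rather than just listing IDs also shows whether the errors cluster on a few tapes or spread evenly, which helps separate bad media from a drive or driver problem.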

So, it seems that NetBackup is doing the right thing and marking the drives
as down when it sees the read errors.

So, now we are:
- increasing the logging levels 
- checking the StorageTek array (9710) for hardware problems
- going to try new tapes

It's hard to believe that 20+ tapes are all bad, and it's also not a
coincidence that both arrays were having problems.  Could there be something
at the Sun level causing read errors?  In my experience, read errors are
either bad tapes or bad heads.

To answer a couple of other questions I received, we've run robtest OK, we
don't have a separate media server, and we've reinventoried the robot
(actually reinstalled 4.3 completely yesterday).

I'm pointing to hardware problems at this point.  How about you?
_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu

