Veritas-bu

[Veritas-bu] HELP - media and I/O errors

2004-02-16 18:11:23
Subject: [Veritas-bu] HELP - media and I/O errors
From: denis AT kapusta DOT com (Denis Petrov)
Date: Mon, 16 Feb 2004 15:11:23 -0800

I had a similar issue on an L80. The STK tech suggested that the issue
may be related to the NetBackup "Media unmount delay" setting, which at
its default of 3 minutes is way too short. What seems to be happening is
that the tape gets unmounted before it has completely rewound. I had
some arguments with my co-workers about it (the speed of LTOs vs. DLTs,
since these issues never came up when we used DLTs), but I was able to
confirm the issues from my L80 logs: some are exactly what the tech
suggested and some are similar. In addition, the issues do not seem to
show up right away, only once the LTO tapes hold a significant amount of
data. Perhaps they take longer to rewind? I'd say if everything else
fails, try changing Media unmount delay to something like 15 minutes and
see if it makes any difference.

--Denis
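
A minimal sketch of the change suggested above, assuming the "Media unmount delay" setting corresponds to a MEDIA_UNMOUNT_DELAY entry in bp.conf expressed in seconds (so 3 minutes is 180 and 15 minutes is 900). Check that assumption against the documentation for your NetBackup version. The function name and the sample SERVER entry are hypothetical.

```python
def set_unmount_delay(conf_text: str, minutes: int) -> str:
    """Return conf_text with MEDIA_UNMOUNT_DELAY set to `minutes`, in seconds.

    Assumes 'KEY = value' lines, one entry per line (bp.conf style).
    """
    entry = f"MEDIA_UNMOUNT_DELAY = {minutes * 60}"
    out = []
    replaced = False
    for line in conf_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == "MEDIA_UNMOUNT_DELAY":
            # Replace the first occurrence and drop any duplicates.
            if not replaced:
                out.append(entry)
                replaced = True
        else:
            out.append(line)
    if not replaced:
        # The entry was absent, so append it at the end.
        out.append(entry)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "SERVER = master1\nMEDIA_UNMOUNT_DELAY = 180\n"  # hypothetical contents
    print(set_unmount_delay(sample, 15), end="")
```

The function only transforms text; writing the result back to the real file and restarting any daemons is left out on purpose.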

  ----- Original Message -----
  From: Dave Markham
  To: Sokolowski Ric-ERS004 ; veritas-bu AT mailman.eng.auburn DOT edu
  Sent: Wednesday, February 11, 2004 9:23 AM
  Subject: Re: [Veritas-bu] HELP - media and I/O errors


  Have the cables connecting the devices been replaced?

  I had a similar problem recently with LTO drives and tried many things.
  My environment was Solaris and there were some patches to apply
  (although that doesn't help you, sorry), but the cables were mentioned,
  and I also found that LTO media has a chip inside it which can be
  dislodged. If you shake the tapes and they rattle loudly, then most
  likely they are damaged. This could affect more than one tape if they
  came from the same batch.

  Just some ideas
  Dave
    ----- Original Message -----
    From: Sokolowski Ric-ERS004
    To: 'veritas-bu AT mailman.eng.auburn DOT edu'
    Sent: Wednesday, February 11, 2004 4:28 PM
    Subject: [Veritas-bu] HELP - media and I/O errors


    Our system:

    NB 4.5 MP5
    master - HP-UX 11.00
    media - 4 HP-UX 11.00, 1 HP-UX 11.11
    STK L700 (HP20/700) w/10 HP LTO 1 drives w/SSO
    5 HP 2/1 FC/SCSI bridges
    1 Brocade 2800

    We're seeing tons of media-related errors (70% status 86 - media
    position, 30% status 84 - media write) spread across all drives.
    Some nights we see no errors; other nights we'll see 50-100
    media-related failures. We see the failures both when reusing tapes
    and with brand new tapes. All drives have been cleaned recently.

    We have had cases open w/Veritas and HP for just over 4 weeks now.
    Veritas has examined over a month's worth of log files and has
    determined that the problem is hardware related. HP replaced 3
    drives; we saw media failures on these 3 new drives the same day they
    were replaced. HP also replaced the robot controller, the camera, and
    one of the Fibre bridges. We're not seeing any communication errors
    on the FC switch. Everything has the latest available firmware.

    Whenever we get the status 84/86, we see a lot of things like "cannot
    read from media socket 10", "ioctl (MTREW) failed on media id 402280,
    drive index 4, I/O error (bptm.c.7197)" and "write error on media id
    402280, drive index 4, writing header block, I/O error". Normally,
    between 2 and 5 drives are downed every night - always with a tape
    stuck in the drive. Occasionally the system will freeze dozens of
    tapes because they're seen as "unmountable", which leads to a
    boatload of status 96 (no media) failures. Our backup success rate
    has dropped from over 98% to below 80% - management is freaking out.
    We're grasping at straws here folks, any help would be GREATLY
    appreciated!

    --
    Regards,
    Ric Sokolowski (Ric.Sokolowski AT motorola DOT com)
    Staff Systems Engineer
    Phone: (954) 723-6332
    Pager: 9545530742 AT messaging.nextel DOT com
    Motorola, Inc.  / CGISS / Enterprise Computing
    8000 West Sunrise Blvd, MS 22-2F, Plantation, FL 33322
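
The error messages quoted above share a "media id ..., drive index ..." pattern, so one way to check whether failures cluster on particular drives or tapes is to tally them. This is a rough sketch only: the pattern is inferred from the two messages quoted in the post, real log lines may carry other fields, and the function name `tally` is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the quoted messages; an assumption, not a log spec.
PATTERN = re.compile(r"media id (\w+), drive index (\d+)")


def tally(lines):
    """Count matching error lines per drive index and per media id."""
    by_drive = Counter()
    by_media = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            by_media[match.group(1)] += 1
            by_drive[int(match.group(2))] += 1
    return by_drive, by_media


if __name__ == "__main__":
    # The three messages quoted in the post; the first has no media id.
    sample = [
        "cannot read from media socket 10",
        "ioctl (MTREW) failed on media id 402280, drive index 4, "
        "I/O error (bptm.c.7197)",
        "write error on media id 402280, drive index 4, "
        "writing header block, I/O error",
    ]
    drives, media = tally(sample)
    print("by drive:", dict(drives))
    print("by media:", dict(media))
```

If the counts are spread evenly across all drives and tapes, that points away from individual drives or cartridges and toward something shared, such as cabling, bridges, or a timing setting.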





