Re: [Veritas-bu] Tape/media errors/HP LTO-3

2011-12-14 16:43:17
Subject: Re: [Veritas-bu] Tape/media errors/HP LTO-3
From: Robyn Hirano <robyn.hirano AT roddconsulting.com DOT au>
To: Justin Piszcz <jpiszcz AT lucidpixels DOT com>
Date: Thu, 15 Dec 2011 08:40:06 +1100
Hi,

You don't *have* to have reach/put errors for it to be a robot arm issue; a robotic arm has more points of failure than that. Reach/put errors are just the obvious alignment ones.
(When I was level 2, if I had an error with "robot" in it, it was pretty easy to ask a hw engineer to do an onsite just to health check it - even if it had been checked recently.)

I also agree with Kevin's comment that you have a data loss situation if the frozen tape is put back into circulation, so it's not a situation to treat lightly. I'd work all angles.

Given you've already:
  • confirmed the firmware is up to date
  • had the robotic arm replaced

I'd then:
  • Make sure that tape is pulled out of circulation, so that no one accidentally unfreezes it
  • Start collecting iostat stats
  • Get level 3 to do a diagnostic dump and analyse it before replacing the tape drive, because once you pull the drive you lose that history. Not all errors are reported to syslog/bptm. (I'd definitely do this if you've already been replacing LTO3 drives.)
  • Check what was replaced: was it the whole robotic component or only a portion, how many tape drives have been replaced, and what were the serial IDs?
  • Get someone to check robotic arm operation, in case the wrong component was replaced
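
To help decide which tapes to pull and to give level 3 a timeline, the bptm entries quoted further down the thread can be tallied with a short script. This is only a rough sketch: the field layout (an epoch timestamp, nine more whitespace-separated fields, then the message text) is an assumption inferred from the two sample lines in the original post, not from any documented format, and the function names are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# The two entries from the original post, rejoined onto single lines.
SAMPLE = [
    "1323762270 1 386 16 media-server 0 0 0 *NULL* bptm error unloading "
    "media, TpErrno = Robot operation failed",
    "1322549252 1 388 16 media-server 1136618 1136513 0 client-hostname "
    "bptm FREEZING media id XAC228, External event caused rewind during "
    "write, all data on media is lost",
]

FREEZE_RE = re.compile(r"FREEZING media id (\S+?),")
ROBOT_RE = re.compile(r"TpErrno = Robot operation failed")


def parse_line(line):
    """Return (UTC timestamp, message text), or None if the line does not fit.

    Assumed layout: ten whitespace-separated fields, then free text.
    """
    parts = line.split(None, 10)
    if len(parts) < 11 or not parts[0].isdigit():
        return None
    when = datetime.fromtimestamp(int(parts[0]), tz=timezone.utc)
    return when, parts[10]


def summarise(lines):
    """Count freeze events per media id and collect robot failure times."""
    frozen = Counter()
    robot_failures = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip anything that does not match the assumed layout
        when, text = parsed
        match = FREEZE_RE.search(text)
        if match:
            frozen[match.group(1)] += 1
        if ROBOT_RE.search(text):
            robot_failures.append(when)
    return frozen, robot_failures


if __name__ == "__main__":
    frozen, robot_failures = summarise(SAMPLE)
    for media_id, count in frozen.items():
        print(f"{media_id}: frozen {count} time(s)")
    for when in robot_failures:
        print(f"robot operation failed at {when.isoformat()}")
```

Feeding it the full error history rather than the two sample lines would show whether freezes cluster around particular media ids or around the times the robot failed, which is the sort of evidence worth handing over with the escalation.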

If this comes up blank, as a level 2, I'd be escalating to level 3 so that they are across the decision to replace and can fully investigate the pending and previous replacements.

Again, hope this helps. I'm mainly just pulling from my collective memory of lots of tape support cases. I've even seen replaced tape drives being diverted to support for stress testing when there was a silent error, but that was only the once; normally it was possible to get a reason if you dug.

Robyn

On Wed, Dec 14, 2011 at 11:58 PM, Justin Piszcz <jpiszcz AT lucidpixels DOT com> wrote:
Hi,

Thanks for the reply. We are at the current revision for the robot that
they recommend. We have replaced arms in the past but cannot confirm
or deny whether that fixed any of the problems.  Normally (again,
normally..) when there are robot arm issues there are reach/put errors
etc.; I have not seen them in this case.

Justin.

On Wed, Dec 14, 2011 at 7:35 AM, Robyn Hirano
<robyn.hirano AT roddconsulting.com DOT au> wrote:
> Hi,
>
> That looks like a robotic arm problem rather than the tape drive or tapes.
>
> I'd be checking the robotics firmware (there's a command or the library
> panel normally shows as well) and requesting an engineer onsite to
> healthcheck the robotic arm.
> But it's often one of the components associated with the gripper (robotics)
> that's out of alignment and needs realigning or replacing.
>
> Robyn
>
> --
> Robyn Hirano
> Rodd Consulting Pty Ltd
> M: +61 412 352 725
> E: robyn.hirano AT roddconsulting.com DOT au
>
>
> -----Original Message-----
> From: veritas-bu-bounces AT mailman.eng.auburn DOT edu
> [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Justin
> Piszcz
> Sent: Wednesday, 14 December 2011 11:07 PM
> To: veritas-bu AT mailman.eng.auburn DOT edu
> Subject: [Veritas-bu] Tape/media errors/HP LTO-3
>
> Hi,
>
> We're running the latest F/W for HP LTO-3 tape drives (M6BS) for
> 4.0GBPS/FC drives.
>
> As was noted in the previous conversation, errors such as:
> 1323762270 1 386 16 media-server 0 0 0 *NULL* bptm error unloading
> media, TpErrno = Robot operation failed
> 1322549252 1 388 16 media-server 1136618 1136513 0 client-hostname
> bptm FREEZING media id XAC228, External event caused rewind during
> write, all data on media is lost
>
> When these errors occur in your environments (on multiple tapes) do
> you get the drives replaced in advance or wait for them to fail
> completely?  In the past I had been getting them replaced regularly
> but it's getting problematic; they used to be servicing components
> multiple times per week.
>
> Justin.
> _______________________________________________
> Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu



--
Robyn Hirano
RODD Consulting Pty Ltd
M: +61 412 352 725
E: robyn.hirano AT roddconsulting.com DOT au