ADSM-L

Re: Yikes LTO2 problem!?!

2003-06-24 04:53:51
Subject: Re: Yikes LTO2 problem!?!
From: "Frost, Dave" <Dave.Frost AT SUNGARD DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 24 Jun 2003 09:53:30 +0100
Matthew,

Have you had any interesting records in /var/adm/messages that you can
match up to when one or more of the tapes was mounted?  (ANR8468I  volume
<x> dismounted is a good search key).
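A rough sketch of that matching, in Python, for anyone who would rather not do it by eye. It assumes the activity log has been dumped to text with lines that start with a date and time in the "06/23/03   10:52:37" layout quoted further down this thread, and that /var/adm/messages lines start with the usual syslog "Jun 23 10:40:01" stamp. The function names and the 30-minute look-back window are made up for illustration and are not part of TSM.

```python
from datetime import datetime, timedelta


def parse_actlog_time(line):
    """Timestamp of an activity log line like '06/23/03   10:52:37   ANR...'."""
    parts = line.split(None, 2)
    if len(parts) < 2:
        return None
    try:
        return datetime.strptime(parts[0] + " " + parts[1], "%m/%d/%y %H:%M:%S")
    except ValueError:
        return None


def parse_syslog_time(line, year):
    """Timestamp of a syslog line like 'Jun 23 10:40:01 host ...'.

    syslog lines carry no year, so the caller has to supply one.
    """
    parts = line.split(None, 3)
    if len(parts) < 3:
        return None
    try:
        return datetime.strptime(
            "%d %s %s %s" % (year, parts[0], parts[1], parts[2]),
            "%Y %b %d %H:%M:%S",
        )
    except ValueError:
        return None


def syslog_near_dismounts(actlog_lines, syslog_lines, year, window_minutes=30):
    """Pair each ANR8468I dismount with syslog lines from the preceding window.

    The window is an assumption: the tape was mounted for some time before
    the dismount message, so anything logged shortly before it is of interest.
    """
    window = timedelta(minutes=window_minutes)
    stamped = []
    for line in syslog_lines:
        when = parse_syslog_time(line, year)
        if when is not None:
            stamped.append((when, line.rstrip("\n")))

    results = []
    for line in actlog_lines:
        if "ANR8468I" not in line:
            continue
        dismounted = parse_actlog_time(line)
        if dismounted is None:
            continue
        nearby = [text for when, text in stamped
                  if dismounted - window <= when <= dismounted]
        results.append((line.strip(), nearby))
    return results
```

Feed it the lines of both files plus the year, then look at any dismount that comes back with a non-empty list. Widen the window if your tapes stay mounted for a long time.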

We have only seen this on SAN-attached devices, and then only when an RSCN
(Registered State Change Notification) has occurred on the fabric.  During
reads or writes a block will be silently dropped.  Reads are recoverable...


Regards,

-=Dave=-
+44 (0) 20 7608 7140

Accountants are good with figures.



Andrew Raibeck <storman AT US DOT IBM.COM>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
24/06/2003 02:06
Please respond to "ADSM: Dist Stor Manager"


        To:     ADSM-L AT VM.MARIST DOT EDU
        cc:
        Subject:        Re: Yikes LTO2 problem!?!






> Tivoli support said to turn on CRCchecking.
> I say, uh yeah, I already know my data is bad.

> Looks like I gotta try and figure out which drive/disk/scsi
> card is causing the problem myself.

Is there any activity log data from the time the data was written to the
problem volume(s) that shows any problems with the writes? If so, then
that would be useful for Level 2 to have. If not, then determining how it
got that way is difficult at best.

I do not believe the suggestion to turn on CRC checking is intended as the
cure. Rather, it is intended as a diagnostic aid to try to catch any new
instances of incorrectly written data as they occur, which would help
pinpoint the problem. (Just my take from my admittedly cursory review of
your PMR.)

Regards,

Andy

Andy Raibeck
IBM Software Group
Tivoli Storage Manager Client Development
Internal Notes e-mail: Andrew Raibeck/Tucson/IBM@IBMUS
Internet e-mail: storman AT us.ibm DOT com

The only dumb question is the one that goes unasked.
The command line is your friend.
"Good enough" is the enemy of excellence.




Matthew Glanville <matthew.glanville AT KODAK DOT COM>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
06/23/2003 15:47
Please respond to "ADSM: Dist Stor Manager"


        To:     ADSM-L AT VM.MARIST DOT EDU
        cc:
        Subject:        Yikes LTO2 problem!?!



Problem:
    Lots and lots of tapes reporting errors when auditing, copying, or
moving data from them...

06/23/03   10:52:37      ANR9999D pvrntp.c(4586): ThreadId<15> Invalid
block header read from NTP drive DRIVE5 (/dev/rmt/10st).(magic=5A4D4E50,
ver=5, Hdr blk=1450 <expected 1451>, dbytes=262096 <262096>)

(Thus thousands of files on several tapes are unreadable)
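One way to start narrowing this down is to tally the header errors per drive. Below is a minimal Python sketch that assumes the activity log has been saved as plain text and that every error has the same layout as the ANR9999D message above, possibly wrapped across lines. The function names are invented for illustration. Note the count is per drive that *read* the bad block, which is not necessarily the drive that wrote it; if the errors spread evenly across all reading drives, the write side or a shared component is the more likely suspect.

```python
import re
from collections import Counter

# Matches the "Invalid block header" message quoted above, after line
# wraps have been collapsed into single spaces.
HEADER_ERROR = re.compile(
    r"Invalid block header read from NTP drive (?P<drive>\S+) "
    r"\((?P<device>[^)]+)\)\.\s*"
    r"\([^)]*?Hdr blk=(?P<found>\d+) <expected (?P<expected>\d+)>"
)


def find_header_errors(log_text):
    """Return one dict per invalid-block-header message found in log_text."""
    # Collapse all whitespace so a message wrapped over several lines
    # can be matched as a single string.
    flat = " ".join(log_text.split())
    return [
        {
            "drive": m.group("drive"),
            "device": m.group("device"),
            "found": int(m.group("found")),
            "expected": int(m.group("expected")),
        }
        for m in HEADER_ERROR.finditer(flat)
    ]


def errors_by_drive(log_text):
    """Count invalid-block-header messages per reading drive."""
    return Counter(e["drive"] for e in find_header_errors(log_text))
```

The found/expected block numbers are kept in each record so the size of the gap can be compared across tapes as well.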

TSM Server 5.1.6.3 on Solaris 9 (Sun V880 server)
IBMtape 4.0.7.3
8 LTO2 drives in IBM 3584 library (SCSI, not Fiber attached; firmware
version 3641)

Hmm, all my testing of filling/restoring a few tapes of data said I could
backup/restore fine...
But the day after the system goes into production and we start moving real
data to these tapes, whammo..

Tivoli support said to turn on CRCchecking.
I say, uh yeah, I already know my data is bad.

Looks like I gotta try and figure out which drive/disk/scsi card is
causing the problem myself.

Any suggestions?

Thanks
  Matt G
