ADSM-L

Re: [ADSM-L] Magic Decoder Ring needed

2017-10-10 14:19:43
Subject: Re: [ADSM-L] Magic Decoder Ring needed
From: Zoltan Forray <zforray AT VCU DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 10 Oct 2017 14:18:01 -0400
Thank you for the info.  We have started running AUDITs, but with 30TB+ in
this disk stgpool, it will take a while.  I am very interested in
additional details on the RAID firmware issue you mentioned - any specifics
would be very helpful.  AFAIK, we are up to date on all Dell firmware (we
patch fairly regularly).

Within the past 9 months, this server has had 3 disk pool volumes (all part
of RAID-5 arrays) suddenly go "bad", requiring full restores, with no
explanation, since there was no sign of hardware problems. I did open a PMR
with IBM, but by the time they looked at my last failure, they said there
was nothing they could do to analyze the problem and asked me to call back
the next time it happens.

On Tue, Oct 10, 2017 at 2:04 PM, Skylar Thompson
<skylar2 AT u.washington DOT edu> wrote:

> Hi Zoltan,
>
> We ran into this recently, and it was caused by a firmware bug in a RAID
> adapter that caused it not to fail an obviously failing disk in our disk
> spool. We followed the procedure here:
>
> https://www.ibm.com/support/knowledgecenter/en/SSGSG7_7.1.6/tshoot/r_pdg_1330_1331_msg.html
>
> It did take a few AUDIT VOLUME / MOVE DATA cycles to find everything, but
> now it's happy. In a few cases, the file shown by SHOW INVO was obviously
> detritus, so we deleted it client-side with DELETE BACKUP instead of
> running an audit, because it takes a long time to audit our disk volumes.
>
> On Tue, Oct 10, 2017 at 01:56:47PM -0400, Zoltan Forray wrote:
> > Recently we started seeing these errors on one of our servers:
> >
> > 10/10/2017 13:35:51  ANR1330E The server has detected possible corruption
> >                      in an object that is being restored or moved. The
> >                      actual values for the incorrect frame are: magic
> >                      53454652 hdr version 2 hdr length 32 sequence number
> >                      22610 data length 3FFB0 server ID 0 segment ID
> >                      2720223190 crc 0. (SESSION: 39218, PROCESS: 171)
> > 10/10/2017 13:35:51  ANR1331E Invalid frame detected.  Expected magic
> >                      53454652
> >
> > The Process ID points to a Backup Stgpool process (the only thing
> > running), not anything being "moved or restored".  There are also a
> > bunch of sessions running/stuck/hung but that is a different problem.
> >
> > Any idea on how to determine what is causing this?  We've seen the error
> > quite a few times within the past few days.
> >
> > --
> > *Zoltan Forray*
> > Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
> > Xymon Monitor Administrator
> > VMware Administrator
> > Virginia Commonwealth University
> > UCC/Office of Technology Services
> > www.ucc.vcu.edu
> > zforray AT vcu DOT edu - 804-828-4807
> > Don't be a phishing victim - VCU and other reputable organizations will
> > never use email to request that you reply with your password, social
> > security number or confidential personal information. For more details
> > visit http://phishing.vcu.edu/
>
> --
> -- Skylar Thompson (skylar2 AT u.washington DOT edu)
> -- Genome Sciences Department, System Administrator
> -- Foege Building S046, (206)-685-7354
> -- University of Washington School of Medicine
>
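
For anyone who wants to pick the ANR1330E text apart, here is a minimal Python sketch that pulls the frame header values out of the message quoted above and shows the magic value as ASCII bytes. The field labels are copied from the message itself. The function names (parse_frame, magic_as_text) are made up for this example, and the choice of which values to read as hexadecimal versus decimal is an assumption, not something the message documents. Note that the ANR1331E line reports the same expected magic (53454652) as the value found in the frame.

```python
import re

# Message text copied from the ANR1330E entry quoted in this thread.
ANR1330E = (
    "ANR1330E The server has detected possible corruption in an object "
    "that is being restored or moved. The actual values for the incorrect "
    "frame are: magic 53454652 hdr version 2 hdr length 32 sequence number "
    "22610 data length 3FFB0 server ID 0 segment ID 2720223190 crc 0. "
    "(SESSION: 39218, PROCESS: 171)"
)

# Labels as they appear in the message. The numeric base chosen for each
# field is an assumption made for this sketch (3FFB0 can only be hex).
FIELDS = [
    ("magic", 16),
    ("hdr version", 10),
    ("hdr length", 10),
    ("sequence number", 10),
    ("data length", 16),
    ("server ID", 10),
    ("segment ID", 10),
    ("crc", 16),
]


def parse_frame(message):
    """Return a dict of the frame header values found in the message."""
    values = {}
    for label, base in FIELDS:
        match = re.search(re.escape(label) + r"\s+([0-9A-Fa-f]+)", message)
        if match:
            values[label] = int(match.group(1), base)
    return values


def magic_as_text(magic):
    """Interpret a 32-bit magic value as four ASCII bytes, big-endian."""
    return magic.to_bytes(4, "big").decode("ascii", errors="replace")


frame = parse_frame(ANR1330E)
print(magic_as_text(frame["magic"]))  # SEFR
print(frame["data length"])           # 262064
```

The sketch only decodes what the server already printed. It says nothing about why the frame was flagged, so the AUDIT VOLUME procedure in the linked IBM page is still the way to locate the damaged objects.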



