ADSM-L

Re: ANR1330E & ANR1331E

2005-11-02 07:43:44
Subject: Re: ANR1330E & ANR1331E
From: Richard Sims <rbs AT BU DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 2 Nov 2005 07:42:22 -0500
Thanks for sharing that real-life case, Neil. We are historically
accustomed to digital data processing inherently assuring data
integrity.  It was dismaying, then, to read the TSM 5.1 Technical
Guide redbook and see it say things like, "New communication and SAN
hardware products are more susceptible to data loss" and "...data
corruption introduced either by the network or by errors within the
storage environment".  Data loss??  Corruption??  Indeed.  The TSM
developers realized this and provided CRC functionality to help
validate data integrity.  If you have a complex data transfer and/or
storage environment, you may want to consider turning on CRC.
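
To illustrate the general idea behind CRC validation (this is not
TSM's actual implementation or on-media format), here is a minimal
Python sketch.  The helper names add_crc, check_crc and flip_bit are
hypothetical, and the 4-byte trailer layout is an assumption made for
the example.  A checksum computed before the data leaves the
application travels with the data, so damage introduced anywhere
further down the path can be detected when the data is read back.

```python
import struct
import zlib


def add_crc(payload: bytes) -> bytes:
    """Return the payload followed by its CRC-32 as 4 big-endian bytes."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + struct.pack(">I", crc)


def check_crc(frame: bytes) -> bool:
    """Recompute the CRC-32 of the payload and compare it to the trailer."""
    if len(frame) < 4:
        return False  # too short to contain a CRC trailer
    payload, trailer = frame[:-4], frame[-4:]
    (stored,) = struct.unpack(">I", trailer)
    return (zlib.crc32(payload) & 0xFFFFFFFF) == stored


def flip_bit(frame: bytes, bit_index: int) -> bytes:
    """Simulate silent corruption in transit by inverting a single bit."""
    damaged = bytearray(frame)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


if __name__ == "__main__":
    frame = add_crc(b"backup object contents")
    print(check_crc(frame))               # intact frame
    print(check_crc(flip_bit(frame, 5)))  # frame with one bit inverted
```

The check only reports that something changed between the point where
the CRC was computed and the point where it was verified.  It cannot
say which component was at fault, and it cannot repair the data, so a
good second copy is still needed for recovery.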

   Richard Sims

On Nov 2, 2005, at 5:30 AM, Neil Schofield wrote:

We had issues like this 18 months ago. It coincided with us adding
some new HBAs to balance the load. It transpired that one of the HBAs
was silently corrupting the data as it was writing it to tape. We only
realised when we came to read the data back. The problem was that the
same HBA was being used to write data to both the local (primary) and
remote (copy) storage pools over a long-distance SAN, so we lost some
data.

Once we identified the HBA as the source of the problem and removed
it, we then had the task of identifying every tape that had been
written using it. Almost all were bad! We deleted the copy tapes and
for primary tapes, we restored from copy tapes where possible.

The HBA - an Emulex LP9802 - was described as having 'end-to-end
parity protection' but this didn't work for us. I opened a PMR, but
IBM made the observation that they are not responsible for the data
after it has been passed to the HBA.

So maybe one of your SCSI/FC adapters has gone bad?

Regards
Neil Schofield
Yorkshire Water Services Ltd.
