On Aug 28, 2010, at 7:12 AM, Steve Costaras wrote:
This could be due to a transient error (a transmission error, or a
wild/torn read at the time of calculation). I see this a lot with
integrity checking of files here (50 TiB of storage).
The only way to get around this now is to take a known-good sha1/md5
hash of the data (read the file 2-3 times and make sure the hashes
all match, i.e. that the file is not corrupted), save that as a
baseline, and then, when a later read/compare fails, do another
re-read to see whether the first read was in error, checking the
result against your baseline. This is one reason why I'm switching
to the new generation of SAS drives that do IOECC checks on READS,
not just writes, to help cut down on some of this.
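That baseline-and-reverify procedure, as a rough Python sketch (the
chunk size and the number of confirming reads are arbitrary choices,
not anything from a spec):

import hashlib

def file_sha1(path, chunk_size=1 << 20):
    # Stream the file through SHA-1 so large files need not fit in RAM.
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def baseline_hash(path, reads=3):
    # Hash the file several times; only trust the result if every read agrees.
    hashes = {file_sha1(path) for _ in range(reads)}
    if len(hashes) != 1:
        raise IOError("reads disagree (transient error or corruption): %s" % path)
    return hashes.pop()

def verify(path, baseline):
    # On a mismatch, re-read once to rule out a wild/torn read before
    # concluding that the file really is corrupt.
    if file_sha1(path) == baseline:
        return True
    return file_sha1(path) == baseline

If verify() returns False on the second read as well, you are
probably looking at real on-disk corruption rather than a transient.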
Real corruption does occur as well, and it becomes more probable as
drive capacity increases. Ideally you would have a drive that does
IOECC on reads, plus the T10 PI extensions (DIX/DIF) running from the
drive through the controller up to your file system layer. That
won't always prevent corruption by itself, but in a RAID setup it
would allow some self-healing when a drive reports a non-transient
error (i.e. a corrupted sector of data).
However, the T10 PI extensions exist only on SAS/FC drives (520/528-
byte blocks); as far as I can tell only the new LSI HBAs support a
small subset of this (no hardware RAID controllers that I can find),
and I have not seen any support up through the OS/filesystem level.
SATA is not included at all, as the T13 group opted not to put it in
their spec.
You could also stick with your current hardware and use a file system that emphasises end-to-end data integrity, such as ZFS. ZFS checksums at many levels and has a "don't trust the hardware" mentality; it can detect silent data corruption and automatically self-heal where redundancy permits.
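To make the self-healing idea concrete, here is a toy two-way mirror
in Python. This is only an illustration of the mechanism, not how
ZFS is implemented (ZFS, for instance, keeps each block's checksum in
the parent block pointer rather than beside the data):

import hashlib

class MirroredBlock:
    # Toy 2-way mirror: keep two copies of a block plus its checksum,
    # verify every read, and repair a bad copy from a good one.
    def __init__(self, data):
        self.copies = [bytearray(data), bytearray(data)]
        self.checksum = hashlib.sha256(data).digest()

    def _ok(self, copy):
        return hashlib.sha256(bytes(copy)).digest() == self.checksum

    def read(self):
        for good in self.copies:
            if self._ok(good):
                # Self-heal: overwrite any copy that fails verification.
                for copy in self.copies:
                    if not self._ok(copy):
                        copy[:] = good
                return bytes(good)
        raise IOError("all copies corrupt; unrecoverable without more redundancy")

blk = MirroredBlock(b"important data")
blk.copies[0][0] ^= 0xFF                # simulate silent corruption on one disk
assert blk.read() == b"important data"  # the read still succeeds...
assert blk.copies[0] == blk.copies[1]   # ...and the bad copy has been repaired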
ZFS also supports pool scrubbing, akin to the "patrol read" feature of many RAID controllers, for proactive detection of silent data corruption. As drive capacities grow, the probability of hitting an unrecoverable read error (URE) somewhere on a drive becomes significant. This matters even in redundant storage systems, because a drive failure necessitates a lengthy rebuild during which the array (in the case of RAID-5) has no remaining redundancy, so a single URE before the rebuild completes means lost data. It is for this reason that RAID-6 (ZFS raidz2) is becoming de rigueur for many-terabyte arrays built from large drives, and, specifically, the reason ZFS gained its triple-parity raidz3 pool type (in ZFS pool version 17).
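Back-of-envelope numbers make the point; this assumes the
1-error-in-10^14-bits unrecoverable read rate commonly quoted on
consumer SATA spec sheets, and treats errors as independent:

# Probability of at least one URE while reading a whole drive end to
# end, e.g. during a RAID-5 rebuild. 1e-14 per bit is the usual
# consumer SATA spec-sheet figure; enterprise drives quote 1e-15 or better.
p_bit = 1e-14
for tb in (1, 2, 4, 10):
    bits = tb * 1e12 * 8
    p_ure = 1 - (1 - p_bit) ** bits
    print("%2d TB drive: P(>=1 URE on full read) = %4.1f%%" % (tb, 100 * p_ure))

At that rate, a full read of a 10 TB drive hits at least one URE
roughly 55% of the time, which is why single-parity rebuilds on big
drives are scary.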
I believe Btrfs intends to bring many ZFS features to Linux.
Paul.