On Aug 28, 2010, at 7:12 AM, Steve Costaras wrote:
Could be due to a transient error (a transmission error, or a
wild/torn read at the time of calculation). I see this a lot
with integrity checking of files here (50 TiB of storage).
The only way to get around this now is to keep a known-good
sha1/md5 hash of the data: read the file 2-3 times and make
sure the hashes all match (i.e. the file is not corrupted),
save that as a baseline, and then when a later read/compare
fails, re-read the file to see whether the first read was in
error, comparing the result against your baseline. This is
one reason why I'm switching to the new generation of SAS
drives that do IOECC checks on READS, not just writes, to
help cut down on some of this.
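Roughly something like this (an untested Python sketch of the
baseline-then-recheck idea; paths and read counts are just
placeholders):

    import hashlib

    def sha1_of(path, bufsize=1 << 20):
        """Hash a file in chunks so large files need not fit in RAM."""
        h = hashlib.sha1()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(bufsize), b''):
                h.update(chunk)
        return h.hexdigest()

    def baseline(path, reads=3):
        """Read the file several times; trust the hash only if all reads agree."""
        hashes = {sha1_of(path) for _ in range(reads)}
        if len(hashes) != 1:
            raise IOError("reads disagree; data not stable enough to baseline")
        return hashes.pop()

    def verify(path, known_good):
        """On a mismatch, re-read once to separate a wild/torn read from real corruption."""
        if sha1_of(path) == known_good:
            return "ok"
        return "transient first read" if sha1_of(path) == known_good else "corrupted"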
Corruption does occur as well, and becomes more probable as
drive capacity increases. Ideally you would have a drive that
does IOECC on reads, plus T10 PI extensions (DIX/DIF) carried
from the drive through the controller up to your file system
layer. That won't always prevent corruption by itself, but if
you have a RAID setup it would allow some self-healing when a
drive reports a non-transient error (i.e. a corrupted sector
of data).
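For what it's worth, the per-sector protection information
that T10 PI adds is an 8-byte tuple: a 2-byte guard tag (a
CRC-16 of the sector data, polynomial 0x8BB7), a 2-byte
application tag, and a 4-byte reference tag. A rough, untested
Python illustration of just the guard-tag CRC (illustrative
only, not a drop-in implementation):

    def crc16_t10dif(data, crc=0):
        """CRC-16 with the T10-DIF polynomial (0x8BB7), MSB first, no reflection."""
        for byte in data:
            crc ^= byte << 8
            for _ in range(8):
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
        return crc

    # The guard tag a drive or HBA would carry alongside one 512-byte sector:
    sector = bytes(512)              # placeholder data
    guard_tag = crc16_t10dif(sector)

The point of carrying that tuple end to end is that any hop
(drive, HBA, driver) can recompute the CRC and catch a sector
that was mangled in flight.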
However, the T10 PI extensions exist only on SAS/FC drives
(520/528-byte blocks); as far as I can tell only the new LSI
HBAs support a small subset of this (no hardware RAID
controllers that I can find), and I have not seen any support
up to the OS/filesystem level. SATA is not covered at all, as
the T13 group opted not to include it in the spec.
You could also stick with your current hardware and use a
file system that emphasises end-to-end data integrity, such
as ZFS. ZFS checksums at many levels and has a "don't trust
the hardware" mentality: it can detect silent data corruption
and automatically self-heal where redundancy permits.
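As a toy illustration of the idea (nothing like ZFS's actual
on-disk layout; just hypothetical Python with two mirrored
copies and a checksum kept apart from the data):

    import hashlib

    def store(data):
        """Keep two copies plus a checksum held separately from the data blocks."""
        return {"copies": [bytearray(data), bytearray(data)],
                "sum": hashlib.sha256(data).digest()}

    def read(block):
        """Verify copies against the checksum; repair a bad copy from a good one."""
        good = next((bytes(c) for c in block["copies"]
                     if hashlib.sha256(bytes(c)).digest() == block["sum"]), None)
        if good is None:
            raise IOError("both copies corrupt; nothing to heal from")
        for i, c in enumerate(block["copies"]):
            if hashlib.sha256(bytes(c)).digest() != block["sum"]:
                block["copies"][i] = bytearray(good)   # self-heal the damaged copy
        return good

Flip a bit in one copy and a read still returns good data
while quietly rewriting the damaged copy; that is the
self-healing that redundancy buys you.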
ZFS also supports pool scrubbing (akin to the "patrol read"
feature of many RAID controllers) for proactive detection of
silent data corruption. As drive capacities grow, the
probability of hitting an unrecoverable read error while
reading an entire drive becomes significant. This matters
even in redundant storage systems, because a drive failure
necessitates a lengthy rebuild during which the array has no
remaining redundancy (in the case of RAID-5). It is for this
reason that RAID-6 (ZFS raidz2) is becoming de rigueur for
many-terabyte arrays built from large drives, and,
specifically, the reason ZFS garnered its triple-parity
raidz3 pool type (in ZFS pool version 17).
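To put a rough number on it (back-of-the-envelope only; the
10^-14 unrecoverable-read rate and the 7x2TB RAID-5 below are
assumed figures, not anything specific to your setup):

    # Chance of at least one unrecoverable read error (URE) while rebuilding
    # a degraded 7-drive RAID-5 of 2 TB disks: the surviving 6 drives must be
    # read end to end, and consumer drives are commonly rated at 1 URE per 1e14 bits.
    bits_read = 6 * 2e12 * 8           # roughly 9.6e13 bits
    ure_rate  = 1e-14                  # per-bit unrecoverable read probability
    p_hit_ure = 1 - (1 - ure_rate) ** bits_read
    print(round(p_hit_ure, 2))         # about 0.62, i.e. a ~62% chance

A second (or third) parity drive keeps some redundancy in
place while that long rebuild read happens, which is exactly
what raidz2/raidz3 buy you.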
I believe Btrfs intends to bring many ZFS features to
Linux.
Paul.