Bacula-users

Re: [Bacula-users] Verify differences: SHA1 sum doesn't match but it should

2010-08-30 05:42:18
Subject: Re: [Bacula-users] Verify differences: SHA1 sum doesn't match but it should
From: Steve Costaras <stevecs AT chaven DOT com>
To: Paul Mather <paul AT gromit.dlib.vt DOT edu>, "bacula-users AT lists.sourceforge DOT net" <bacula-users AT lists.sourceforge DOT net>
Date: Mon, 30 Aug 2010 04:39:23 -0500
  A little mis-quoted there:

On 2010-08-30 02:59, Henrik Johansen wrote:
>> On Aug 28, 2010, at 7:12 AM, Steve Costaras wrote:
>>
>> Could be due to a transient error (transmission or wild/torn read at
>> time of calculation).  I see this a lot with integrity checking of
>> files here (50TiB of storage).
>>
>> Only way to get around this now is to do a known-good sha1/md5 hash of
>> data (2-3 reads of the file make sure that they all match and that the
>> file is not corrupted) save that as a baseline and then when doing
>> reads/compares if one fails do another re-read and see if the first one
>> was in error and compare that with your baseline.  This is one reason
>> why I'm switching to the new generation of sas drives that have ioecc
>> checks on READS not just writes to help cut down on some of this.
>>
>> Corruption does occur as well, and is more probable the higher the
>> capacity of the drive.  Ideally you would have a drive that does
>> ioecc on reads, plus T10 PI extensions (DIX/DIF) from drive to
>> controller up to your file system layer.  That won't always prevent
>> corruption by itself, but with a raid setup it would allow some
>> self-healing when a drive reports a non-transient error (i.e. a
>> corrupted sector of data).
>>
>> However the T10 PI extensions are only on sas/fc drives (520/528 byte
>> blocks) and so far as I can tell only the new LSI hba's support a small
>> subset of this (no hardware raid controllers I can find) and have not
>> seen any support up to the OS/filesystem level.  SATA is not included
>> at all as the T13 group opted not to include it in the spec.
>>
>> You could also stick with your current hardware and use a file system
>> that emphasises end-to-end data integrity like ZFS.  ZFS checksums at
>> many levels, and has a "don't trust the hardware" mentality.  It can
>> detect silent data corruption and automatically self-heal where
>> redundancy permits.
>>

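As a rough illustration of the baseline-and-re-read approach quoted
above, here is a minimal Python sketch.  The function names
(sha1_of, make_baseline, verify) and the return labels are
hypothetical, chosen only for this example; this is not how Bacula's
Verify job works.  Note also that repeated reads may be served from
the OS page cache rather than from the disk, so a real check would
need to account for that.

```python
import hashlib


def sha1_of(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_baseline(path, reads=3):
    """Hash the file several times; accept the digest only if all reads agree."""
    digests = {sha1_of(path) for _ in range(reads)}
    if len(digests) != 1:
        raise IOError(f"inconsistent reads of {path}: {sorted(digests)}")
    return digests.pop()


def verify(path, baseline, retries=1):
    """Compare the file against its baseline, re-reading on a mismatch.

    Returns "ok" if the first read matches, "transient" if the first
    read mismatched but a re-read matched, and "corrupt" otherwise.
    """
    if sha1_of(path) == baseline:
        return "ok"
    for _ in range(retries):
        if sha1_of(path) == baseline:
            return "transient"
    return "corrupt"
```

A "transient" result points at a bad read rather than bad data on
disk; only "corrupt" would call for a restore from backup.
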
'Paul Mather' wrote:
>> ZFS also supports pool scrubbing---akin to the "patrol reading" of many
>> RAID controllers---for proactive detection of silent data corruption.
>> With drive capacities becoming very large, the probability of an
>> unrecoverable read becomes very high.  This becomes very significant
>> even in redundant storage systems because a drive failure necessitates
>> a lengthy rebuild period during which the storage array lacks any
>> redundancy (in the case of RAID-5).  It is for this reason that RAID-6
>> (ZFS raidz2) is becoming de rigueur for many-terabyte arrays using large
>> drives, and, specifically, the reason ZFS garnered its triple-parity
>> raidz3 pool type (in ZFS pool version 17).
>

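To put a rough number on the unrecoverable-read point quoted above,
here is a small Python sketch.  The error rate of 1 in 10^14 bits is
an assumption (a figure commonly quoted on consumer drive spec
sheets, not something stated in this thread), and the model treats
bit errors as independent, which real drives do not strictly obey.

```python
import math


def p_unrecoverable_read(bytes_read, ure_per_bit=1e-14):
    """Probability of at least one unrecoverable read error.

    Assumes independent bit errors at the given rate (an assumed
    spec-sheet value, not a measured one).
    Computes 1 - (1 - p)^bits in a numerically stable way.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_per_bit))


# Reading back one, four and eight 2 TB drives in full, as a rebuild would.
for drives in (1, 4, 8):
    p = p_unrecoverable_read(drives * 2e12)
    print(f"{drives} x 2 TB read in full: P(at least one URE) = {p:.2f}")
```

The probability grows with the amount of data that has to be read
back, which is why a rebuild across several large drives is the
risky moment.
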
On 2010-08-30 02:59, Henrik Johansen wrote:
> Have you ever tried scrubbing a 40+ TB pool ?

If the question was for me, then yes, I have, with the caveat that I
run ZFS on top of SANs and otherwise redundant LUNs/disks, so the
availability side of the disk subsystem is already pretty stable.  I
use ZFS mainly to check/verify data integrity and for volume
management.  For performance reasons I mainly use mirroring.

When pools get large (50, 100 or more TiB), the problem is the time a
scrub takes, and the CPU and I/O costs are high.  For ~50TiB I would
say you want a subsystem capable of 2-3GiB/s, and you would increase
that in proportion for larger sets.  Even then it takes a toll on a
system whose primary job is NOT disk integrity but running X
application.
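
For a sense of scale, here is a back-of-the-envelope calculation in
Python.  It assumes the pool is completely full and that the
subsystem sustains its full throughput for the whole scrub; both are
simplifications, since a scrub only reads allocated blocks and has
to compete with application I/O.  The function name scrub_hours is
made up for this example.

```python
GIB_PER_TIB = 1024


def scrub_hours(pool_tib, throughput_gib_s):
    """Hours needed to read pool_tib TiB at a sustained throughput_gib_s GiB/s."""
    seconds = pool_tib * GIB_PER_TIB / throughput_gib_s
    return seconds / 3600


for rate in (2, 3):
    print(f"50 TiB at {rate} GiB/s: {scrub_hours(50, rate):.1f} h")
# prints:
# 50 TiB at 2 GiB/s: 7.1 h
# 50 TiB at 3 GiB/s: 4.7 h
```

So even at 2-3GiB/s a full pass over ~50TiB is a matter of hours,
and at a tenth of that throughput it would run into days.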

