Bacula-users

Re: [Bacula-users] Verify differences: SHA1 sum doesn't match but it should

2010-08-30 12:38:28
From: Tobias Brink <tobias.brink AT gmail DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Mon, 30 Aug 2010 18:35:45 +0200
Steve Costaras <stevecs AT chaven DOT com> writes:

> This could be due to a transient error (a transmission error or a
> wild/torn read at the time of calculation). I see this a lot with
> integrity checking of files here (50 TiB of storage).
>
> The only way around this for now is to build a known-good SHA1/MD5
> hash of the data (read the file 2-3 times and make sure all the
> hashes match, so you know the file is not corrupted) and save that as
> a baseline. Then, when a later read/compare fails, re-read the file
> to see whether the first read was in error, and compare the result
> with your baseline. This is one reason why I'm switching to the new
> generation of SAS drives that do IOECC checks on reads, not just
> writes, to help cut down on some of this.
>
> Corruption does occur as well, and becomes more probable the higher
> the capacity of the drive. Ideally you would have a drive that does
> IOECC on reads, plus T10 PI extensions (DIX/DIF) from the drive to
> the controller and up to your file system layer. That won't always
> prevent corruption by itself, but if you have a RAID setup it would
> allow some self-healing when a drive reports a non-transient error
> (i.e. a corrupted sector of data).
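
For readers following along, the baseline-and-re-read scheme described
in the quote could be sketched in Python roughly as follows. This is
only an illustration under assumptions: the function names
(sha1_of_file, make_baseline, check_against_baseline) and the status
strings are made up for this example and have nothing to do with
Bacula's own verify code.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA1 hex digest of a file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def make_baseline(path, reads=3, hash_file=sha1_of_file):
    """Hash the file several times; accept only if all reads agree."""
    digests = {hash_file(path) for _ in range(reads)}
    if len(digests) != 1:
        raise IOError(f"unstable reads for {path}: {sorted(digests)}")
    return digests.pop()


def check_against_baseline(path, baseline, retries=1, hash_file=sha1_of_file):
    """Compare a fresh hash with the baseline, re-reading on a mismatch.

    Returns "ok" if the first read matches, "transient" if only a
    re-read matches (the first read was presumably in error), and
    "mismatch" if no read matches the baseline.
    """
    if hash_file(path) == baseline:
        return "ok"
    for _ in range(retries):
        if hash_file(path) == baseline:
            return "transient"
    return "mismatch"
```

Note that repeated reads may be served from the page cache rather than
from the disk, so a re-read only tests the drive if the cache is cold.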

First off, thanks for the answers.  I am well aware of the reliability
problems of hard drives, and I would love to use an advanced file
system like ZFS or btrfs, but I am on Debian and I will stay on Debian,
and btrfs is not mature enough to be used in production at the moment.
More importantly, I do not think that this is an issue of corruption of
the data itself!  As I said, I checked the files against backups and
against the MD5 sums supplied by Debian (several times and from cold
cache), and the data seems to be OK.  The executables that Bacula
reports as changed continue to work well and bug-free, just as before.
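
To illustrate the kind of check meant here: Debian packages normally
ship their MD5 sums in files such as
/var/lib/dpkg/info/<package>.md5sums, one "<md5>  <path relative to />"
entry per line, and the debsums tool automates the comparison. A
minimal Python sketch of the same idea follows; the helper names
(md5_of_file, parse_md5sums, verify_md5sums) are hypothetical, and this
is not necessarily how the check above was actually run.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def parse_md5sums(text):
    """Parse lines of the form '<md5hex>  <relative/path>'."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, rel_path = line.split(None, 1)
        entries.append((digest, rel_path))
    return entries


def verify_md5sums(md5sums_file, root="/"):
    """Return the files whose current MD5 differs from the recorded one.

    Files that cannot be read are reported as well.
    """
    with open(md5sums_file, encoding="utf-8", errors="surrogateescape") as f:
        entries = parse_md5sums(f.read())
    bad = []
    for digest, rel_path in entries:
        full = os.path.join(root, rel_path)
        try:
            if md5_of_file(full) != digest:
                bad.append(full)
        except OSError:
            bad.append(full)
    return bad
```

Calling verify_md5sums("/var/lib/dpkg/info/coreutils.md5sums") would
then list any coreutils files that no longer match; an empty list means
everything matched. The package name is only an example.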

So I think this is a problem/bug with either the PostgreSQL database or
Bacula, not with my hard drives.  I just wonder how something like this
could happen and how I could avoid it.  I'm also not willing to compute
additional checksums with other programs (AIDE or similar), because
they take _lots_ of time to run.  With Bacula I get the checksums for
free.  I just want to use them to detect on-disk corruption from time
to time and, because I use VirtualFull, to find out whether my
differential backups have missed something.

So I still don't know how to proceed.  In the meantime, I will try to
upgrade my director and storage daemon to 5.0.2 as soon as Debian
backports are available and see if the problem goes away.  I will also
re-run the DiskToCatalog verify job after my next differential backup
and see if anything is different.

Thanks,
Tobias
