BackupPC-users

Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG

From: Holger Parplies <wbppc AT parplies DOT de>
To: Tim Fletcher <tim AT night-shade.org DOT uk>
Date: Thu, 6 Oct 2011 17:54:05 +0200
Hi,

Tim Fletcher wrote on 2011-10-06 10:17:03 +0100 [Re: [BackupPC-users] Bad 
md5sums due to zero size (uncompressed) cpool files - WEIRD BUG]:
> On Wed, 2011-10-05 at 21:35 -0400, Jeffrey J. Kosowsky wrote:
> > Finally, remember it's possible that many people are having this
> > problem but just don't know it,

perfectly possible. I was just saying what possible cause came to my mind (and
many people *could* be running with an almost full disk). As you (Jeffrey)
said, the fact that the errors appeared only within a small time frame may or
may not be significant. I guess I don't need to ask whether you are *sure*
that the disk wasn't almost full back then.

To be honest, I would *hope* that only you had these issues and everyone
else's backups are fine, i.e. that your hardware and not the BackupPC software
was the trigger (though it would probably need some sort of software bug to
come up with the exact symptoms).

> > since the only way one would know would be if one actually computed the
> > partial file md5sums of all the pool files and/or restored & tested one's
> > backups.

Almost.

> > Since the error affects only 71 out of 1.1 million files it's possible
> > that no one has ever noticed...

Well, let's think about that for a moment. We *have* had multiple issues that
*sounded* like corrupt attrib files. What would happen, if you had an attrib
file that decompresses to "" in the reference backup?

> > It would be interesting if other people would run a test on their
> > pools to see if they have similar such issues (remember I only tested
> > my pool in response to the recent thread of the guy who was having
> > issues with his pool)...
> 
> Do you have a script or series of commands to do this check with?

Actually, what I would propose in response to what you have found would be to
test for pool files that decompress to zero length. That should be
computationally less expensive than computing hashes - in particular, you can
stop decompressing once you have decompressed any content at all. Sure, that
just checks for this issue, not for possible different ones. On the one hand,
having the *correct* content in the pool under an incorrect hash would not be
a *serious* issue - it wouldn't prevent restoring your data, it would just
make pooling not work correctly (for the files affected). On the other,
different instances of this problem might point toward a common cause. And I
guess it would be possible to have *truncated* data (i.e. not zero-length, but
incomplete just the same) in your files as well.

You weren't asking me, but, yes, I wrote a script to check pool file contents
against the file names back in 2007. I'll append it here, but it would really
be interesting to add information on whether the file decompressed to
zero-length. I could easily add the decompressed file length to the output,
but it would make lines longer than 80 characters. Ok, I did that (and added
counting of zero-length files) - please make your terminals at least 93
characters wide :). I just scanned 1/16th of my pool and found various
mismatches, though none of them zero-length. Probably top-level attrib files.
Link counts might be interesting - I'll add them later.

> I have access to a couple of backuppc installs of various ages and sizes
> that I can test.

Try something like

        BackupPC_verifyPool -s -p

to scan the whole pool, or

        BackupPC_verifyPool -s -p -r 0

to test it on the 0/0/0 - 0/0/f pool subdirectories (-r takes a Perl
expression evaluating to an array of numbers between 0 and 255, e.g. "0",
"0 .. 255" (the default), or "0, 1, 10 .. 15, 5"; note the quotes to make your
shell pass it as a single argument). If you have switched off compression,
you'll have to add a '-u' (though I'm not sure this test makes much sense in
that case). You'll want either '-p' (progress) or '-v' (verbose) to see
anything happening. It *will* take time to traverse the pool, but you can
safely interrupt the script at any time and use the range parameter to resume
it later (though not at the exact place) - or just suspend and resume it (^Z).
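One way to use the range parameter for resumability is to drive the script one
top-level range at a time, so an interrupted scan can pick up at the next
untested range. A small hypothetical driver (it only prints the command lines;
swap print() for subprocess.run() to actually execute them - the script name
is the attachment from this mail):

```python
def verify_pool_commands(ranges=range(256)):
    """Yield one BackupPC_verifyPool invocation per top-level pool range.

    Running ranges individually means an interrupted scan loses at most
    one range's worth of progress; restart the loop at the failed range.
    """
    for r in ranges:
        yield f"BackupPC_verifyPool -s -p -r {r}"

for cmd in verify_pool_commands(range(2)):
    print(cmd)
```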

You might need to change the 'use lib' statement in line 64 to match your
distribution.

Hope that helps.

Regards,
Holger

Attachment: BackupPC_verifyPool
Description: Text document

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/