Subject: Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG
From: Holger Parplies <wbppc AT parplies DOT de>
To: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
Date: Wed, 5 Oct 2011 17:41:48 +0200
Hi,

Jeffrey J. Kosowsky wrote on 2011-10-04 18:58:51 -0400 [[BackupPC-users] Bad 
md5sums due to zero size (uncompressed) cpool files - WEIRD BUG]:
> After the recent thread on bad md5sum file names, I ran a check on all
> my 1.1 million cpool files to check whether the md5sum file names are
> correct.
> 
> I got a total of 71 errors out of 1.1 million files:
> [...]
> - 68 of the 71 were *zero* sized when decompressed
> [...]
> Each such cpool file has anywhere from 2 to several thousand links
> [...]
> It turns out though that none of those zero-length decompressed cpool
> files were originally zero length but somehow they were stored in the
> pool as zero length with an md5sum that is correct for the original
> non-zero length file.
> [...]
> Now it seems unlikely that the files were corrupted after the backups
> were completed since the header and trailers are correct and there is
> no way that the filesystem would just happen to zero out the data
> while leaving the header and trailers intact (including checksums).
> [...]
> Also, on my latest full backup a spot check shows that the files are
> backed up correctly to the right non-zero length cpool file which of
> course has the same (now correct) partial file md5sum. Though as you
> would expect, that cpool file has a _0 suffix since the earlier zero
> length is already stored (incorrectly) as the base of the chain.
> [...]
> In summary, what could possibly cause BackupPC to truncate the data
> sometime between reading the file/calculating the partial file md5sum
> and compressing/writing the file to the cpool?

the first and only thing that springs to my mind is a full disk. In some
situations, BackupPC needs to create a temporary file (RStmp, I think) to
reconstruct the remote file contents. This file can become quite large, I
suppose. Independent of that, I remember there is *at least* an "incorrect
size" fixup which needs to copy already written content to a different hash
chain (because the hash turns out to be incorrect *after*
transmission/compression). Without looking closely at the code, I could
imagine (but am not sure) that this could interact badly with a full disk:

* output file is already open, headers have been written
* huge RStmp file is written, filling up the disk
* received file contents are for some reason written to disk (which doesn't
  work - no space left) and read back for writing into the output file (giving
  zero-length contents)
* trailing information is written to the output file - this works, because
  there is enough space left in the already allocated block for the file
* RStmp file gets removed and the rest of the backup continues without
  apparent error

Admittedly, this doesn't quite fit the case I tried to invent above, but the
general idea could still apply - at least the symptoms boil down to "correct
content stored somewhere but read back incorrectly". That would mean the
result of a write operation is left unchecked by BackupPC somewhere (or is
handled incorrectly).
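
To make that concrete, here is a sketch of the failure mode I have in
mind - just an illustration, not actual BackupPC code: if the return
value of syswrite() is never looked at, an ENOSPC error simply vanishes,
and whatever reads the temporary file back later happily gets nothing.

  use strict;
  use warnings;

  # Hypothetical sketch, NOT BackupPC code: an unchecked write leaves a
  # truncated (here: empty) temporary file behind without any error.
  sub spool_unchecked {
      my ($path, $data) = @_;
      open(my $fh, ">", $path) or die "open $path: $!";
      syswrite($fh, $data);   # return value ignored - ENOSPC / a short
                              # write goes completely unnoticed
      close($fh);             # close() result ignored as well
  }

  # The same step with the result checked - the error can no longer hide.
  sub spool_checked {
      my ($path, $data) = @_;
      open(my $fh, ">", $path) or die "open $path: $!";
      my $n = syswrite($fh, $data);
      if (!defined($n) || $n != length($data)) {
          die "short write to $path: ",
              defined($n) ? "$n of " . length($data) . " bytes\n" : "$!\n";
      }
      close($fh) or die "close $path: $!";
  }

With the unchecked variant, the small header and trailer writes to the
output file could still succeed afterwards, so the resulting cpool file
would look perfectly well-formed - just with no data in between.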

So, the question is: have you been running BackupPC with an almost full disk?
Is there at least one file in the backup set whose *uncompressed* size is
large in comparison to the reserved headroom (-> DfMaxUsagePct)?
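
Checking the current state is easy enough; something along these lines
would show how close the pool filesystem is to the limit. A rough sketch
only - the path and the 95 are examples, substitute your actual
$Conf{TopDir} and $Conf{DfMaxUsagePct} - and of course it reflects the
situation *now*, not at the time those backups ran:

  use strict;
  use warnings;

  my $topdir       = "/var/lib/backuppc";  # example - your $Conf{TopDir}
  my $df_max_usage = 95;                   # example - your $Conf{DfMaxUsagePct}

  # "df -P" (POSIX format): the 5th column of the last line is "NN%".
  my @df = `df -P $topdir`;
  die "df failed\n" unless @df;
  my ($usage) = $df[-1] =~ /\s(\d+)%\s/;
  die "could not parse df output\n" unless defined $usage;

  printf "pool filesystem at %d%%, DfMaxUsagePct is %d%%\n",
      $usage, $df_max_usage;
  print "-> only ", 100 - $usage, "% headroom left for RStmp and new files\n"
      if $usage >= $df_max_usage;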

For the moment, that's the most concrete thing I can think of. Of course,
writing to a temporary location might be fine and reading could fail (you
haven't modified your BackupPC code to use a signal handler for some arbitrary
purposes, have you? ;-). Or your Perl version could have an obscure bug that
occasionally trashes the contents of a string. Doesn't sound very likely,
though.

What *size* are the original files?

Ah, yes. How many backups are (or rather were) you running in parallel? No one
said the RStmp file needs to have been created by the affected backup ...

Regards,
Holger

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/