Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool f

Jeffrey J. Kosowsky wrote at about 18:58:51 -0400 on Tuesday, October 4, 2011:
 > After the recent thread on bad md5sum file names, I ran a check on all
 > my 1.1 million cpool files to check whether the md5sum file names are
 > correct.
 > 
 > I got a total of 71 errors out of 1.1 million files:
 > - 3 had data in it (though each file was only a few hundred bytes
 >   long)
 > 
 > - 68 of the 71 were *zero* sized when decompressed
 >       29 were 8 bytes long corresponding to zlib compression of a zero
 >       length file
 > 
 >       39 were 57 bytes long corresponding to a zero length file with an
 >       rsync checksum
 > 
 > Each such cpool file has anywhere from 2 to several thousand links
 > 
 > The 68 *zero* length files should *not* be in the pool since zero
 > length files are not pooled. So, something is really messed up here.
 > 
 > It turns out though that none of those zero-length decompressed cpool
 > files were originally zero length but somehow they were stored in the
 > pool as zero length with an md5sum that is correct for the original
 > non-zero length file.
 > 
 > Some are attrib files and some are regular files.
 > 
 > Now it seems unlikely that the files were corrupted after the backups
 > were completed since the header and trailers are correct and there is
 > no way that the filesystem would just happen to zero out the data
 > while leaving the header and trailers intact (including checksums).
 > 
 > Also, it's not the rsync checksum caching causing the problem since
 > some of the zero length files are without checksums.
 > 
 > Now the fact that the md5sum file names are correct relative to the
 > original data means that the file was originally read correctly by
 > BackupPC..
 > 
 > So it seems that for some reason the data was truncated when
 > compressing and writing the cpool/pc file but after the partial file
 > md5sum was calculated. And it seems to have happened multiple times
 > for some of these files since there are multiple pc files linked to
 > the same pool file (and before linking to a cpool file, the actual
 > content of the files are compared since the partial file md5sum is not
 > unique).
 > 
 > Also, on my latest full backup a spot check shows that the files are
 > backed up correctly to the right non-zero length cpool file which of
 > course has the same (now correct) partial file md5sum. Though as you
 > would expect, that cpool file has a _0 suffix since the earlier zero
 > length is already stored (incorrectly) as the base of the chain.
 > 
 > I am not sure what is going on with the other 3 files since I have yet
 > to find them in the pc tree (my 'find' routine is still running)
 > 
 > I will continue to investigate this but this is very strange and
 > worrying since truncated cpool files means data loss!
 > 
 > In summary, what could possibly cause BackupPC to truncate the data
 > sometime between reading the file/calculating the partial file md5sum
 > and compressing/writing the file to the cpool?
 > 

OK... this is a little weird maybe...

I looked at one file which is messed up:
     /f%2f/fusr/fshare/fFlightGear/fTimezone/fAmerica/fPort-au-Prince

On all (saved) backups, up to backup 82, the file (and the
corresponding cpool file e/f/0/ef0bd9db744f651b9640ea170b07225a) is
zero length decompressed.

My next saved backup is #110 which is non-zero length and has the
correct contents. This is true for all subsequent saved backups. The
corresponding pool file is as might be expected:
        e/f/0/ef0bd9db744f651b9640ea170b07225a_0
which makes sense since PoolWrite.pm sees that while the partial file
md5sum is the same as the root, the contents differ (since the root is
empty) so it creates a new pool file with the same stem but with index
0.

Note the original file itself was unchanged between #82 and #110.

BUT WHAT IS INTERESTING is that both pool files have the same
modification time of: 2011-04-27 03:05
which according to the logs is during the time at which backup #110
was backing up the relevant share.

I don't understand this since why would backup #110 change the mod
time of the root file which was created during an earlier backup. 

Why would PoolWrite.pm change the mod time of a pool file that is not
in the actual backup?

Could it be that this backup somehow destroyed the data in the file?
(but even so, what would cause this to happen)

Also, the XferLOG entry for both backups #82 and #110 have the line:
 pool     644       0/0         252 
usr/share/FlightGear/Timezone/America/Port-au-Prince

But this doesn't make sense since if the new pool file was created as
part of backup #110, shouldn't it say 'create' and not 'pool'?

None of this makes sense to me but somehow I suspect that herein may be a
clue to the problem...

------------------------------------------------------------------------------
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security
threats, fraudulent activity, and more. Splunk takes this data and makes
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2dcopy2
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
Re: [BackupPC-users] Bad md5sums due to zero size (uncompressed) cpool files - WEIRD BUG