Subject: Re: [BackupPC-users] Duplicate files in pool with same CHECKSUM and same CONTENTS
From: Holger Parplies <wbppc AT parplies DOT de>
To: Tino Schwarze <backuppc.lists AT tisc DOT de>
Date: Thu, 30 Oct 2008 12:11:43 +0100

Hi,

Tino Schwarze wrote on 2008-10-30 11:13:27 +0100 [Re: [BackupPC-users] 
Duplicate files in pool with same CHECKSUM and same CONTENTS]:
> [...]
> Hm. I just took a look in my cpool and found some files which didn't
> hit the hardlink count yet, but have a _0 and _1:
> .../cpool/0/0 # ls -l c/00cd83be1ea3c1ffa3c6af2f4e310206*
> -rw-r----- 4371 backuppc users 34 2005-01-14 17:01 c/00cd83be1ea3c1ffa3c6af2f4e310206
> -rw-r----- 3536 backuppc users 34 2005-03-02 02:22 c/00cd83be1ea3c1ffa3c6af2f4e310206_0
> -rw-r-----  439 backuppc users 34 2006-03-11 02:04 c/00cd83be1ea3c1ffa3c6af2f4e310206_1
> 
> MD5Sums are not equal for all files,

that's intentional :-). Those files have different content but hash to the
same BackupPC hash. Quoting you:

> If you look at BackupPC's status page, there is a line like:
> 
> * Pool hashing gives 649 repeated files with longest chain 28, 

That is what this line is about - you have up to 28 different files hashing to
the same BackupPC hash (some of these may coincidentally have identical
content due to link count overflows, but that would be the exception).
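
In pseudocode, the pool insertion looks roughly like this (a Python sketch,
not BackupPC's actual Perl; pool_file_for, same_contents and link_max are
names I've made up for illustration):

    import os

    def pool_file_for(pool_dir, digest, candidate, same_contents, link_max=31999):
        """Walk the chain <digest>, <digest>_0, <digest>_1, ... and decide
        which pool file the new backup file should be hardlinked to."""
        n = -1
        while True:
            name = digest if n < 0 else "%s_%d" % (digest, n)
            path = os.path.join(pool_dir, name)
            if not os.path.exists(path):
                # End of chain: genuinely new content, append it here.
                os.rename(candidate, path)
                return path
            if os.stat(path).st_nlink < link_max and same_contents(path, candidate):
                # Identical content with room for another hardlink: reuse it.
                os.remove(candidate)
                return path
            # Different content, or the link count is maxed out: keep walking.
            n += 1

The backup tree entry is then hardlinked to whatever this returns. Note that
a chain member whose link count has hit the maximum is skipped, which is
exactly how a content-identical duplicate like your _1 file comes into
existence.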

> AFAIK, I started with $Conf{HardLinkMax} set to 32.000. As the files are
> very old, a lot of links might have expired already.

True, but keep in mind how much 32000 really is. Unless you have many files
with identical content in your backup set (CVS/Root maybe), it will take a
great many backups to accumulate that many links.
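
To put a number on it (made-up figures, just for scale): a file that is
identical on 10 hosts, each backed up daily, gains about 10 links per day,
so the 32000 cap would take roughly 3200 days - nearly nine years - to hit.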

> I'm not sure though, how the file name is derived,

It's in the docs. Up to 256 KB of file contents (taken from the first 1 MB)
and the file length go into the hash, so it's quite easy to produce hash
clashes if you want to: take a file > 1 MB and change the last byte. BackupPC
resolves such collisions (that's what the _0, _1, ... suffixes are for), and
they're probably infrequent enough not to be a problem - and you get to see
on the status page whether they are. Taking the length of the uncompressed
file into account keeps things like growing logfiles from causing problems.
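
In Python terms the scheme amounts to something like this (a sketch based on
the description above; the exact block sizes and offsets are my assumption -
the authoritative version is File2MD5 in BackupPC's Lib.pm):

    import hashlib, os

    def pool_digest(path):
        """Hash the (uncompressed) file length plus up to 256 KB of
        content taken from the first 1 MB of the file."""
        size = os.path.getsize(path)
        md5 = hashlib.md5()
        md5.update(str(size).encode())            # the length is hashed too
        with open(path, "rb") as f:
            if size <= 256 * 1024:
                md5.update(f.read())              # small file: hash all of it
            else:
                md5.update(f.read(128 * 1024))    # first 128 KB
                f.seek(min(size, 1024 * 1024) - 128 * 1024)
                md5.update(f.read(128 * 1024))    # last 128 KB within the first 1 MB
        return md5.hexdigest()

    # The clash recipe above, spelled out: for a file larger than 1 MB,
    # changing the last byte alters neither the length nor any hashed
    # byte, so pool_digest() returns the same value for both versions.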

> IIRC, BackupPC_nightly should perform chain cleaning.

Unused files (i.e. files with a link count of 1) are removed and the chains
renumbered. As I wrote, relinking identical files does not make sense.
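
Per chain, that nightly pass boils down to something like the following
sketch (clean_chain is a made-up name, meant only to illustrate the effect):

    import os

    def clean_chain(pool_dir, digest):
        """Remove pool files no backup references anymore (link count 1)
        and renumber the survivors so the chain has no gaps."""
        # Collect the existing chain: <digest>, <digest>_0, <digest>_1, ...
        chain, n = [], -1
        while True:
            name = digest if n < 0 else "%s_%d" % (digest, n)
            path = os.path.join(pool_dir, name)
            if not os.path.exists(path):
                break
            chain.append(path)
            n += 1
        # Keep only files that some backup still hardlinks to.
        keep = [p for p in chain if os.stat(p).st_nlink > 1]
        for p in chain:
            if p not in keep:
                os.remove(p)
        # Renumber survivors left to right; each target slot is already
        # free by the time we rename into it.
        for i, p in enumerate(keep):
            target = os.path.join(pool_dir,
                                  digest if i == 0 else "%s_%d" % (digest, i - 1))
            if p != target:
                os.rename(p, target)

Identical files that ended up in separate chain entries (your overflow case)
are deliberately left alone.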

Regards,
Holger
