Re: [BackupPC-users] Digest::MD5 vs md5sum

2015-09-16 17:06:12
Subject: Re: [BackupPC-users] Digest::MD5 vs md5sum
From: gregrwm <backuppc-users AT whitleymott DOT net>
To: backuppc-users AT lists.sourceforge DOT net
Date: Wed, 16 Sep 2015 16:03:52 -0500
> Where did you get the idea to 4-byte $s? You might try echo -n'ing it instead...  Hope that helps. Not sure what you are actually trying to do, though.

In order to copy my backups through a narrow pipe, I simply selected all the most recent full backups and sent them via rsync. Later it became expedient to make the new location the BackupPC host, so I then wanted to (re)create cpool on the new BackupPC host. Unfortunately BackupPC_fixLinks.pl wasn't quite working: it presumed that any multiply linked file was already in the pool. My rsync copy preserved quite a lot of hardlinks, and since the pool wasn't part of the copy, those links joined backup files to each other rather than to pool files, so that presumption clearly wasn't true for me. I did eventually work out a simple fix for BackupPC_fixLinks.pl and will be glad to share it if anyone's interested.
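
A minimal sketch for spotting such files in a copied tree, assuming GNU find (for -printf); the helper name multilinked is hypothetical and not part of BackupPC. It lists every regular file with more than one hard link, sorted by inode so that names sharing an inode sit together. In a copy made without the pool, each of these inodes is one that a "multiply linked means already pooled" shortcut would wrongly skip.

```shell
#!/usr/bin/env bash
# multilinked DIR: print "inode linkcount path" for every regular file
# under DIR with more than one hard link, sorted by inode number.
# Hypothetical helper; assumes GNU find for -printf.
multilinked() {
    # -links +1 means "link count greater than 1"
    find "$1" -type f -links +1 -printf '%i %n %p\n' | sort -n
}
```

Run it against the pc/ directory under your BackupPC TopDir; lines with the same inode number are one piece of data reachable under several names.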

Meanwhile I was looking at whipping up NewFileList files so I could use BackupPC_link, hence I was trying to work out how to recompute BackupPC's custom MD5 digest, using this post:

Re: [BackupPC-devel] Hash (MD4?) Algorithm used for Pool
From: Craig Barratt <cbarratt@us...> - 2005-08-19 09:23:30

Roy Keene writes:
> Can you describe what is hashed, and which algorithm is used,
> to determine the pool hash name?

Sorry about the delay in replying - I'm on vacation this week.

It's a little arcane, but here it is.  The MD5 digest is used
on the following data:

   - for files <= 256K we use the file size and the whole file, ie:
       
        MD5([4 byte file size, file contents])

   - for files <= 1M we use the file size, the first 128K and
     the last 128K.
   - for files > 1M, we use the file size, the first 128K and
     the 8th 128K (ie: the 128K up to 1MB)...

One thing that is not clear is what perl does when the fileSize
is bigger than 4GB.  In particular, we start off with:

    $md5->add($fileSize);

I suspect that this will be the real file size modulo 2^32 (ie: the
lower 4 bytes of the file size).
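
Craig's two readings of `$md5->add($fileSize)` put different bytes in front of the file data. A minimal comparison (the helper names size_as_decimal and size_as_uint32le are hypothetical, and little-endian is an arbitrary choice for the packed form) prints both candidates: the size's decimal digits, and the four bytes of the size modulo 2^32 that the "4 byte file size" wording suggests.

```shell
#!/usr/bin/env bash
# size_as_decimal SIZE: the size as plain decimal digits, no padding.
size_as_decimal() {
    printf '%s' "$1"
}

# size_as_uint32le SIZE: hypothetical "4 byte" reading, i.e. the size
# modulo 2^32 packed into four little-endian binary bytes.
size_as_uint32le() {
    local v=$(( $1 % 4294967296 )) i
    for i in 0 1 2 3; do
        # emit byte i as an octal escape, e.g. \003
        printf "\\$(printf '%03o' $(( (v >> (8 * i)) & 255 )))"
    done
}
```

For the 3-byte /etc/papersize in the session below, the decimal reading prepends the single byte `3` while the packed reading prepends 03 00 00 00, so the two give different digests. For a 5 GiB file (5368709120 bytes) the decimal string keeps all ten digits, whereas the packed form wraps to 2^30.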

So Craig's description led me to presume I should represent the size as 4 bytes containing 32 bits of binary. Be that as it may, you were correct: it works with the size as a decimal string, neither padded nor truncated, which is what Perl produces when a number is handed to Digest::MD5's add() as data. I verified it with files of various sizes:

>$ alias zcat=/usr/share/?ackup??/bin/BackupPC_zcat
>$ alias li=ls' -aFqldi --color --time-style=+%F" %a "%T'
>$ um()(s=$( cat $1|wc -c) m=$((echo -n $s; cat $1|sf)|md5sum);echo $(li $1) $m s=$s)  #li & md5 of uncompressed argfile
>$ zm()(s=$(zcat $1|wc -c) m=$((echo -n $s;zcat $1|sf)|md5sum);echo $(li $1) $m s=$s)  #li & md5 of  compressed argfile
>$ sf()if [ $s -le $((256*1024)) ];then cat                                            #select filedata for md5
>>   else head -c1M|(head -c128K;tail -c128K)
>>   fi
>$ um /etc/papersize
>349898 -rw-r--r-- 1 root root 3 2014-01-01 Wed 17:39:09 /etc/papersize 7a59e82651106239413a38eb30735991 - s=3
>$ zm cpool/7/a/5/7a59e82651106239413a38eb30735991
>19720838 -rw-r----- 41 backuppc backuppc 11 2015-04-23 Thu 02:30:49 cpool/7/a/5/7a59e82651106239413a38eb30735991 7a59e82651106239413a38eb30735991 - s=3
>$ zm cpool/4/6/0/46052bcabfe39626ccbcee2b709ce1a8
>7620687 -rw-r----- 2 backuppc backuppc 1287 2014-01-04 Sat 03:15:17 cpool/4/6/0/46052bcabfe39626ccbcee2b709ce1a8 46052bcabfe39626ccbcee2b709ce1a8 - s=1276
>$ zm cpool/4/e/2/4e2ae5ba88b4a31a9995b6d7bbee9ca6
>22095546 -rw-r----- 4 backuppc backuppc 7615 2008-04-05 Sat 19:10:45 cpool/4/e/2/4e2ae5ba88b4a31a9995b6d7bbee9ca6 4e2ae5ba88b4a31a9995b6d7bbee9ca6 - s=43116
>$ zm cpool/4/6/0/4603834cfc3e7323a83c54d542d191a8
>5738140 -rw-r----- 19 backuppc backuppc 126642 2015-04-23 Thu 02:06:36 cpool/4/6/0/4603834cfc3e7323a83c54d542d191a8 4603834cfc3e7323a83c54d542d191a8 - s=641020
>$ zm cpool/4/e/2/4e226deede40ab1199eb2ebdbf220995
>7825485 -rw-r----- 3 backuppc backuppc 202772654 2010-01-22 Fri 19:31:57 cpool/4/e/2/4e226deede40ab1199eb2ebdbf220995 4e226deede40ab1199eb2ebdbf220995 - s=204085248
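
The um/zm/sf helpers above stream the data through nested head/tail pipes. A standalone variant, as a minimal sketch assuming GNU coreutils (md5sum, head -c, tail -c +N) and using the hypothetical names pool_md5 and pool_path, reads the two 128K chunks at explicit offsets instead and maps a digest onto the cpool/x/y/z/ layout visible in the listings above. It handles uncompressed files only (cpool files need BackupPC_zcat first, as zm does), and digest collisions are not handled.

```shell
#!/usr/bin/env bash
# pool_md5 FILE: MD5 over the decimal size followed by the selected data:
#   size <= 256K : the whole file
#   size  > 256K : first 128K, then the 128K chunk ending at min(size, 1M)
pool_md5() {
    local f="$1" size end
    [ -f "$f" ] || return 1
    size=$(wc -c < "$f" | tr -d ' \t')
    {
        printf '%s' "$size"                  # size as plain decimal digits
        if [ "$size" -le 262144 ]; then
            cat -- "$f"                      # <= 256K: the whole file
        else
            if [ "$size" -lt 1048576 ]; then end=$size; else end=1048576; fi
            head -c 131072 -- "$f"           # first 128K
            # 128K ending at offset $end; tail -c +N starts at byte N (1-based)
            tail -c +$((end - 131072 + 1)) -- "$f" | head -c 131072
        fi
    } | md5sum | cut -d' ' -f1
}

# pool_path POOLDIR DIGEST: the first three hex digits of the digest
# become three directory levels, e.g. cpool/7/a/5/7a59...
pool_path() {
    printf '%s/%s/%s\n' "$1" \
        "$(printf '%s\n' "$2" | sed 's|^\(.\)\(.\)\(.\).*|\1/\2/\3|')" "$2"
}
```

For instance, `pool_path cpool "$(pool_md5 /etc/papersize)"` should name cpool/7/a/5/7a59e82651106239413a38eb30735991 on the machine above, provided that file is unchanged. Reading at fixed offsets avoids depending on how a given head implementation consumes a pipe.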
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/