BackupPC-users

Re: [BackupPC-users] Disk space used far higher than reported pool size

2013-11-01 14:01:29
From: Holger Parplies <wbppc AT parplies DOT de>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Fri, 1 Nov 2013 18:57:05 +0100
Hi,

I get some diagnostics when reading this with 'use warnings "wrong_numbers"' ...

backuppc AT kosowsky DOT org wrote on 2013-11-01 12:18:17 -0400 [Re:
[BackupPC-users] Disk space used far higher than reported pool size]:
> Craig O'Brien wrote at about 10:11:07 -0400 on Friday, November 1, 2013:
>  > >And this would explain why the elements are not being linked properly to
>  > the pool -- though I would have thought the more likely result would be a
>  > duplicate pool entry than an unlinked pool entry...
>  > 
>  > >It might be interesting to look for pool chains with the same (uncompressed)
>  > content and with links < HardLinkMax (typically 31999) to see if pool

this one looks correct. 31999. Unless of course you've changed it in config.pl
because your FS requirements differ.

>  > entries are being unnecessarily duplicated.
>  > 
>  > >Try: (cd /var/lib/BackupPC/cpool; find . -type f -links -3198 -name "*_*" -exec

This one doesn't.

>  > md5sum {} \;) | sort | uniq -d -w32
>  > 
>  > > Note this will find if there are any unnecessarily duplicated pool chains
>  > (beyond the base one). Note to keep it fast and simple I am
>  > > skipping the elements without a suffix... with the assumption being that
>  > if there are duplicated elements then there will probably be
>  > > whole chains of them...
>  > 
> 
> I added some more bash-foo so that the following should find *any* and *all*
> unnecessary pool dups...
> 
> (cd /var/lib/BackupPC/cpool; find . -name "*_0" | sed "s/_0$//" | (IFS=$'\n'; 
> while read FILE; do find "${FILE}"* -links -3199 -exec md5sum {} \; | sort | 
> uniq -D -w32 ; done))

Nor does this one (the 3199 again). While it will find chain members with
fewer links than apparently necessary, it won't find all of them, only those
whose link count is *far* too small. That might be sufficient, depending on
what we're looking for. You probably wouldn't have chosen the (arbitrary)
value "3199", though, if you hadn't in fact meant "31999" ;-). And you
wouldn't be saying "*any* and *all*" if you meant "some".
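
For illustration, here is a minimal sketch of the first quoted command with
the threshold that was presumably intended (31999, the typical HardLinkMax).
The function name find_short_chain_dups and its two parameters are made up
for this example. It assumes GNU find, md5sum and uniq and, like the quoted
command, it hashes the pool files as they are stored on disk.

```shell
# Hypothetical helper: list one representative of each group of suffixed
# pool files ("*_*") that share an md5sum and have fewer than MAX_LINKS links.
# Usage: find_short_chain_dups POOL_DIR [MAX_LINKS]
find_short_chain_dups() {
    pool_dir=$1
    max_links=${2:-31999}   # assumption: HardLinkMax was left at its usual value
    (
        cd "$pool_dir" || exit 1
        # "-links -N" matches files with fewer than N hard links
        find . -type f -name '*_*' -links "-$max_links" -exec md5sum {} + \
            | sort \
            | uniq -d -w32   # compare only the 32 hex digits of the md5sum
    )
}
```

It would be called as "find_short_chain_dups /var/lib/BackupPC/cpool 31999".
If HardLinkMax was changed in config.pl, pass that value instead.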

I'd like to point out three things:
1.) unnecessary duplication *within* the pool is not the problem we are
    looking for,
2.) if it were a problem, it would be because a duplicate was created way
    ahead of time and repeatedly, not because the overflow happens at 31950
    instead of 31999,
3.) finding "unnecessary duplicates" can have a normal explanation: if at some
    point you had more than 31999 copies of one file (content) in your
    backups, BackupPC would have created a pool duplicate. Some of the backups
    linking to the first copy would have expired over time, leaving behind a
    link count < 31999. Further rsync backups would tend to link to the second
    copy, at least for unchanging existing files (in full backups). In other
    cases, the first copy might be reused, but there's no guarantee the link
    count would be exactly 31999 (though it would probably tend to be).
    Having so many copies of identical file content in your backups would tend
    to happen for small files rather than huge ones, I would expect, and it
    doesn't seem to be very common anyway (in my pools, I find exactly one file
    with a link count of 60673 (XFS) and a total of five with more than 10000
    links, the largest having 103 bytes (compressed)).
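
A rough sketch of how one might look for such heavily linked pool files. The
function name list_heavily_linked and the default threshold of 10000 are
chosen only for this example, and the -printf action assumes GNU find.

```shell
# Hypothetical helper: print "link-count size-in-bytes path" for every file
# under POOL_DIR with more than MIN_LINKS hard links, highest count first.
# Usage: list_heavily_linked POOL_DIR [MIN_LINKS]
list_heavily_linked() {
    pool_dir=$1
    min_links=${2:-10000}
    # "-links +N" matches files with more than N hard links;
    # %n is the link count, %s the size in bytes, %p the path (GNU find)
    find "$pool_dir" -type f -links "+$min_links" -printf '%n %s %p\n' \
        | sort -rn
}
```

For example, "list_heavily_linked /var/lib/BackupPC/cpool 10000". The sizes
shown are those of the pool files as stored, i.e. compressed for cpool.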

Regards,
Holger
