BackupPC-users

Re: [BackupPC-users] how can I find out how much space a given host uses?

2009-12-01 19:13:55
Subject: Re: [BackupPC-users] how can I find out how much space a given host uses?
From: Pieter Wuille <sipa AT users.sourceforge DOT net>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Wed, 2 Dec 2009 01:11:27 +0100
On Tue, Dec 01, 2009 at 09:28:50AM -0500, Jeffrey J. Kosowsky wrote:
> Pieter Wuille wrote at about 13:18:33 +0100 on Tuesday, December 1, 2009:
>  > What you can do is count the allocated space for each directory and file,
>  > but divide the numbers for files by (nHardlinks+1). This way you end up
>  > distributing the size each file takes on disks over the different backups
>  > it belongs to.
>  > 
>  > I have a script that does this; if there's interest I'll attach it. It does
>  > take a day (wild guess, never accurately measured) to go over all pc/*
>  > directories (Pool is 370.65GB comprising 4237093 files and 4369
>  > directories)
> 
> I am surprised that it would take a day.
The server is quite busy making backups and rsync'ing to an offsite backup
server at the same time. I assume the latter in particular puts a serious
load on I/O.

> The only real cost should be that of doing a 'find' and a 'stat' on
> the pc tree - which I would do in perl so that I could do the
> arithmetic in place (rather than having to use a *nix find -printf to
> pass it off to another program).
Yes, it is a perl script.

> Unless you have a huge number of pc's and backups, I can't imagine
> this would take more than a couple of hours since your total number of
> unique files is only about 4 million.
We have 4 million unique inodes. We do however have some 20-25 million
directory entries, which is what the script needs to read through.

> Given that you only have 4 million unique files, you could even avoid
> the multiple stats at the cost of that much memory by caching the
> nlinks and size by inode number.
Except that the script already needs to do a stat per directory entry in order
to know the inode number itself...
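
For illustration only: Perl's readdir returns just names, so there a stat per
entry is needed to learn the inode, as noted above. Some interfaces do expose
the inode number from the directory entry itself. Python's os.scandir is one:
on Unix, DirEntry.inode() needs no extra system call. The sketch below shows
the caching idea under that assumption. It is not part of diffsize.pl, and
the function name is hypothetical.

```python
import os


def scan_with_inode_cache(top):
    """Walk `top`, calling lstat at most once per inode.

    Returns (cache, entries_seen), where cache maps
    inode number -> (st_nlink, allocated bytes).
    Assumes everything under `top` lives on one filesystem.
    """
    cache = {}
    entries_seen = 0
    pending = [top]
    while pending:
        with os.scandir(pending.pop()) as it:
            for entry in it:
                entries_seen += 1
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                    continue
                # On Unix the inode comes from the directory entry itself.
                ino = entry.inode()
                if ino not in cache:
                    st = entry.stat(follow_symlinks=False)
                    # st_blocks is counted in 512-byte units on Linux.
                    cache[ino] = (st.st_nlink, st.st_blocks * 512)
    return cache, entries_seen
```

With roughly 4 million inodes behind 20-25 million directory entries, a cache
like this would skip most stat calls, at the cost of holding one small tuple
per inode in memory.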

> Can you post your script?

See the attachment. You can run, e.g.:

   ./diffsize.pl /var/lib/backuppc/pc/*

to see values per host, and a total.

PS: the script actually (and correctly) divides by (nHardLinks-1), not by
(nHardLinks+1) as I claimed earlier.
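
The accounting described in this thread can be sketched roughly as follows.
This is not the attached diffsize.pl, only a minimal Python illustration. It
assumes that one of each pooled file's hard links belongs to the pool and the
rest to backups, that files with a single link are charged in full, and that
st_blocks is counted in 512-byte units (as on Linux). The function name is
hypothetical.

```python
import os
import sys
from collections import defaultdict


def apportioned_usage(host_dirs):
    """Return {host_dir: bytes}, charging each file allocated / (nlink - 1)."""
    usage = defaultdict(float)
    for host_dir in host_dirs:
        for root, dirs, files in os.walk(host_dir):
            # Directories cannot be hard-linked, so charge them in full.
            usage[host_dir] += os.lstat(root).st_blocks * 512
            for name in files:
                st = os.lstat(os.path.join(root, name))
                allocated = st.st_blocks * 512
                # Assumption: one link is the pool's own; the others share the cost.
                # A file with a single link is not shared and is charged in full.
                sharers = max(st.st_nlink - 1, 1)
                usage[host_dir] += allocated / sharers
    return dict(usage)


if __name__ == "__main__":
    totals = apportioned_usage(sys.argv[1:])
    for host_dir, size in totals.items():
        print(f"{size / 2**30:10.2f} GiB  {host_dir}")
    print(f"{sum(totals.values()) / 2**30:10.2f} GiB  total")
```

Called with the per-host directories as arguments, it prints one line per
host and a total, much like the command shown above.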

kind regards,

-- 
Pieter

Attachment: diffsize.pl
Description: Text Data
