Subject: Re: [BackupPC-users] Problems with hardlink-based backups...
From: Les Mikesell <lesmikesell AT gmail DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Tue, 18 Aug 2009 10:35:02 -0500
David wrote:
> 
>> You can exclude directories from the updatedb runs
> 
> Only works if the data you want to exclude (such as older snapshots)
> is kept in a relatively small number of directories; otherwise you need
> to make a lot of exclude rules, like one for each backup. In my case,
> each backed-up server/user PC/etc. is independent and has its own
> directory structure with snapshots, etc.
> 
> And actually backuppc also has a problematic layout for locate rules:
> 
> __TOPDIR__/pc/$host/nnn <- One of those directories for each backup version.
> 
> So basically, if you have a large number of files on a server, it
> seems like you need to entirely exclude the server from updatedb,
> otherwise the snapshot directories are going to cause a huge updatedb
> database.
> 
> Which kind of defeats the point of having updatedb running on the
> backup server. Which is why I've disabled it here :-(.

Why not just exclude the __TOPDIR__ - or the mount point, if this is on its 
own filesystem?
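
For example, with a typical mlocate setup you can add the BackupPC top-level 
directory (or the mount point) to PRUNEPATHS in /etc/updatedb.conf.  The path 
below is only an illustration - __TOPDIR__ varies by distribution, so keep 
your distribution's existing entries and append your real pool location:

   # /etc/updatedb.conf (mlocate) - example assuming __TOPDIR__ is
   # /var/lib/backuppc; keep the existing prune entries and add yours
   PRUNEPATHS="/tmp /var/spool /media /var/lib/backuppc"

One line in that file and updatedb skips the whole archive.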

>> Backuppc maintains its own status showing how much space the pool uses and
>> how much is left on the filesystem. So you just look at that page often
>> enough to not run out of space.
> 
> Sounds like a 'df'- like display on the web page, but for the backuppc
> pool rather than a partition.

It keeps both a summary of pool usage (current and yesterday) and, for each 
backup run, totals of the number of files, broken down into new files and 
files already in the pool, along with their size before and after 
compression.  A glance at the pool percentage used and the daily change 
tells you where you stand.

> Please correct me if I'm mistaken, but that doesn't really help people
> who want to find which files and dirs are taking up the most space, so
> they can address it (like, tweak the number of backed up generations,
> or exclude additional directories/file patterns, etc).

There's not a good way to figure out which files are present in all of your 
backups and thus won't free any space when you remove individual instances 
of them.  But the per-host, per-run stats, where you can see the rate of new 
files being picked up and how well they compress, are very helpful.

> Normally people use a tool like 'du' for that, but 'du' itself is next
> to unusable when you have a massive filesystem, which can easily be
> created by hardlink snapshot-based backup systems :-(

That's probably why backuppc does it internally - that and keeping track 
of compression stats and which files are new.
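
If you do want an outside check, something along these lines works, though it 
will be slow on a pool with millions of hardlinked files (the path is just an 
example - substitute your real __TOPDIR__):

   # du counts each hardlinked inode once per invocation, so per-host
   # numbers are misleading; one slow pass over the whole archive gives
   # a usable grand total
   du -shx /var/lib/backuppc
   # df on the dedicated backup filesystem is nearly instant and is
   # usually all you need day to day
   df -h /var/lib/backuppc

That's roughly the information the status page already gives you, without 
the wait.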

>> It is best done pro-actively, avoiding the problem instead of trying to
>> fix it afterwards because with everything linked, it doesn't help to remove
>> old generations of files that still exist.  So generating the stats daily
>> and observing them (both human and your program) before starting the next
>> run is the way to go.
>>
> 
> 1. Removing old generations does help. The idea is to remove old
> "churn" that took place in that version. In other words, files which
> no longer have any references after that generation is removed
> (because all previous generations referring to those files via hard
> links are also gone by this point).

Of course, but you do it by starting with a smaller number of runs than 
you expect to be able to hold.  Then, after you see that the space 
consumed is staying stable, you can adjust the amount of history to keep.
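
As a rough sketch, the retention knobs live in config.pl (or a per-host 
override); the numbers here are only an illustration of starting small, not 
a recommendation:

   # How much history to keep per host.  Start conservatively, watch the
   # pool usage for a few weeks, then raise these if space allows.
   $Conf{FullKeepCnt} = 2;      # full backups to keep
   $Conf{IncrKeepCnt} = 6;      # incremental backups to keep
   # Fulls older than this many days are also expired, but never below
   # $Conf{FullKeepCntMin}:
   $Conf{FullAgeMax}     = 90;
   $Conf{FullKeepCntMin} = 1;

Bump the counts later once you can see the daily growth rate.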

> 2. Proactive is good, but again, with a massive directory structure,
> it's hard to use tools like du to check which backups you need to
> finetune/prune/etc.

This may well be a problem with whatever method you use.  It is handled 
reasonably well in backuppc.

>> Also, you really want your backup archive on its own mounted filesystem so
>> it doesn't compete with anything else for space and to give you the
>> possibility of doing an image copy if you need a backup since other methods
>> will be too slow to be practical.  And 'df' will tell you what you need to
>> know about a filesystem fairly quickly.
>>
> 
> Our backups are stored on an LVM volume which is used only for backups.
> But again, the problem is not disk usage causing issues for other
> processes. The problem is, once the allocated area is running out of
> space, how to check *where* that space is going, so you can take
> informed action. 'df' is only going to tell you that you're low on
> space, not where the space is going.

One other thing - backuppc only builds a complete tree of links for full 
backups, which by default run once a week, with incrementals done on the 
other days.  Incremental runs build the tree of directories, but only the 
new and changed files are populated, with a notation for deletions.  The 
web interface and restore processes merge in the backing full on the fly, 
and the expire process knows not to remove a full until the incrementals 
that depend on it have expired as well.  That, plus the file compression, 
might take care of most of your problems.
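
For reference, the schedule and compression are a few more config.pl 
settings - the values here are just a sketch of the usual weekly-full setup, 
so check your own config rather than taking them as given:

   # Roughly one full a week, incrementals on the other days
   $Conf{FullPeriod} = 6.97;    # days between full backups
   $Conf{IncrPeriod} = 0.97;    # days between incremental backups
   # Pool compression (needs Compress::Zlib); 3 is a common tradeoff
   # between CPU and disk
   $Conf{CompressLevel} = 3;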

-- 
    Les Mikesell
     lesmikesell AT gmail DOT com


