Subject: Re: [BackupPC-users] Problems with hardlink-based backups...
From: Les Mikesell <lesmikesell AT gmail DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Mon, 17 Aug 2009 08:05:54 -0500
David wrote:
> 
> Where the real problem comes in is if admins want to use 'updatedb'
> or 'du' on the Linux system. updatedb builds a *huge* database and uses
> up tonnes of CPU & RAM (so I usually disable it). And 'du' can take
> days to run and produce multi-GB output files.

You can exclude directories from the updatedb runs.  And du doesn't create
any files unless you redirect its output - its report can be constrained to
the relevant top-level directories with the -s option.
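
For example (the paths here are just an illustration - adjust for your own
layout), adding the backup tree to PRUNEPATHS in /etc/updatedb.conf keeps
updatedb from ever descending into it:

    PRUNEPATHS="/tmp /var/spool /backups"

and 'du -s' gives one summary line per top-level directory instead of one
line per file:

    du -sh /backups/*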

> Here's a question for backuppc users (and people who use hardlink
> snapshot-based backups in general)... when your backup server, that
> has millions of hardlinks on it, is running low on space, how do you
> correct this?

Backuppc maintains its own status page showing how much space the pool uses
and how much is left on the filesystem.  So you just have to look at that page
often enough to not run out of space.

> The most obvious thing is to find which host's backups are taking up
> the most space, and then remove some of the older generations.
> 
> Normally the simplest method to do this is to run a tool like 'du' and
> then perhaps view the output in xdiskusage. (One interesting thing
> about 'du' is that it's clever about hardlinks, so it doesn't count the
> disk usage twice. I think it must keep a table in memory of visited
> inodes which had a link count of 2 or greater.)
> 
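Right - GNU du remembers the (device, inode) pair of every file it sees with
a link count above 1 and skips anything it has already counted.  A toy
version in Python (illustrative only - the real du is C and handles far more
edge cases):

    #!/usr/bin/env python
    # Toy du: count each multiply-linked inode only once.
    # Sketch only - Unix-specific (st_blocks), no error handling.
    import os, sys

    def disk_usage(top):
        seen = set()                  # (st_dev, st_ino) already counted
        total = 0
        for dirpath, dirnames, filenames in os.walk(top):
            for name in filenames:
                st = os.lstat(os.path.join(dirpath, name))
                if st.st_nlink > 1:
                    key = (st.st_dev, st.st_ino)
                    if key in seen:
                        continue      # hardlink to something already seen
                    seen.add(key)
                total += st.st_blocks * 512   # st_blocks is 512-byte units
        return total

    if __name__ == '__main__':
        print(disk_usage(sys.argv[1]))
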
> However, with a gazillion hardlinks, du takes forever to run and
> produces massive output. In my case, about 3-4 days and a 4-5 GB
> output file.
> 
> My current setup is a basic hardlink snapshot-based backup scheme, but
> backuppc (due to its pool structure, where hosts have generations of
> hardlink snapshot dirs) would have the same problems.
> 
> How do people solve the above problem?

Backuppc won't start a backup run if the disk is more than 95% (configurable) 
full.
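(The knob is $Conf{DfMaxUsagePct} in config.pl; 95 is the shipped default.)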


> (I also imagine that running "du" to check disk usage of backuppc data
> is complicated by the backuppc pool, but at least you can exclude the
> pool from the "du" scan to get more usable results.)
> 
> My current fix is an ugly hack, where I go through my snapshot backup
> generations (from oldest to newest) and remove all redundant hard
> links (i.e., ones that point to the same inodes as the same paths in
> the next-most-recent generation). Then that info goes into a
> compressed text file that could be restored from later. After that, I
> compare the next two most-recent generations, and so on.
> 
> But yeah, that's a very ugly hack... I want to do it better and not
> re-invent the wheel. I'm sure this kind of problem has been solved
> before.

It is best done pro-actively, avoiding the problem instead of trying to fix it
afterwards, because with everything linked it frees no space to remove old
generations of files that still exist in newer ones.  So generating the stats
daily and observing them (both by a human and by your program) before starting
the next run is the way to go.
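
For example, something along these lines could run from cron before the
nightly backups (the /backups path and the 95% threshold are assumptions -
adjust for your setup):

    #!/usr/bin/env python
    # Pre-flight check: refuse to start a backup run when the archive
    # filesystem is nearly full.  Unix-only; crude percentage - df
    # rounds and reserves blocks slightly differently.
    import os, sys

    ARCHIVE = '/backups'      # mount point of the backup filesystem
    MAX_USED_PCT = 95         # same idea as BackupPC's $Conf{DfMaxUsagePct}

    st = os.statvfs(ARCHIVE)
    used_pct = 100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks
    print('%s is %.1f%% full' % (ARCHIVE, used_pct))
    if used_pct >= MAX_USED_PCT:
        sys.exit('too full - expire some old generations before this run')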

> fwiw, I was using rdiff-backup before. It's very du-friendly, since
> only the differences between each backup generation are stored (rather
> than a large number of hardlinks). But I had to stop using it, because
> with servers with a huge number of files it uses up a huge amount of
> memory + cpu, and takes a really long time. And the mailing list
> wasn't very helpful with trying to fix this, so I had to change to
> something new so that I could keep running backups (with history).
> That's when I changed over to a hardlink snapshots approach, but that
> has other problems, detailed above. And my current hack (removing all
> redundant hardlinks and empty dir structures) is kind of similar to
> rdiff-backup, but coming from another direction.

Also, you really want your backup archive on its own mounted filesystem, both
so it doesn't compete with anything else for space and to give you the option
of making an image copy if you ever need a backup of the archive itself -
other methods will be too slow to be practical.  And 'df' will tell you what
you need to know about a filesystem fairly quickly, since it reads the
filesystem's own counters instead of walking every inode the way 'du' must.
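
For example (assuming the archive is mounted at /backups):

    df -h /backups

comes back immediately, where a 'du' over the same tree would have to stat
every one of those millions of links.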

-- 
   Les Mikesell
     lesmikesell AT gmail DOT com


_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/