BackupPC-users

Re: [BackupPC-users] Problems with hardlink-based backups...

From: Les Mikesell <lesmikesell AT gmail DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Wed, 19 Aug 2009 08:40:12 -0500
David wrote:
> 
> I haven't actually used BackupPC yet, mainly read through its docs,
> and I'm trying to judge how well it and its storage system would work
> in our environment.

Why not set up a test machine?  It is trivial to install, especially if you use 
the Ubuntu package or the one from the EPEL repository on RHEL or CentOS.

> Also, I'm not too experienced with backup "best practices",
> methodologies, etc. Still learning, and seeing what works best.

BackupPC is very configurable (browse through the docs and note all the 
settings you can change), but the defaults are pretty good, so you can get 
reasonable results without changing much.

>> Why not just exclude the _TOPDIR_ - or the mount point if this is on its
>> own filesystem?
>>
> 
> Because most of the interesting files on the backup server (at least
> in my case), are the files being backed up. I'm a lot more interested
> in being able to quickly find those files, than random stuff under
> /etc, /usr, etc.

BackupPC provides a web interface for easy browsing, so if you know where 
something was on the original target you can find it easily.  It does mangle 
the filenames and compress the contents, so it is harder - but not impossible - 
to work directly with the filesystem.  Where it is appropriate, you can assign 
'owners' to the target hosts so they can control and access them directly and 
you don't have to be involved.
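For the curious: as I understand the docs, the mangling scheme prefixes each 
path component with "f" and %-escapes special characters.  A rough demangler 
under that assumption (check it against your BackupPC version before relying 
on it):

```python
import re

def demangle(component: str) -> str:
    """Reverse BackupPC-style name mangling: strip the leading 'f'
    and decode %xx escapes back to the original characters.
    (Scheme assumed from the docs; verify against your install.)"""
    if component.startswith("f"):
        component = component[1:]
    return re.sub(r"%([0-9a-fA-F]{2})",
                  lambda m: chr(int(m.group(1), 16)), component)

print(demangle("fmy%2ffile"))  # -> my/file
```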

> 1. How well does BackupPC work when you manually make changes to the
> pool behind its back? (like removing a host, or some of the host's
> history, via the command line). Can you make it "resync/repair" its
> database?

Forcing a 'full' run will fix just about anything.  There are some tricks to 
keep the stats right - and I think someone on the list has a script to do 
things cleanly.  But drastic measures like that are rarely necessary, because 
you can control expiration on a per-host basis and normally it takes care of 
itself.

> 2) Is there a recommended approach for "backing up" BackupPC databases?
> 
> In case they go corrupt and so on. Or is a simple rsync safe?

This is a big issue.  Up to a certain size (depending mostly on the number of 
files and the amount of RAM you have), rsync -H will work, but there are limits. 
Image copies of the partition will always work.  Personally, I like to keep the 
archive small enough to fit on a single disk (so 2 TB or less these days) and 
raid-mirror to a swappable drive.
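The reason rsync -H hits limits is that it has to remember every 
multiply-linked inode it has seen so it can recreate the links on the other 
side.  A toy illustration of that bookkeeping (hypothetical file names, just 
to show the mechanism):

```python
import os, shutil, tempfile

# Two directory entries sharing one inode: the essence of the pool.
d = tempfile.mkdtemp()
a = os.path.join(d, "pool_entry")
b = os.path.join(d, "pc_entry")
with open(a, "w") as f:
    f.write("x" * 1000)
os.link(a, b)           # second name, same inode -- no extra data on disk

st_a, st_b = os.stat(a), os.stat(b)
print(st_a.st_ino == st_b.st_ino)   # same inode, two names
print(st_a.st_nlink)                # link count is 2

# A copier that preserves hardlinks (rsync -H) must keep a
# {(device, inode): first_path} map; without it, each name would be
# stored as a second full copy.  The map grows with the pool.
seen = {}
for path in (a, b):
    st = os.stat(path)
    seen.setdefault((st.st_dev, st.st_ino), path)

shutil.rmtree(d)        # clean up the scratch directory
```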

> 3) Is it possible to use BackupPC's logic on the command-line, with a
> bunch of command-line arguments, without setting up config files?

It does have command line tools.  But they are less convenient than letting the 
system work as designed.

> That would be awesome for scripting and so on, for people who want to
> use just parts of its logic (like the pooled system, for instance),
> rather than the entire backup system. I tend to prefer that kind of
> "unix tool" design.

It's all in Perl.  If you want to change something, you might as well do it in 
the base script...

> Ah right. I think this is a fundamental difference in approach. With
> the backup systems I've used before, space usage is going to keep
> growing forever, until you take steps to fix it. Either manually, or
> by some kind of scripting, and so far I haven't added scripting, so I
> rely on du to know where to manually recover space.

Expiration is designed in and tunable - per host.
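The knobs live in the per-host config file.  A sketch of what that looks like 
(setting names are from the standard config; the values here are illustrative, 
not recommendations - check your version's docs for the exact semantics):

```perl
# pc/somehost.pl -- per-host overrides of the global config
$Conf{FullKeepCnt} = [4, 2, 3];  # exponential series: 4 at 1x FullPeriod,
                                 # 2 at 2x, 3 at 4x (spacing doubles per slot)
$Conf{IncrKeepCnt} = 6;          # keep the last 6 incrementals
$Conf{FullPeriod}  = 6.97;       # just under a week, so fulls rotate weekdays
$Conf{IncrPeriod}  = 0.97;       # roughly daily incrementals
```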

> And, if you have a lot of harddrive space on the backup server, then
> may as well actually make use of it, to store as many versions as
> possible. And then only remove oldest versions where needed.
> 
> The above backup philosophy (based partly on rdiff-backup limitations)
> has served me well so far, but I guess I need to unlearn some of it,
> particularly if I want to use a hardlink-based backup system.

There is also an 'archive host' concept to generate a fairly standard tar 
archive out of the backup for one or more of your targets - or you can do it 
with the command line tool.  For really long term storage that is a better 
approach since you can restore it without any special programs - but you lose 
the space-sharing storage.

> A couple of questions, pardon my noobiness:
> 
> If rsync is used, then what is the difference between an incremental
> and a full backup?
> 
> i.e., do "full" backups copy all the data over (if using rsync), or
> just the changed files?

Fulls add the --ignore-times option to the run and re-read everything on the 
target for a block-checksum comparison, in addition to rebuilding the backup 
tree completely.
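In pseudo-rsync terms, the difference looks something like this (a sketch of 
the decision, not rsync's actual code):

```python
def needs_transfer(src_meta, dst_meta, full: bool) -> bool:
    """Decide whether a file must be transferred.
    src_meta / dst_meta are (size, mtime, checksum) tuples.

    Incremental (full=False): rsync's default quick check - skip
    the file when size and mtime both match.
    Full (full=True): like --ignore-times - the file is re-read and
    checksummed regardless of metadata, and transferred only if the
    content actually differs.  Even a no-op full still reads everything."""
    if not full:
        return (src_meta[0], src_meta[1]) != (dst_meta[0], dst_meta[1])
    return src_meta[2] != dst_meta[2]

same    = (100, 1700000000, "abc")
touched = (100, 1700000099, "abc")   # same content, newer mtime
print(needs_transfer(touched, same, full=False))  # True: quick check fails
print(needs_transfer(touched, same, full=True))   # False: checksums match
```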

> And, what kind of disadvantage is there if you only do (rsync-based)
> incrementals and don't ever make full backups?

Unless you do incremental 'levels', each incremental is based on the previous 
full so you end up copying more and more each run.
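A quick simulation makes the growth obvious (assuming every incremental is 
diffed against the most recent full, with no intermediate levels):

```python
def bytes_copied(changes_per_day, full_every):
    """Transfer size per run when each incremental is based on the last
    full: day N after a full re-copies all N days of accumulated changes."""
    sizes = []
    since_full = 0
    for day, changed in enumerate(changes_per_day):
        if day % full_every == 0:
            since_full = 0          # a full resets the baseline
        since_full += changed
        sizes.append(since_full)    # incremental copies everything since the full
    return sizes

# 10 MB of new data per day, a full each week:
print(bytes_copied([10] * 7, full_every=7))  # [10, 20, 30, 40, 50, 60, 70]
```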

> My angle is that Linux sysadmins have certain tools they like to use,
> and saying they can't use them effectively due to the backup
> architecture is kind of problematic.

You get over that quickly when you have a system that takes care of itself.

> 2) Stop trying to keep history for every single day for years (rather
> keep 1 for the last X days, last Y weeks, Z months, etc).

You can do an 'exponential' series that keeps some old copies, spaced farther 
apart as they get older.  But it is better to get the things that need to be 
kept forever into some sort of version control system, so that backing up the 
current version of its repository lets you reconstruct the past.  Then let the 
rest expire.

> And also it bothers me that those kind of stats can potentially go out
> of synch with the harddrive (maybe you delete part of the pool by
> mistake).
> 
> Is there a way to make BackupPC "repair" its database, by re-scanning
> its pool? Or some kind of recommended procedure for fixing problems
> like this?

I think this happens nightly - the BackupPC_nightly job walks the pool to 
clean up unreferenced files and refresh the stats.

> PS: Random question: Does backuppc have tools for making offsite,
> offline backups? Like copying a subset of the recent BackupPC backups
> over to a set of external drives (in encrypted format) and then taking
> the drives home or something like that.
> 
> Or alternately, are there recommended tools for this? I made a script
> for this, but want to see how people here usually handle this.

Image copies always work, rsync sometimes works.  Even better is to just run 
another independent instance remotely and let it take care of itself.

-- 
   Les Mikesell
    lesmikesell AT gmail DOT com



_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
