Subject: Re: [BackupPC-users] Backing up BackupPC
From: Kris Lou <klou AT themusiclink DOT net>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Tue, 12 Jul 2016 10:06:39 -0700
The problem you have to deal with is BackupPC's reliance on hard links within the pool -- supposedly this is dealt with in 4.x (?).
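To make the hard-link issue concrete, here is a minimal Perl sketch (all paths are made up) of how pooling works: one real copy of a file's contents, plus one hard link per backup that references it. A copier that doesn't preserve links multiplies the data accordingly:

    #!/usr/bin/perl
    # Minimal demonstration of hard-link pooling (all paths hypothetical).
    use strict;
    use warnings;

    my $dir       = '/tmp/pool_demo';
    my $pool_file = "$dir/cpool_file";
    my @pc_links  = map { "$dir/pc_file_$_" } 1 .. 3;

    mkdir $dir or die "mkdir: $!" unless -d $dir;
    unlink $pool_file, @pc_links;    # make the demo re-runnable

    # One real copy of the data in the "pool" ...
    open my $fh, '>', $pool_file or die "open: $!";
    print {$fh} "file contents stored once\n";
    close $fh;

    # ... and one hard link per "pc" tree reference to it.
    link $pool_file, $_ or die "link: $!" for @pc_links;

    # nlink is now 4: the pool entry plus the three backup references.
    my $nlink = (stat $pool_file)[3];
    print "link count: $nlink\n";

    # A copier that ignores hard links (e.g. rsync without -H) would
    # write these contents four times instead of once.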

To resurrect some old discussions (https://sourceforge.net/p/backuppc/mailman/message/27105491/):
 
> * For most people, rsync does not work to replicate a backup server
> effectively. Period. I think *no* one would suggest this as a reliable
> ongoing method of replicating a BackupPC server. Ever.
>
> * The best methods for this boil down to two camps:
> 1) Run two BackupPC servers and have both back up the hosts
> directly
> No replication at all: it just works.
> 2) Use some sort of block-based method of replicating the data
>
> * Block-based replication boils down to two methods
> 1) Use md or dm to create a RAID-1 array and rotate members of
> this array in and out
> 2) Use LVM to create snapshots of partitions and dd the partition
> to a different drive
> (I guess 3) Stop BackupPC long enough to do a dd of the partition
> *without* LVM)
>
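As an aside on camp 2, method 2: the snapshot-and-dd cycle is simple to script. A rough Perl sketch follows; the volume group (vg0), logical volume (backuppc), snapshot size, and target device are all hypothetical, and note that dd copies the whole block device, used or not:

    #!/usr/bin/perl
    # Sketch of LVM-snapshot block replication (all device names hypothetical).
    use strict;
    use warnings;

    my $origin = '/dev/vg0/backuppc';   # LV holding the BackupPC pool
    my $snap   = '/dev/vg0/pcsnap';     # snapshot device created below
    my $target = '/dev/sdz1';           # destination partition/drive

    sub run { system(@_) == 0 or die "command '@_' failed: $?" }

    # Snapshot the pool LV so we copy a consistent, frozen image;
    # the 10G is scratch space for changes made while the copy runs.
    run qw(lvcreate --size 10G --snapshot --name pcsnap), $origin;

    # Block-copy the snapshot to the target device.
    run qw(dd bs=64M), "if=$snap", "of=$target";

    # Drop the snapshot so it stops consuming copy-on-write space.
    run qw(lvremove --force), $snap;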
I think there is a 3rd camp (a sketch of the inode-indexing idea follows the list):
3. Scripts that understand the special structure of the pool and pc
trees and efficiently create lists of all the hard links in the pc
directory.
a] BackupPC_tarPCCopy
Included in standard BackupPC installations. It is a Perl
script that recurses through the pc directory, calculates (and
caches, if you have enough memory) the file-name md5sums, and
then uses those to create a tar-formatted file of the hard
links that need to be created. This routine has been
well-tested, at least on smaller systems.
b] BackupPC_copyPcPool
Perl script that I recently wrote that should be significantly
faster than [a], particularly on machines with low memory
and/or slower cpus. This script creates a new temporary
inode-number indexed pool to allow direct lookup of links and
avoid having to calculate and check file name md5sums. The
pool is then rsynced (without hard links -- i.e. no -H flag)
and then the restore script is run to recreate the hard
links. I recently used this to successfully copy over a pool of
almost 1 million files and a pc tree of about 10 million files.
See the recent archives to retrieve a copy.
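Since the script itself may be hard to find now, here is a sketch of just the inode-indexing idea from [b] (this is not Kosowsky's code): walk the pc tree once with lstat, bucket every multiply-linked file by device and inode number, and you get a direct lookup of which names must share an inode, with no file-name md5sum calculations:

    #!/usr/bin/perl
    # Sketch of inode-indexed hard-link discovery (not BackupPC_copyPcPool).
    use strict;
    use warnings;
    use File::Find;

    my $pc_tree = shift // '/var/lib/backuppc/pc';   # path is an assumption
    my %by_inode;                                    # "dev:inode" -> [names]

    find(
        {
            no_chdir => 1,
            wanted   => sub {
                my ($dev, $ino, undef, $nlink) = lstat $_;
                return unless -f _ && $nlink > 1;    # multiply-linked only
                push @{ $by_inode{"$dev:$ino"} }, $File::Find::name;
            },
        },
        $pc_tree,
    );

    # Each bucket lists every path that must be re-linked together after
    # the pool is copied without -H; emit them as shell link commands.
    for my $key (sort keys %by_inode) {
        my ($first, @rest) = @{ $by_inode{$key} };
        print "ln \Q$first\E \Q$_\E\n" for @rest;
    }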


From the BackupPC documentation:

Some tape backup systems aren't smart about hard links
If you back up the BackupPC pool to tape, you need to make sure that the tape backup system is smart about hard links. For example, if you simply try to tar the BackupPC pool to tape, you will back up a lot more data than is necessary.
Using the example at the start of the installation section, 65 hosts are backed up, with each full backup averaging 3.2GB. Storing one full backup and two incremental backups per laptop requires around 240GB of raw data, but because of the pooling of identical files, only 87GB is actually used (less with compression). If you run du or tar on the data directory, however, there will appear to be 240GB of data, plus the size of the pool (around 87GB), or 327GB in total.
If your tape backup system is not smart about hard links, an alternative is to periodically back up just the last successful backup for each host to tape. Another alternative is to do a low-level dump of the pool file system (i.e., /dev/hda1 or similar) using dump(1).
Supporting more efficient tape backup is an area for further development.
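To make the excerpt's arithmetic explicit (all figures are the documentation's example numbers), a quick check:

    #!/usr/bin/perl
    # Re-deriving the documentation's example numbers.
    use strict;
    use warnings;

    my $hosts   = 65;
    my $full_gb = 3.2;                  # average full backup per host
    my $fulls   = $hosts * $full_gb;    # 208 GB of full backups alone
    my $raw     = 240;                  # + two incrementals each (doc figure)
    my $pool    = 87;                   # actual disk use after pooling

    printf "fulls alone:    %.0f GB\n", $fulls;
    printf "raw pc tree:    %.0f GB (incrementals add ~%.0f GB)\n",
           $raw, $raw - $fulls;
    printf "pooled on disk: %.0f GB\n", $pool;
    # A link-unaware tar/du counts every pc-tree name *and* the pool:
    printf "naive tar sees: %.0f GB\n", $raw + $pool;    # 327 GB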

I think the excerpt above answers your questions about tape requirements when directly tar'ing the pool.  You might be better off just scheduling host archives periodically.
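On the archive route: if you are on 3.x, I believe the bin directory ships a command-line way to kick off the archive function (BackupPC_archiveStart; do verify the name and argument order on your installation, as this is from memory), so something like the following could be run from cron:

    #!/usr/bin/perl
    # Cron-able wrapper to request a BackupPC archive. The path, the
    # archive host name, and the argument order are assumptions; check
    # BackupPC_archiveStart on your own install before relying on this.
    use strict;
    use warnings;

    my $bin     = '/usr/share/backuppc/bin/BackupPC_archiveStart';
    my $archive = 'archive';              # archive "host" defined in config
    my $user    = 'backuppc';             # user the request is logged under
    my @hosts   = qw(host1 host2 host3);  # clients whose last backup to archive

    system($bin, $archive, $user, @hosts) == 0
        or die "archive request failed: $?";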

I don't know if Jeffrey Kosowsky still monitors the list, but somebody might have a copy of his scripts (3b, above).  Unfortunately, these were part of the original BackupPC Wiki, which is no longer available.

