Subject: Re: [BackupPC-users] BackupPC Pool synchronization?
From: Mark Campbell <mcampbell AT emediatrade DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Fri, 1 Mar 2013 14:26:35 -0700
I find myself rather surprised that this is a major issue in what is otherwise 
a really good enterprise-level backup tool.  Synchronizing backups seems like a 
basic element of the idea of backups in a corporate environment.  Should the 
building that my backup server resides in burn down, get hit by a tornado, 
etc., there should be a process whereby you can have a synchronized backup 
elsewhere.  Also, by extension, what happens when you want to have a "cluster" 
of BackupPC servers?

The idea of simply running two BackupPC servers, each doing its own backups, 
may work in some cases, but that means double the transfer load on the 
machines being backed up, and that can be unacceptable.  For example, one of 
the machines I back up is a Linux server acting as a network drive.  Backups 
of it can take a long time; BackupPC reports 514 minutes for its last full 
backup (naturally, this runs after business hours).  Once it has been backed 
up, the data is already deduplicated and compressed.  Even on a LAN, it would 
be better to transfer that compressed, deduplicated pool than to back the 
machine up twice in the same day.  In the worst case, the network drive gets 
bogged down eight hours a day by backups, and it has only a small window of 
time that counts as "off hours."  My backup server, on the other hand, can be 
bogged down 24 hours a day for all I care; no one else is using its services 
but me.

Jeffrey, what is the latest version of your script?  I have 0.1.3, circa Sept 
'11.  Given how your script works, could it be made to simply recreate the 
pool structure on an external drive on the same system, rather than 
compressing it to a tarball?  My end goal is to be able to grab the external 
drive at a moment's notice, plug it into a new Linux machine along with a 
tarball of the BackupPC config files, and stand it up long enough to restore 
everyone's PCs and the appropriate servers.

Greg, I would definitely have an interest in seeing the script; anything that 
will help me achieve a tertiary remote backup...

Thanks,

--Mark


-----Original Message-----
From: backuppc AT kosowsky DOT org [mailto:backuppc AT kosowsky DOT org] 
Sent: Thursday, February 28, 2013 9:43 PM
To: General list for user discussion, questions and support
Subject: Re: [BackupPC-users] BackupPC Pool synchronization?

Mark Campbell wrote at about 14:10:13 -0700 on Thursday, February 28, 2013:
 > So I'm trying to get a BackupPC pool synced on a daily basis from a 1TB MD 
 > RAID1 array to an external Fireproof drive (with plans to also sync to a 
 > remote server at our collo).  I found the script BackupPC_CopyPcPool.pl by 
 > Jeffrey, but the syntax and the few examples I've seen online have indicated 
 > to me that this isn't quite what I'm looking for, since it appears to output 
 > it to a different layout.  I initially tried the rsync method with -H, but 
 > my server would end up choking at 350GB.  Any suggestions on how to do this?

The bottom line is that, other than doing a block-level filesystem copy, there 
is no "free lunch" that gets around the hard problem of copying over densely 
hard-linked files.

As many like yourself have noted, rsync bogs down when using the -H (hard 
links) flag, in part because rsync knows nothing of the special structure of 
the pool & pc trees, so it has to keep track of every possible hard link.
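
For a sense of why this hurts at BackupPC scale, here is a rough Python sketch 
(purely illustrative, not rsync's actual code) of the bookkeeping that -H 
forces: every file with a link count above one has to be remembered by device 
and inode so that later paths to the same inode can be sent as links rather 
than as data.

    import os

    seen = {}  # (st_dev, st_ino) -> first path encountered with that inode

    def classify(path):
        st = os.lstat(path)
        if st.st_nlink > 1:
            key = (st.st_dev, st.st_ino)
            if key in seen:
                return ("hard-link-to", seen[key])  # send as a hard link
            seen[key] = path
        return ("send-data", path)                  # send file contents

With tens of millions of multiply-linked entries in a combined pool + pc tree, 
that table alone becomes enormous, which is consistent with rsync grinding to 
a halt part way through.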

One solution is BackupPC_tarPCCopy, which uses a tar-like perl script to track 
and copy over the structure.

My script BackupPC_copyPcPool tries to combine the best of both worlds.  It 
allows you to use rsync or even "cp -r" to copy over the pool while 
disregarding any hard links.  The pc tree, with its links into the pool, is 
captured in a flat file listing all the links, directories, and zero-size 
files that comprise it; this is done with the help of a hash that caches the 
inode number of each pool entry.  The pc tree is then recreated on the target 
by sequentially (re)creating the directories, zero-size files, and links into 
the pool.
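
In rough terms, and purely as an illustration rather than the script itself 
(the paths and the one-letter record format below are invented for the 
example), the idea looks something like this Python sketch:

    import os

    POOL = "/var/lib/backuppc/cpool"   # assumed pool location
    PC   = "/var/lib/backuppc/pc"      # assumed pc tree location

    # Pass 1: cache the inode number of every pool file.
    inode_to_pool = {}
    for dirpath, _, filenames in os.walk(POOL):
        for name in filenames:
            path = os.path.join(dirpath, name)
            inode_to_pool[os.lstat(path).st_ino] = os.path.relpath(path, POOL)

    # Pass 2: describe the pc tree as directories (D), zero-length files (Z),
    # and links back into the pool (L), one record per line.
    with open("pctree.map", "w") as out:
        for dirpath, _, filenames in os.walk(PC):
            rel = os.path.relpath(dirpath, PC)
            out.write("D\t%s\n" % rel)
            for name in filenames:
                st = os.lstat(os.path.join(dirpath, name))
                if st.st_size == 0:
                    out.write("Z\t%s\n" % os.path.join(rel, name))
                elif st.st_ino in inode_to_pool:
                    out.write("L\t%s\t%s\n" % (os.path.join(rel, name),
                                               inode_to_pool[st.st_ino]))

The pool itself is then copied with plain rsync or cp (no -H needed), and the 
map is replayed on the target: mkdir for D records, empty files for Z records, 
and os.link() from the copied pool for L records.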

I have substantially rewritten my original script to make it orders of 
magnitude faster by substituting a packed in-memory hash for the filesystem 
inode tree I used in the previous version.  Several other improvements have 
been added, including the ability to record full-file md5sums and to fix 
broken/missing links.

I was able to copy over a BackupPC tree consisting of 1.3 million pool files 
(180 GB)  and 24 million pc tree entries (4 million directories, 20 million 
links, 300 thousand zero-length files) in the following time:

~4 hours to copy over the pool
~5 hours to create the flat file mapping out the pc tree directories,
  hard links & zero length files
~7 hours to convert the flat file into a new pc tree on the target filesystem

These numbers are approximate since I didn't really time it. But it was all 
done on a low end AMD dual-core laptop with a single USB3 drive.

For this case, the flat file of links/directories/zero-length files is 660 MB 
compressed (about 3.5 GB uncompressed).  The inode caching requires about 
250MB of RAM (mostly due to perl overhead) for the 1.3 million pool files.

Note that before I release the revised script, I also hope to add a feature 
that allows copying one or more backups from the pc tree on one machine to the 
pc tree on another machine (with a different pool).  This feature is not 
available in any other backup scheme... and it will effectively allow 
"incremental-like" backups.

I also plan to add an option to pack the inode cache more tightly, saving 
memory at the expense of some speed.  I should be able to fit 10 million pool 
nodes in a 300MB cache.
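
Again just as an illustration of that trade-off (not the script's actual data 
structure): in Python terms, one could swap the hash for a sorted, packed 
array of inodes plus a parallel list of pool paths, paying a binary search on 
lookup in exchange for a small, fixed per-entry overhead.

    import bisect
    from array import array

    inodes = array("q")   # sorted inode numbers, 8 bytes each
    pool_paths = []       # pool_paths[i] is the pool file whose inode is inodes[i]

    def build(entries):
        # entries: iterable of (inode, pool_relpath), e.g. from a walk of the pool
        for ino, relpath in sorted(entries):
            inodes.append(ino)
            pool_paths.append(relpath)

    def lookup(ino):
        i = bisect.bisect_left(inodes, ino)
        if i < len(inodes) and inodes[i] == ino:
            return pool_paths[i]
        return None

The fixed cost per entry drops to eight bytes plus the path string, which is 
the kind of packing described above.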

I would like to benchmark my revised routine against BackupPC_tarPCCopy in 
terms of speed, memory requirement, and generated file size...
