2009-09-02 08:46:46
Subject: Re: [BackupPC-users] Using rsync for blockdevice-level synchronisation of BackupPC pools
From: Jon Craig <cannedspam.cant AT gmail DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Wed, 2 Sep 2009 08:43:11 -0400
On Wed, Sep 2, 2009 at 7:56 AM, Daniel Berteaud
<daniel AT firewall-services DOT com> wrote:
> On Wednesday 2 September 2009 at 12:10 +0200, Pieter Wuille wrote:
>> Hello everyone,
>>
>> Trying to come up with an efficient way to synchronise a BackupPC archive
>> on one server with a remote, encrypted offsite backup, I ran into the
>> following problems:
>> * As often pointed out on this list, filesystem-level synchronisation is
>>   extremely CPU- and memory-intensive. It is not actually impossible, but
>>   depending on the scale of your backups, it may not be a practical
>>   solution. In our case of a 350GiB pool containing 4 million directories
>>   and 20 million inodes, simply copying the whole pool locally with
>>   cp/rsync/xfsdump/whatever thrashes, gets killed by the OOM killer, or at
>>   least takes days, longer than I find reasonable for a remote
>>   synchronisation run.
>> * Furthermore, we want our offsite backup to be encrypted, ideally with a
>>   secret key that is never known at the remote location at any moment:
>>   only encrypted files should be sent there and stored there. Given such a
>>   massive number of small files, doing this encryption at the file level
>>   adds a very serious overhead.
>> * The alternative to file-level synchronisation is (block)device-level
>>   synchronisation. Many possibilities exist here, including ZFS
>>   send/receive (if you use ZFS), or taking a snapshot (e.g. with LVM) or
>>   temporarily stopping backups and then doing a full copy of the pool to
>>   the remote side (if you have enough bandwidth), etc. Not everyone is
>>   willing to use these, or is prepared to convert to such a system.
>> * We would like to use rsync for this, since it skips identical parts yet
>>   guarantees that the whole file is byte-for-byte identical to the
>>   original. Unfortunately, as far as I know, rsync doesn't support syncing
>>   the data on a block device, only the block device node itself. In
>>   addition, rsync needs to read and process the whole file on the receiver
>>   side, calculate checksums, send them all to the sender side, wait for
>>   the sender to match its data against those checksums, and then receive
>>   and apply the resulting delta on the receiver side. For a single file,
>>   this requires at least the sum of the times needed to read through the
>>   whole data on both sides (correct me if I'm wrong, I don't know rsync
>>   internals). Data hardly moves on disk in the case of a BackupPC pool, so
>>   we would like to disable, or at least limit, the range in which rsync
>>   searches for matching data.
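For readers unfamiliar with the matching step described above: rsync's sender slides a window over every byte offset of its copy and uses a cheap rolling checksum to look for blocks the receiver already has, confirming candidates with a stronger hash. The Python sketch below follows the weak checksum published for rsync; MD5 as the strong hash and the names `signatures` and `find_matches` are illustrative assumptions, and real rsync also skips ahead after a match and sends literal data, which this toy omits.

```python
import hashlib

M = 1 << 16  # modulus of rsync's weak checksum


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Weak checksum (a, b) of a block, computed from scratch."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - j) * x for j, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide the window one byte (drop out_byte, append in_byte) in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def signatures(basis: bytes, n: int) -> dict:
    """Receiver side: weak checksum -> [(strong hash, block index)] per full block."""
    sigs: dict = {}
    for start in range(0, len(basis) - n + 1, n):
        block = basis[start:start + n]
        sigs.setdefault(weak_checksum(block), []).append(
            (hashlib.md5(block).digest(), start // n))
    return sigs


def find_matches(new: bytes, sigs: dict, n: int) -> list[tuple[int, int]]:
    """Sender side: test every byte offset of new, return (offset, block index)."""
    matches: list[tuple[int, int]] = []
    if len(new) < n:
        return matches
    a, b = weak_checksum(new[:n])
    off = 0
    while True:
        for strong, idx in sigs.get((a, b), []):
            # The weak checksum only nominates candidates; the strong hash confirms.
            if hashlib.md5(new[off:off + n]).digest() == strong:
                matches.append((off, idx))
                break
        if off + n >= len(new):
            break
        a, b = roll(a, b, new[off], new[off + n], n)
        off += 1
    return matches
```

With `basis = bytes(range(200))` and `new = b"abc" + basis`, the 20-byte blocks are still found at offsets 3, 23, 43, and so on, even though every block moved by three bytes. That search over every offset is what is worth restricting when, as in a BackupPC pool, data hardly moves on disk.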
>>
>> To overcome this issue, I wrote a Perl/FUSE filesystem that allows you to
>> "mount" a block device (or regular file) as a directory containing files
>> part0001.img, part0002.img, ..., each representing 1 GiB of data of the
>> original device:
>>
>>   https://svn.ulyssis.org/repos/sipa/backuppc-fuse/devfiles.pl
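The mapping devfiles.pl exposes can be pictured as fixed 1 GiB windows onto the device. The sketch below is not devfiles.pl itself (that is Perl/FUSE). It is a hypothetical Python illustration that assumes four-digit zero-padded names starting at part0001.img, a shorter final part when the device size is not a multiple of 1 GiB, and made-up helper names `part_name`, `part_layout` and `read_part`.

```python
PART_SIZE = 1 << 30  # 1 GiB per part, as described in the post


def part_name(index: int) -> str:
    """Hypothetical naming helper: index 0 -> 'part0001.img'."""
    return f"part{index + 1:04d}.img"


def part_layout(device_size: int, part_size: int = PART_SIZE) -> list[tuple[str, int, int]]:
    """Return (name, byte offset, length) for each part covering the device."""
    layout = []
    for index, offset in enumerate(range(0, device_size, part_size)):
        # Assumption: the last part is simply shorter if the size is not a multiple.
        length = min(part_size, device_size - offset)
        layout.append((part_name(index), offset, length))
    return layout


def read_part(dev_path: str, offset: int, length: int) -> bytes:
    """Read one part's bytes straight from the underlying device or file."""
    with open(dev_path, "rb") as dev:
        dev.seek(offset)
        return dev.read(length)
```

Under these assumptions, a 2.5 GiB device would appear as part0001.img and part0002.img of 1 GiB each, plus a 0.5 GiB part0003.img.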
>>
>> This directory can be rsynced in the normal way to an "ordinary" directory
>> on an offsite backup. If a restore is necessary,
>> 'ssh remote "cat /backup/part*.img" >/dev/sdXY' (or equivalent) suffices.
>> Although devfiles.pl has (limited) write support, rsync'ing into the
>> resulting directory is not yet possible; I may try to get this working if
>> people need it. That would allow restoring by simply rsync'ing in the
>> opposite direction.
>> Doing the synchronisation in 1GiB chunks prevents rsync from searching
>> too far, and splitting it into multiple files allows some parallelism
>> (the sender transmits data to the receiver while the receiver already
>> checksums the next file; this is heavily limited by disk I/O, however).
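The restore one-liner works because the zero-padded names sort in device order, so the shell glob `part*.img` concatenates the parts correctly. A rough Python equivalent, assuming the parts have already been copied to a local directory (the function name `restore_parts` and the directory layout are hypothetical), could look like this:

```python
import glob
import os
import shutil


def restore_parts(parts_dir: str, target_path: str) -> list[str]:
    """Concatenate part*.img files, in name order, into target_path."""
    # Zero-padded names (part0001.img, part0002.img, ...) sort in device order,
    # just like the shell glob in `cat /backup/part*.img`.
    parts = sorted(glob.glob(os.path.join(parts_dir, "part*.img")))
    # An existing target (e.g. a block device node) is overwritten in place
    # without truncation; a missing target is created as a regular file.
    mode = "r+b" if os.path.exists(target_path) else "wb"
    with open(target_path, mode) as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out, 1 << 20)  # copy in 1 MiB chunks
    return parts
```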
>>
>> In our case, the BackupPC pool is stored on an XFS filesystem on an LVM
>> volume, which allows an xfs_freeze/sync/snapshot/unfreeze sequence, after
>> which devfiles.pl is used on the snapshot. Instead of freezing and
>> unfreezing, stopping BackupPC and unmounting the filesystem (then
>> remounting it and restarting BackupPC once the snapshot exists) also
>> works. If no snapshot mechanism is available, you would need to suspend
>> BackupPC for the whole synchronisation.
>> In fact, the BackupPC volume is already encrypted on our backup server
>> itself, allowing very cheap encrypted offsite backups (simply not sending
>> the keyfile to the remote side is enough...)
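The freeze/snapshot sequence can be scripted. The important detail is that the filesystem is unfrozen even if snapshot creation fails; otherwise writes to the live pool stay blocked. Below is a minimal Python sketch assuming the standard `xfs_freeze` and `lvcreate` tools and hypothetical names (`/dev/vg0/pool`, `pool-snap`, a 20G snapshot size). Unlike the order in the post, it runs `sync` just before freezing, since freezing itself also flushes dirty data.

```python
import os
import subprocess


def snapshot_pool(mountpoint: str, origin_lv: str, snap_name: str,
                  snap_size: str, run=subprocess.run) -> str:
    """Sync, freeze XFS, create an LVM snapshot, and always unfreeze."""
    run(["sync"], check=True)  # flush dirty data before the freeze
    run(["xfs_freeze", "-f", mountpoint], check=True)
    try:
        run(["lvcreate", "--snapshot", "--size", snap_size,
             "--name", snap_name, origin_lv], check=True)
    finally:
        # Never leave the live pool frozen, even if lvcreate fails.
        run(["xfs_freeze", "-u", mountpoint], check=True)
    # LVM exposes the snapshot next to its origin, e.g. /dev/<vg>/<snap_name>.
    return os.path.join(os.path.dirname(origin_lv), snap_name)
```

For example, `snapshot_pool("/var/lib/backuppc", "/dev/vg0/pool", "pool-snap", "20G")` would return `/dev/vg0/pool-snap`, which is the device devfiles.pl would then be pointed at; the snapshot can be dropped with `lvremove` once the sync has finished.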
>>
>> The result: our 400GiB pool, containing 350GiB of data of which about 2GiB
>> changes daily, is synchronised with the offsite backup 5 times a week.
>> Each run takes 12-15 hours and requires almost no bandwidth; the run time
>> seems mostly limited by the slow disk I/O on the receiver side (25MiB/s).
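A quick back-of-the-envelope check of those figures, assuming rsync checksums every part on each run: because devfiles.pl exposes the whole device, the receiver has to read all 400GiB at least once per run, no matter how little changed. At 25MiB/s one such pass takes about 4.55 hours, so the reported 12-15 hours is roughly three passes' worth of receiver I/O. The post does not break the time down, so how it splits between checksumming, reconstruction and writing is not known.

```python
def read_hours(size_gib: float, rate_mib_s: float) -> float:
    """Hours needed to read size_gib GiB sequentially at rate_mib_s MiB/s."""
    return size_gib * 1024 / rate_mib_s / 3600


one_pass = read_hours(400, 25)  # whole device, one sequential read
print(f"one pass: {one_pass:.2f} h")  # one pass: 4.55 h
print(f"12-15 h is {12 / one_pass:.1f}-{15 / one_pass:.1f} passes")  # 12-15 h is 2.6-3.3 passes
```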
>>
>> Hope you find this interesting/useful,
>
> Hi.
>
> This seems to be an interesting approach to solve the offsite backups
> problem. I'll try to test this when I have some time.
>
> thanks
>
>>
>> --
>> Pieter
>>
>> ------------------------------------------------------------------------------
>> Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
>> trial. Simplify your report design, integration and deployment - and focus on
>> what you do best, core application coding. Discover what's new with
>> Crystal Reports now.  http://p.sf.net/sfu/bobj-july
>> _______________________________________________
>> BackupPC-users mailing list
>> BackupPC-users AT lists.sourceforge DOT net
>> List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
>> Wiki:    http://backuppc.wiki.sourceforge.net
>> Project: http://backuppc.sourceforge.net/
> --
> Daniel Berteaud
> FIREWALL-SERVICES SARL.
> Société de Services en Logiciels Libres
> Technopôle Montesquieu
> 33650 MARTILLAC
> Tel : 05 56 64 15 32
> Fax : 05 56 64 15 32
> Mail: daniel AT firewall-services DOT com
> Web : http://www.firewall-services.com
>

I was having the same thought this morning regarding rsync.  There is
a patch available for rsync that allows it to work directly on raw
devices, but it's slated for a future release.  I found this on another
site:

Standard rsync is missing this feature, but there is a patch for it in
the rsync-patches tarball (copy-devices.diff), which can be downloaded
from http://rsync.samba.org/ftp/rsync/. After applying it and
recompiling, you can rsync devices with the --copy-devices option.

Of more interest is the open source package zumastor.  It is a
full-blown snapshot solution for Linux.  Its advantage is that it sets
up ongoing snapshots (as ZFS does) that are replicated to and applied on
a remote server.  The downside is that it's a "copy-on-write" type
solution, and this reduces write performance on the source server.  This
can be mitigated by using NVRAM to hold the filesystem journal, but the
degree of technical difficulty seems to rise quickly with this solution,
and it may not be appropriate for SOHO users or the faint of heart.
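Why copy-on-write costs write performance can be seen in a toy model. The first write to a block after a snapshot has to read the old contents and save them before the new data lands, turning one write into a read plus two writes. The Python sketch below is a generic illustration with made-up names, not zumastor's actual design. It also shows that the same bookkeeping yields the list of blocks changed since the snapshot, which is the kind of information a replicating snapshot system can ship to a remote server.

```python
class CowVolume:
    """Toy block device with a single copy-on-write snapshot."""

    def __init__(self, blocks: list[bytes]):
        self.origin = list(blocks)          # live data, modified in place
        self.saved: dict[int, bytes] = {}   # block index -> contents at snapshot time
        self.extra_ios = 0                  # I/Os caused only by the snapshot

    def write(self, idx: int, data: bytes) -> None:
        if idx not in self.saved:
            # First write since the snapshot: preserve the old block first.
            self.saved[idx] = self.origin[idx]
            self.extra_ios += 2             # read old block + write it to the store
        self.origin[idx] = data

    def read_snapshot(self, idx: int) -> bytes:
        """Snapshot view: the saved copy if the block changed, else the live block."""
        return self.saved.get(idx, self.origin[idx])

    def changed_blocks(self) -> list[int]:
        """Blocks written since the snapshot, i.e. what a replica would need."""
        return sorted(self.saved)
```

Only the first write to each block pays the extra cost; later writes to the same block go straight to the origin.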

-- 
Jonathan Craig
