Subject: Re: [BackupPC-users] Looking for some comments on sizing.
From: Johan Ehnberg <johan AT molnix DOT com>
To: backuppc-users AT lists.sourceforge DOT net
Date: Thu, 9 Feb 2017 10:09:53 +0200
Hi Scott,

I've been looking into scaling BackupPC to some extent.

Object storage is not yet feasible in v4, because some database-like 
files are kept alongside the data that would go into object storage 
(the cpool). Also, the current development builds perform several 
fairly heavy operations on the cpool that would slow it all down too 
much. Some of these may be avoidable in future builds.

Running BackupPC as a symmetric, same-config scale-out solution is not 
possible for several reasons. Splitting up the work across independent 
instances is perfectly doable, though, if you can sacrifice 
deduplication between the instances (see the sketch below).
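
As a rough illustration (not a feature of BackupPC itself), the split 
can be as simple as statically assigning each host to one of several 
independent instances. The instance and host names below are made up:

    # Hypothetical sketch: assign each host to one of several
    # independent BackupPC instances. Deduplication then only
    # happens within an instance, not across them.
    import hashlib

    INSTANCES = ["backuppc1", "backuppc2", "backuppc3"]

    def instance_for(host):
        # Stable hash so a host always lands on the same instance,
        # regardless of the order hosts are added or removed.
        digest = hashlib.sha256(host.encode("utf-8")).hexdigest()
        return INSTANCES[int(digest, 16) % len(INSTANCES)]

    for host in ["web01", "web02", "db01", "mail01"]:
        print(host, "->", instance_for(host))

Each instance then carries only its own hosts in its config, and pool 
growth is split accordingly.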

If we are talking petabytes of data, node-based storage is likely not 
feasible. If your files are mostly large, then network-based storage is 
a good option; my experiences with ceph block devices (RBD) in this 
kind of setting are positive. If the files are mostly small, you may be 
hit by random access latency with any network-based solution.
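
To get a feel for that latency before committing, a quick probe like 
the one below can help. The path and sizes are assumptions, and the 
numbers are only indicative unless you drop caches between runs:

    # Hypothetical probe: sequential vs. small random reads on a
    # mount, e.g. a ceph RBD mount compared to a local disk.
    import os, random, time

    PATH = "/mnt/test/io_probe"        # assumed mount point
    SIZE = 256 * 1024 * 1024           # 256 MiB test file
    BLOCK = 4096                       # small reads, latency-bound

    if not os.path.exists(PATH):
        with open(PATH, "wb") as f:
            f.write(os.urandom(SIZE))

    start = time.time()
    with open(PATH, "rb") as f:        # sequential pass
        while f.read(1024 * 1024):
            pass
    print("sequential: %.2f s" % (time.time() - start))

    start = time.time()
    with open(PATH, "rb") as f:        # 2000 random 4 KiB reads
        for _ in range(2000):
            f.seek(random.randrange(0, SIZE - BLOCK))
            f.read(BLOCK)
    print("random: %.2f s" % (time.time() - start))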

If your data is a mix of large and small files, consider splitting them 
up so that the small files go on local storage, where small random I/O 
is orders of magnitude faster.

To scale the filesystem itself, I would opt for a copy-on-write 
filesystem, since these suit the workload of storing mostly static 
files. Most also give you data integrity verification out of the box. 
For example, key benefits of ZFS are:
- Data integrity verification
- Automatic bit rot fixing if you run ZFS built-in RAID
- Snapshot send/receive for remote replication faster than rsync
- Transparent compression
- Dynamic, on-line resizing
So with ZFS, you can handle both compression and data integrity checking 
outside of BackupPC, speeding things up a lot.
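
As a minimal sketch of what that looks like (the pool name "tank", 
dataset "tank/backuppc" and the remote host are assumptions; the 
commands are the stock zfs/zpool CLI driven from a small script):

    # Hypothetical: enable the ZFS features mentioned above for a
    # BackupPC pool dataset. Run as root on a host with ZFS installed.
    import subprocess

    def run(cmd):
        print("+", cmd if isinstance(cmd, str) else " ".join(cmd))
        subprocess.run(cmd, shell=isinstance(cmd, str), check=True)

    DATASET = "tank/backuppc"   # assumed dataset for the BackupPC pool

    # Transparent compression and checksumming at the filesystem layer.
    run(["zfs", "set", "compression=lz4", DATASET])
    run(["zfs", "set", "checksum=on", DATASET])  # default; shown explicitly

    # Snapshot plus incremental send/receive for remote replication.
    # Remote host "backup-replica" and its dataset are hypothetical.
    run(["zfs", "snapshot", DATASET + "@today"])
    run("zfs send -i " + DATASET + "@yesterday " + DATASET + "@today"
        " | ssh backup-replica zfs receive -F tank/backuppc")

    # Periodic scrub verifies integrity and repairs bit rot when the
    # pool has ZFS-level redundancy (mirror/RAID-Z).
    run(["zpool", "scrub", "tank"])

With compression handled by ZFS, BackupPC's own compression can be 
turned off ($Conf{CompressLevel} = 0) so the work is not done twice.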

I have not tried ZFS on ceph yet, though. Balancing the redundancy 
optimally is an issue in this scenario, since redundancy at both layers 
multiplies: taking advantage of both ceph and ZFS easily costs around 4 
times the space of the data to be stored (twice at each layer, or ~1.3 
times for ZFS RAID combined with ceph's default 3x replication), plus 
plenty of RAM. (Note that googling 'ZFS on ceph' turns up a lot of 
issues about running ceph OSDs on ZFS, which is the opposite 
arrangement and irrelevant here.)
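
The multiplication is easy to see with the numbers above (the pool size 
here is just an example):

    # Worked version of the overhead estimate: redundancy at the ZFS
    # layer and at the ceph layer multiplies.
    def raw_space(data_tb, zfs_factor, ceph_factor):
        return data_tb * zfs_factor * ceph_factor

    data_tb = 100  # hypothetical pool size in TB

    print(raw_space(data_tb, 2.0, 2.0))  # 400 TB raw: 2x at each layer
    print(raw_space(data_tb, 1.3, 3.0))  # 390 TB raw: ZFS ~1.3x, ceph 3x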

Best regards,
Johan Ehnberg

-- 
Johan Ehnberg
johan AT molnix DOT com
+358503209688

Molnix Oy
molnix.com
