Subject: Re: [BackupPC-users] Backing up a BackupPC server
From: Tino Schwarze <backuppc.lists AT tisc DOT de>
To: backuppc-users AT lists.sourceforge DOT net
Date: Tue, 2 Jun 2009 17:58:20 +0200
On Tue, Jun 02, 2009 at 10:06:40AM -0500, Les Mikesell wrote:

> >> Still, it would be awesome to combine the simplicity and pooling
> >> structure of BackupPC with the flexibility of a database
> >> architecture...
> >>   
> > I, for one, would be willing to contribute financially and with my very 
> > limited skills if Craig, or others, were willing to undertake such an 
> > effort. Perhaps Craig would care to comment.
> 
> The first thing needed would be to demonstrate that there would be an 
> advantage to a database approach - like some benchmarks showing an 
> improvement in throughput in the TB size range and measurements of the 
> bandwidth needed for remote replication.

In my experience, BackupPC is mainly I/O bound. It produces a lot of
seeks on the underlying block device (for directory traversal and hash
lookups). This might actually benefit from a relational database - you'd
just issue the appropriate SELECT, have some indices in place etc. Of
course, there's still that "how to store and query the directory
hierarchies efficiently" problem.

Maybe someone should propose a concrete design; then we could check how
BackupPC's access patterns map onto the database structure. It might
turn out to be really complex - I'm wondering how to store files,
directories, attributes, the pool, and a particular backup number. We
currently create the directory structure for each backup and store an
attrib file in it (to keep track of deleted files, at least). We'd have
to represent the same information in the database, too. There's no way
around that, IMO.
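
Just to have something concrete to shoot at, a hypothetical relational
layout might look like this (all table and column names invented, not a
proposal for the actual schema):

  import sqlite3

  con = sqlite3.connect("backuppc.db")
  con.executescript("""
  CREATE TABLE IF NOT EXISTS backup (num INTEGER PRIMARY KEY,
                                     host TEXT, started INTEGER);
  CREATE TABLE IF NOT EXISTS dir  (id INTEGER PRIMARY KEY,
                                   backup_num INTEGER,
                                   parent INTEGER, name TEXT);
  CREATE TABLE IF NOT EXISTS file (dir_id INTEGER, name TEXT,
                                   pool_hash TEXT, mode INTEGER,
                                   uid INTEGER, gid INTEGER,
                                   mtime INTEGER,
                                   deleted INTEGER DEFAULT 0);
  CREATE INDEX IF NOT EXISTS file_by_dir ON file (dir_id, name);
  """)

  def list_dir(dir_id):
      # One indexed query replaces reading and decompressing an attrib file;
      # the "deleted" flag mirrors what the attrib file tracks today.
      return con.execute("SELECT name, mode, mtime, deleted FROM file "
                         "WHERE dir_id = ?", (dir_id,)).fetchall()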

I suppose you could only benchmark anything after implementing a
sufficiently large part of the problem.

Another idea: do we have any performance metrics for BackupPC? It might
be useful to check which operations take most of the time. Is it pool
lookups? File decompression? Directory traversal for incrementals?
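
Even crude wall-clock instrumentation around the suspected hot spots
would tell us something. A sketch of what I mean (plain Python, nothing
BackupPC-specific):

  import time
  from collections import defaultdict

  timings = defaultdict(float)

  def timed(name):
      # Decorator accumulating wall-clock time per class of operation.
      def wrap(fn):
          def inner(*args, **kwargs):
              t0 = time.monotonic()
              try:
                  return fn(*args, **kwargs)
              finally:
                  timings[name] += time.monotonic() - t0
          return inner
      return wrap

  def report():
      # After a run, print where the time went, biggest consumer first.
      for name, secs in sorted(timings.items(), key=lambda x: -x[1]):
          print(f"{name:25s} {secs:8.1f} s")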

If, for example, we figure out that hash lookups and reading the
checksums of pool files are expensive, a little database (actually a
hashtable) might suffice - sort of a memcached which keeps track of pool
files, their sizes and checksums. This might be doable (perhaps disabled
by default if it requires additional setup) and would work like a cache.
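
Such a cache could be as simple as a persistent hashtable keyed by the
pool hash. A sketch (Python; whether it ends up in memcached, a dbm file
or plain RAM is an implementation detail, and slow_lookup stands in for
today's filesystem lookup):

  import dbm

  class PoolCache:
      # Maps pool hash -> (size, checksum); falls back to the expensive
      # filesystem lookup only on a miss, then remembers the answer.
      def __init__(self, path="poolcache"):
          self.db = dbm.open(path, "c")

      def get(self, file_hash, slow_lookup):
          key = file_hash.encode()
          if key in self.db:
              size, checksum = self.db[key].decode().split(":", 1)
              return int(size), checksum
          size, checksum = slow_lookup(file_hash)  # hits the disk once
          self.db[key] = f"{size}:{checksum}"
          return size, checksum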

> Personally I think the way to make things better would be to have a 
> filesystem that does block-level de-duplication internally. Then most of 
> what backuppc does won't even be necessary.   There were some 
> indications that this would be added to ZFS at some point, but I don't 
> know how the Oracle acquisition will affect those plans.

I don't think that belongs in the file system. In my opinion, a file
system should be tuned for one purpose: managing space and files. It
should not care about file contents in any way.
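
(For reference, block-level de-duplication as Les describes it boils
down to something like this toy sketch - per-block pooling instead of
BackupPC's per-file pooling; a real implementation like ZFS's is far
more involved:)

  import hashlib

  BLOCK = 128 * 1024
  store = {}  # block hash -> block contents; each unique block kept once

  def dedup_write(path):
      # Returns the file as a list of block references; identical blocks
      # anywhere in the store are never written twice.
      refs = []
      with open(path, "rb") as f:
          while block := f.read(BLOCK):
              h = hashlib.sha256(block).hexdigest()
              store.setdefault(h, block)
              refs.append(h)
      return refs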

> Meanwhile, if someone has time to kill doing benchmark measurements, 
> using ZFS with incremental send/receive to maintain a remote filesystem 
> snapshot would be interesting.  Or perhaps making a vmware vmdk disk 
> with many small (say 1 or 2 gig) elements and running backuppc in a 
> virtual machine.  Then for replication, stop the virtual machine and 
> rsync the directory containing the disk image files.  This might even be 
> possible without stopping if you can figure out how vmware snapshots work.

You don't want heavy I/O in VMware without directly attached SAN
storage or a similarly expensive setup.

I'd rather propose a patch to rsync adding a --treat-blockdev-as-files
option. This would require block-level checksum generation on _both_
sides, though, so it's rather I/O- and CPU-intensive. Alternatively,
DRBD might be the way to go - it already keeps track of the changed
parts of the disk (but that's a guess).
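
That option doesn't exist in rsync today; what each side would have to
do is roughly this (a fixed-size-block sketch, ignoring rsync's rolling
checksum):

  import hashlib

  BLOCK = 1024 * 1024

  def device_checksums(dev_path):
      # Both sides checksum every block of the device; afterwards only
      # blocks whose checksums differ need to cross the wire.
      sums = []
      with open(dev_path, "rb") as dev:
          while block := dev.read(BLOCK):
              sums.append(hashlib.md5(block).hexdigest())
      return sums

  def changed_blocks(local_sums, remote_sums):
      return [i for i, (a, b) in enumerate(zip(local_sums, remote_sums))
              if a != b]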

Tino.

-- 
"What we nourish flourishes." - "Was wir nähren erblüht."

www.lichtkreis-chemnitz.de
www.craniosacralzentrum.de

