BackupPC-users

Re: [BackupPC-users] Backing up a BackupPC server

From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Tue, 02 Jun 2009 14:40:42 -0400
Les Mikesell wrote at about 12:16:59 -0500 on Tuesday, June 2, 2009:
 > Jeffrey J. Kosowsky wrote:
 > > 
 > >  > >> Still, it would be awesome to combine the simplicity and pooling
 > >  > >> structure of BackupPC with the flexibility of a database
 > >  > >> architecture...
 > >  > >>   
 > >  > > I, for one, would be willing to contribute financially and with my 
 > > very 
 > >  > > limited skills if Craig, or others, were willing to undertake such an 
 > >  > > effort. Perhaps Craig would care to comment.
 > >  > 
 > >  > The first thing needed would be to demonstrate that there would be an 
 > >  > advantage to a database approach - like some benchmarks showing an 
 > >  > improvement in throughput in the TB size range and measurements of the 
 > >  > bandwidth needed for remote replication.
 > > 
 > > No one ever claimed that the primary advantage of a database
 > > approach is throughput. The advantages are really more about
 > > extensibility, flexibility, and transportability. If you don't value
 > > any of the 7 or so advantages I listed before, then I guess a database
 > > approach is not for you.
 > 
 > I just consider a filesystem to be a reasonable place to store backups 
 > of files, where a database is a stretch, and I know how to deal with 
 > most of the problems with filesystems and what to expect from them where 
 > databases introduce a whole new set of issues.  What's the equivalent of 
 > fsck for a corrupted database and how long does it take to fix a TB of data?

Red herring. There is no 1TB of data (in most cases). Only the
metadata gets stored in the database. The files still get stored in
the pool. The database would be about the same size as the combined
size of the attrib files - probably even smaller, since a lot of the
information in the attrib files is repeated between backups.
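To make the point concrete, here is a minimal sketch of what such a
metadata-only database might look like. This is purely illustrative,
not BackupPC's actual format or any proposed schema: the table and
column names are made up. The key property is that file *contents*
stay in the pool, keyed by digest, while the database holds only the
per-backup metadata that the attrib files carry today - so two backups
of the same file cost two small rows, not two copies of the data.

```python
# Hypothetical metadata-only schema (illustrative names, not BackupPC's).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pool_files (
    digest TEXT PRIMARY KEY,  -- content hash; the actual bytes live in the pool
    size   INTEGER NOT NULL
);
CREATE TABLE backups (
    backup_id INTEGER PRIMARY KEY,
    host      TEXT NOT NULL,
    started   TEXT NOT NULL
);
CREATE TABLE entries (        -- one row per file per backup (the "attrib" data)
    backup_id INTEGER REFERENCES backups(backup_id),
    path      TEXT NOT NULL,
    mode      INTEGER,
    mtime     INTEGER,
    digest    TEXT REFERENCES pool_files(digest)
);
""")

# Two backups referencing the same pooled content: only the small
# metadata rows are duplicated, never the file data itself.
conn.execute("INSERT INTO pool_files VALUES ('abc123', 4096)")
conn.execute("INSERT INTO backups VALUES (1, 'hostA', '2009-06-01')")
conn.execute("INSERT INTO backups VALUES (2, 'hostA', '2009-06-02')")
conn.execute("INSERT INTO entries VALUES (1, '/etc/motd', 420, 0, 'abc123')")
conn.execute("INSERT INTO entries VALUES (2, '/etc/motd', 420, 0, 'abc123')")

n_pool = conn.execute("SELECT COUNT(*) FROM pool_files").fetchone()[0]
n_entries = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
```

An fsck-equivalent here is just a referential-integrity query over the
metadata, which is tiny compared to the pool itself.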

 > > Also, while clearly, a database approach would in general have more
 > > computational overhead (at least for backups), from my experience the
 > > bottlenecks are network bandwidth and disk speed. In fact, some people
 > > have implemented BackupPC to run native on a 500MHz ARM processor
 > > without effective slowdown. (On the other hand, restore-like
 > > operations would likely be faster since it would be simpler to walk
 > > down the hierarchy of incremental backups) So, I don't think you would
 > > find any significant slowdowns from a database approach. If anything a
 > > database approach could allow significantly *faster* backups since the
 > > file transfers could be split across multiple disks which is not
 > > possible under BackupPC unless you use LVM.
 > 
 > Again, I know how to deal with filesystems and I'd use a raid0/10/6 if I 
 > wanted to split over multiple disks.  But I don't because I want to be 
 > able to sync the whole thing to one disk that I can remove and I want to 
 > be able to access data from any single disk just by plugging it in to 
 > some still-working computer.

Red herring again. The database would typically be much smaller than
the pool, so it should fit onto a single disk even more easily than
under the current implementation. On the other hand, in a database
implementation the pool files could be split across multiple disks,
since there would be no need for hard links anymore, giving much
more storage flexibility in addition to making it easier to replicate
the backup.
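For instance, once pool membership is tracked in a database rather
than via hard links, the pool can be sharded over several disks by
digest. A minimal sketch, with made-up mount points (nothing here is
part of BackupPC):

```python
# Sketch: map file content to one of several pool disks by digest.
# POOL_DISKS and pool_path are illustrative, not BackupPC functions.
import hashlib

POOL_DISKS = ["/pool0", "/pool1", "/pool2"]  # hypothetical mount points

def pool_path(content: bytes) -> str:
    """Choose a deterministic location for this content across the pool disks."""
    digest = hashlib.md5(content).hexdigest()
    disk = POOL_DISKS[int(digest[:8], 16) % len(POOL_DISKS)]
    # Fan out into subdirectories, much as the existing pool layout does.
    return f"{disk}/{digest[:2]}/{digest}"

p = pool_path(b"hello world")
```

The same content always maps to the same path, so deduplication still
works, and no hard link ever has to span disks.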

 > 
 > >  > Personally I think the way to make things better would be to have a 
 > >  > filesystem that does block-level de-duplication internally. Then most 
 > > of 
 > >  > what backuppc does won't even be necessary.   There were some 
 > >  > indications that this would be added to ZFS at some point, but I don't 
 > >  > know how the Oracle acquisition will affect those plans.
 > > 
 > > Ideally, I don't think that the backup approach should depend on the
 > > underlying filesystem architecture. 
 > 
 > It wouldn't depend on it, it would just mean that you might be able to 
 > store 10x or more data for the same price where there is a lot of 
 > redundancy.

Or use a database to store the relationships, making it filesystem independent.

 > 
 > > Such a restriction limits the
 > > transportability of the backup solution just as currently BackupPC
 > > really only works on *nix systems with hard links.
 > 
 > Transportability?  I can access my backuppc disk copy using a USB 
 > adapter cable from a vmware instance of linux on my laptop while it is 
 > also still running windows.  I can do the same from a Mac, probably with 
 > the free virtualbox if I didn't want to pay for Vmware fusion.  You can 
 > boot just about anything with a CD or USB drive into linux and mount it. 
 >   You can't get much more portable than that - the OS itself is both 
 > portable and transportable.  And opensolaris under vmware/virtualbox 
 > would work as well if that's what it takes for a quick remote
 > restore.

Some people might want (horrors) to run BackupPC natively on a Windows
machine without having to run a virtual machine and have a
*nix-formatted filesystem hanging off a USB drive. According to your
definition, every program in the world is pretty much equally portable
across OS's since you just need to fire up a VM...

 > > A database approach
 > > allows one to get away from dependence on specific filesystem features.
 > 
 > Some real world examples please?  Are you thinking of replicating from 
 > one OS to another?

This is obvious -- hard links are filesystem dependent, so any
implementation requiring hard links is itself filesystem dependent.
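The contrast is easy to demonstrate. Today's deduplication creates a
second directory entry for the same inode (a hard link), which needs
filesystem support; a database version just records a reference row,
which works on any filesystem. A sketch with illustrative names:

```python
# Filesystem approach: dedup via a hard link (fails on FAT/exFAT,
# SMB shares, and other filesystems without hard-link support).
import os
import tempfile

d = tempfile.mkdtemp()
pool_file = os.path.join(d, "pool_abc123")
with open(pool_file, "w") as f:
    f.write("data")
backup_copy = os.path.join(d, "backup1_etc_motd")
os.link(pool_file, backup_copy)        # second name for the same inode
links = os.stat(pool_file).st_nlink    # now 2 on a POSIX filesystem

# Database approach: no second directory entry at all, just a row
# mapping (backup, path) -> digest. A dict stands in for the DB table.
references = {}
references[("backup1", "/etc/motd")] = "abc123"
references[("backup2", "/etc/motd")] = "abc123"
ref_count = sum(1 for v in references.values() if v == "abc123")
```

The reference count in the database plays exactly the role st_nlink
plays on disk, minus the filesystem requirement.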

 > 
 > > That doesn't mean there isn't room for specialized filesystem
 > > approaches but just that such a requirement limits the audience for
 > > the backup solution since it will be a while before we all start
 > > running ZFS-type filesystems and then we will have the issue of
 > > requiring different optimizations and code for different filesystem
 > > approaches.
 > 
 > We already do have the issue of different optimizations for different 
 > filesytems - and databases are even worse.
 > 

Pick one or two database implementations that work on multiple
platforms. Problem solved.

Les, I understand that BackupPC as-is works perfectly for you on
ZFS/Solaris. However, you need to recognize that some of us have
different setups and different needs. Just because you don't need an
SUV for your transportation needs doesn't mean you can convince me
that I don't need an SUV for my different transportation needs. Maybe
it's even true that a database approach would measurably degrade
performance (though I doubt it), but that doesn't mean that the
tradeoffs of better flexibility, extensibility, and transportability
aren't worth it for other people.

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
