Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 09:25:27
Subject: Re: [BackupPC-users] Backing up a BackupPC server
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Tue, 02 Jun 2009 09:18:13 -0400
Tino Schwarze wrote at about 13:07:29 +0200 on Tuesday, June 2, 2009:
 > On Tue, Jun 02, 2009 at 06:27:35AM -0400, Peter Walter wrote:
 > 
 > > As a Linux newbie, I have only a partial understanding of the technology 
 > > underlying Linux and BackupPC, but I get the impression that the problem 
 > > with a rsync-like solution is that processing hardlinks is very 
 > > expensive in terms of cpu time and memory resources. This may be a 
 > > stupid question, but, if hardlinks are the problem, has any thought been 
 > > given to adding to BackupPC an option to use some form of database 
 > > (text, SQL or otherwise) to associate hashes to files, instead? It seems 
 > > to me that using hardlinks is in fact using that feature of the file 
 > > system *as* a database, a use that does not appear to be optimal ... if 
 > > I have misunderstood, please educate me :-)
 > 
 > An SQL approach would be rather complicated because it would have to
 > support a directory structure. We would end up with ... a filesystem!
 > The nice thing about using hardlinks is that the operating system keeps
 > track of the link count and we can use that link count to check for
 > superfluous files. This might be doable in a database as well, but
 > we'd have to keep a file system and a database in sync. Doable, but
 > error-prone. With the current design, there is only a file system.

I agree a database architecture adds some complexity but I'm not sure I
agree with your other points.

First, the 'attrib' file (along with the attendant complexity involved
in filling in incremental backups from previous incrementals) is also,
in a sense, a filesystem written on top of a filesystem, since the
attrib file encodes file types (including files, directories, soft and
hard links, etc.) and file attributes. If anything, it is kludgy in
that things like hard links are represented in an unnatural
manner. Indeed, once the initial database architecture is established,
I think that extending it to include additional attributes is far
*simpler* than trying to extend the attrib structure. For example,
BackupPC still doesn't account for SELinux extended attributes, let
alone more general Linux ACLs or even Windows ACLs.
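
As a rough illustration of why I think extension is simpler, here is a
minimal sketch using Python's sqlite3 module. The table and column
names (file_entry, file_xattr) and the helper functions are
hypothetical; nothing like this exists in BackupPC. The point is only
that a new kind of attribute, such as an SELinux context, becomes
another row rather than a change to the on-disk format.

```python
import sqlite3

# Hypothetical schema: one row per backed-up file, plus an open-ended
# name/value table for any extra attributes (SELinux, ACLs, ...).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE file_entry (
    id    INTEGER PRIMARY KEY,
    path  TEXT NOT NULL,
    ftype TEXT NOT NULL,      -- 'file', 'dir', 'symlink', 'hardlink', ...
    mode  INTEGER,
    uid   INTEGER,
    gid   INTEGER,
    mtime INTEGER
);
CREATE TABLE file_xattr (
    file_id INTEGER NOT NULL REFERENCES file_entry(id),
    name    TEXT NOT NULL,
    value   BLOB,
    PRIMARY KEY (file_id, name)
);
""")

def add_file(path, ftype, mode, uid, gid, mtime, xattrs=None):
    """Insert one file entry and any extra attributes; return its id."""
    cur = conn.execute(
        "INSERT INTO file_entry (path, ftype, mode, uid, gid, mtime) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (path, ftype, mode, uid, gid, mtime),
    )
    file_id = cur.lastrowid
    for name, value in (xattrs or {}).items():
        conn.execute(
            "INSERT INTO file_xattr (file_id, name, value) VALUES (?, ?, ?)",
            (file_id, name, value),
        )
    return file_id

def get_xattrs(file_id):
    """Return the extra attributes of one file as a dict."""
    return dict(conn.execute(
        "SELECT name, value FROM file_xattr WHERE file_id = ?", (file_id,)
    ))
```

With this layout, supporting a new attribute type needs no schema
change at all: the backup code just stores another (name, value) pair.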

I also don't understand or agree with your points about the difficulty
and error-prone nature of synchronizing a database with the pool. If
anything, I think that sprinkling thousands of attrib files all over
the place is much more complex, more error-prone, and makes integrity
harder to guarantee. In the database world, you just have a single
database and a pool of files. Also, modern databases have a lot more
tools and safeguards for checking and maintaining integrity than older
filesystems like ext2/ext3. Finally, once a day, a process similar to
BackupPC_nightly could be run to crawl the database and delete unneeded
pool entries (or it could be done in real time whenever backups are
deleted).
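
To make the cleanup idea concrete, here is a minimal sketch, again in
Python with sqlite3. The schema (pool, backup_file) and the function
names are my own assumptions for illustration, not BackupPC's design.
A pool entry is unneeded exactly when no backup_file row references
its hash, which plays the role the hard-link count plays today.

```python
import sqlite3

def open_db():
    """Create an in-memory database with a hypothetical pool schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE pool (
        hash TEXT PRIMARY KEY,
        size INTEGER NOT NULL
    );
    CREATE TABLE backup_file (
        backup_id INTEGER NOT NULL,
        path      TEXT NOT NULL,
        pool_hash TEXT NOT NULL REFERENCES pool(hash),
        PRIMARY KEY (backup_id, path)
    );
    """)
    return conn

def unreferenced_pool_hashes(conn):
    """Return hashes of pool entries that no backup refers to."""
    rows = conn.execute("""
        SELECT p.hash
        FROM pool p
        LEFT JOIN backup_file f ON f.pool_hash = p.hash
        WHERE f.pool_hash IS NULL
        ORDER BY p.hash
    """)
    return [row[0] for row in rows]

def delete_backup(conn, backup_id):
    """Delete a backup and, in the same transaction, its orphaned pool rows.

    Returns the orphaned hashes so the caller can unlink the matching
    pool files on disk.
    """
    with conn:  # commits on success, rolls back on an exception
        conn.execute("DELETE FROM backup_file WHERE backup_id = ?",
                     (backup_id,))
        orphans = unreferenced_pool_hashes(conn)
        conn.executemany("DELETE FROM pool WHERE hash = ?",
                         [(h,) for h in orphans])
    return orphans
```

The nightly variant would simply call unreferenced_pool_hashes() on a
schedule; the real-time variant is delete_backup() as written.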

Now, a database architecture would probably be slower for some
operations than raw filesystem access, but for other operations, such
as restoring an incremental backup, I wouldn't be surprised if a
database system were much faster, since the reconstruction and
inheritance could be optimized within the database rather than having
to open multiple trees of attrib files and reconstruct the
inheritance logic in real time.
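
Here is a sketch of what I mean by doing the inheritance inside the
database. It assumes a deliberately simplified model that is not how
BackupPC stores things: a linear chain of backups (one full followed
by incrementals), where each backup records only the paths that
changed, and a removed file is recorded with deleted = 1. The table
and function names are hypothetical.

```python
import sqlite3

# Hypothetical table: each backup stores only what changed since the
# previous backup in the chain.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE backup_file (
    backup_id INTEGER NOT NULL,
    path      TEXT NOT NULL,
    pool_hash TEXT,                      -- NULL when deleted = 1
    deleted   INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (backup_id, path)
);
""")

def filled_view(conn, full_id, target_id):
    """Return (path, pool_hash) pairs as they stood at backup target_id.

    For every path, pick the most recent row between the full backup
    and the target, then drop paths whose latest row marks a deletion.
    """
    rows = conn.execute("""
        SELECT f.path, f.pool_hash
        FROM backup_file f
        JOIN (SELECT path, MAX(backup_id) AS latest
              FROM backup_file
              WHERE backup_id BETWEEN ? AND ?
              GROUP BY path) m
          ON m.path = f.path AND m.latest = f.backup_id
        WHERE f.deleted = 0
        ORDER BY f.path
    """, (full_id, target_id))
    return rows.fetchall()
```

One query replaces the walk over several trees of attrib files, and
the database is free to use the primary-key index to do it.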

In my mind the only major reason not to move to a database
architecture is that it would require a substantial re-write of
BackupPC as pointed out in my earlier note.

 > 
 > Tino, not doing backups of the pool, but archiving hosts to tape.
 > 
 > -- 
 > "What we nourish flourishes." - "Was wir nähren erblüht."
 > 
 > www.lichtkreis-chemnitz.de
 > www.craniosacralzentrum.de
 > 
 > ------------------------------------------------------------------------------
 > OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
 > looking to deploy the next generation of Solaris that includes the latest 
 > innovations from Sun and the OpenSource community. Download a copy and 
 > enjoy capabilities such as Networking, Storage and Virtualization. 
 > Go to: http://p.sf.net/sfu/opensolaris-get
 > _______________________________________________
 > BackupPC-users mailing list
 > BackupPC-users AT lists.sourceforge DOT net
 > List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
 > Wiki:    http://backuppc.wiki.sourceforge.net
 > Project: http://backuppc.sourceforge.net/
