
Re: [BackupPC-users] Problems with hardlink-based backups...

2009-08-31 17:07:00
Subject: Re: [BackupPC-users] Problems with hardlink-based backups...
From: Tino Schwarze <backuppc.lists AT tisc DOT de>
To: backuppc-users AT lists.sourceforge DOT net
Date: Mon, 31 Aug 2009 23:03:40 +0200
Hi all,

On Mon, Aug 31, 2009 at 04:32:14PM -0400, Jeffrey J. Kosowsky wrote:

> In a very real sense, the current implementation already uses an
> artificial database structure - albeit a slow, proprietary,
> non-extensible, non-optimizable version. To wit, the attrib files
> present in each and every pc directory. The real essence of my
> suggestion is to replace the scattered myriad of attrib linear
> databases with a single relational database that can benefit from all
> the features, speed, tools, and optimizations of modern databases. As
> has been mentioned many times in the past, such a move would solve
> many, many problems though would obviously require some significant
> development work.

I suppose this is the most important argument _for_ trying the SQL
approach - maybe just for storing file attributes?

On the other hand, we rely on one kind of atomic file system operation:
the hardlink count, which is what drives file expiration. That would be
harder to get right with a database (prone to DB<->filesystem
inconsistencies).
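To make the hardlink-count point concrete, here is a minimal Python sketch of how expiration can fall out of the link count: a pool file whose st_nlink has dropped to 1 is no longer referenced by any backup tree. The directory layout (`pool/`, `pc/host1/0/`) and the pool file names are hypothetical stand-ins, not BackupPC's actual pool structure. The point is that the filesystem maintains the reference count atomically, while a database-backed design would have to keep an equivalent count consistent on its own.

```python
import os
import tempfile


def expired_pool_files(pool_dir: str) -> list[str]:
    """Return pool files whose only remaining link is the pool entry itself.

    Sketch of the idea: st_nlink == 1 means no backup tree hardlinks to
    the file any more, so it is a candidate for expiration.
    """
    expired = []
    for root, _dirs, files in os.walk(pool_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.lstat(path).st_nlink == 1:
                expired.append(path)
    return sorted(expired)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        pool = os.path.join(top, "pool")                 # hypothetical layout
        backup = os.path.join(top, "pc", "host1", "0")
        os.makedirs(pool)
        os.makedirs(backup)
        for name in ("a1b2", "c3d4"):
            with open(os.path.join(pool, name), "w") as f:
                f.write(name)
        # Only "a1b2" is still referenced from a backup tree.
        os.link(os.path.join(pool, "a1b2"), os.path.join(backup, "ffile"))
        print([os.path.basename(p) for p in expired_pool_files(pool)])
        # prints ['c3d4']
```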

Maybe we should move this discussion to the -devel list? Or somebody
should come up with a database schema so we could start discussing
details, possibly finding out that the requirements are difficult to
meet with a database?

I'm just skeptical that it is possible to store a file system layout
more efficiently than a file system does - and I suppose we'd need to
completely represent the directory structure of backups in the database.
We'd end up with loads of entries pointing to a file.id (int8), which
is the equivalent of the inode number in the filesystem world. File
attributes would have to be stored in a separate table since they may
differ from host to host while the file content is identical (and I'm not
sure how to do that efficiently, taking extended attributes like ACLs,
resource forks, etc. into account - you'll either get into JOIN hell or
you'll start storing serialized data).

Of course, a database might allow lookups like "which backups reference
file x". On the other hand, standard databases are not good at querying
hierarchical structures. Hierarchies are more natural for filesystems (but
only up to a certain point - traversal is still expensive).
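A sketch of both kinds of query against a hypothetical `node` table that stores the tree as an adjacency list (one row per directory entry with a `parent_id` pointer). The reverse lookup "which backups reference file x" is a single indexed SELECT, while rebuilding paths needs a recursive common table expression (`WITH RECURSIVE`, available in SQLite 3.8.3 and later), which illustrates why hierarchies are awkward in SQL. Table, column and function names are illustrative only.

```python
import sqlite3

SCHEMA = """
CREATE TABLE node (
    id        INTEGER PRIMARY KEY,
    backup_id INTEGER NOT NULL,
    parent_id INTEGER REFERENCES node(id),   -- NULL for a backup's root
    name      TEXT NOT NULL,
    file_id   INTEGER                        -- NULL for directories
);
CREATE INDEX node_file ON node(file_id);     -- makes the reverse lookup cheap
"""


def backups_referencing(conn, file_id):
    """The easy query: which backups contain this content?"""
    rows = conn.execute(
        "SELECT DISTINCT backup_id FROM node WHERE file_id = ? ORDER BY backup_id",
        (file_id,))
    return [r[0] for r in rows]


def subtree_paths(conn, root_id):
    """The awkward query: walk a directory tree level by level."""
    rows = conn.execute(
        """WITH RECURSIVE walk(id, path) AS (
               SELECT id, name FROM node WHERE id = ?
               UNION ALL
               SELECT n.id, walk.path || '/' || n.name
               FROM node AS n JOIN walk ON n.parent_id = walk.id
           )
           SELECT path FROM walk ORDER BY path""",
        (root_id,))
    return [r[0] for r in rows]


def demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?)", [
        (1, 1, None, "root", None), (2, 1, 1, "etc", None), (3, 1, 2, "hosts", 42),
        (4, 2, None, "root", None), (5, 2, 4, "etc", None), (6, 2, 5, "hosts", 42),
        (7, 3, None, "root", None), (8, 3, 7, "motd", 7),
    ])
    return conn


if __name__ == "__main__":
    conn = demo_db()
    print(backups_referencing(conn, 42))   # prints [1, 2]
    print(subtree_paths(conn, 1))          # prints ['root', 'root/etc', 'root/etc/hosts']
```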

These are just my random thoughts. I suppose it's worth spending some
time discussing/designing/developing a database layout - we'll learn a
lot, and either

a) it turns out to be worth implementing - hey, then we'd already have a
database layout!

b) we become convinced that it's not worth it or that it gets too
complicated - hey, then at least we've tried, and we have something to
show people who claim that a database would improve things.

Tino.

PS: Another weird thought just crossed my mind: separating pool data
from backups might be worth a try. That is: only store zero-byte files
in the pool (or maybe files with some metadata like the MD5 in them),
which get hardlinked into backups, then have a second pool which contains
the real data and no hardlinks (the implicit connection being the pool
file name). Creating and changing pool files is a rather central operation
(done by BackupPC_dump/_link/_nightly). That way, we could decouple the
extensive directory lookups done while traversing a backup from the data
reading/writing, and the file pool could live on a different volume from
the data pool. Without detailed knowledge of the code, I suppose it should
be doable as a proof-of-concept hack.
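A rough Python sketch of the split-pool idea, under explicitly assumed details: the MD5 of the content is the pool name, each metadata pool file contains just that digest (the "files with some metadata like MD5" variant, since a hardlink by itself does not record which pool name it was created from), and a separate data pool holds the real content and is never hardlinked. The functions `store` and `read_backup_file` and the directory layout are hypothetical, not BackupPC code.

```python
import hashlib
import os
import tempfile


def store(meta_pool: str, data_pool: str, backup_path: str, content: bytes) -> str:
    """Add one backed-up file under the hypothetical split-pool layout."""
    digest = hashlib.md5(content).hexdigest()
    data_file = os.path.join(data_pool, digest)
    meta_file = os.path.join(meta_pool, digest)
    if not os.path.exists(data_file):
        # Real data: written once, never hardlinked.
        with open(data_file, "wb") as f:
            f.write(content)
    if not os.path.exists(meta_file):
        # Tiny metadata file: holds only the digest, i.e. the data pool name.
        with open(meta_file, "w") as f:
            f.write(digest)
    os.makedirs(os.path.dirname(backup_path), exist_ok=True)
    # Backups hardlink to the metadata file, so only that volume sees links.
    os.link(meta_file, backup_path)
    return digest


def read_backup_file(data_pool: str, backup_path: str) -> bytes:
    """Resolve a backup entry to its content via the name stored inside it."""
    with open(backup_path) as f:
        digest = f.read().strip()
    with open(os.path.join(data_pool, digest), "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        meta = os.path.join(top, "meta_pool")
        data = os.path.join(top, "data_pool")
        os.makedirs(meta)
        os.makedirs(data)
        for host in ("host1", "host2"):
            d = store(meta, data, os.path.join(top, "pc", host, "0", "f"), b"hello")
        print(os.stat(os.path.join(meta, d)).st_nlink)   # prints 3
        print(os.stat(os.path.join(data, d)).st_nlink)   # prints 1
```

Because hardlinks cannot cross filesystems, in this sketch the metadata pool and the backup trees must share a volume, while the data pool is free to sit on another one. Expiration would still work off the link count: a metadata file with st_nlink == 1 could be removed together with its data pool twin of the same name.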

Of course, this should be a configurable setting since it only makes
sense when there are actually separate physical volumes for metadata and
file data.

-- 
"What we nourish flourishes." - "Was wir nähren erblüht."

www.lichtkreis-chemnitz.de
www.craniosacralzentrum.de

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
