Subject: Re: [BackupPC-users] why hard links?
From: Holger Parplies <wbppc AT parplies DOT de>
To: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
Date: Wed, 3 Jun 2009 03:15:27 +0200
Hi,

Jeffrey J. Kosowsky wrote on 2009-06-02 14:26:44 -0400 [Re: [BackupPC-users] 
why hard links?]:
> Les Mikesell wrote at about 12:32:14 -0500 on Tuesday, June 2, 2009:
>  > Jeffrey J. Kosowsky wrote:
>  > > [...]
>  > >  > If you have to add an extra system call to lock/unlock around some
>  > >  > other operation you'll triple the overhead.
>  > > 
>  > > I'm not sure how you definitively get to the number "triple". Maybe
>  > > more, maybe less.

I agree. It's probably more.

>  > Ummm, link() vs. lock(), link(), unlock() equivalents looks like 3x the
>  > operations to me - and at least the lock/unlock parts have to involve
>  > system calls even if you convert the link operation to something else.
> 
> 3x operations != 3x worse performance
> Given that disk seek times and input bandwidth are typical
> bottlenecks, I'm not particularly worried about the added
> computational bandwidth of lock/unlock.

Since you can't lock() the file you are about to create (can you?), you'll
probably need a different file - either one big global lock file or one on the
directory level. I'm not familiar with the kernel code, but I wouldn't be
surprised if that got you the disk seeks you are worried about.
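
For concreteness, a minimal sketch in C of the lock/link/unlock sequence
with a separate lock file (the lock file's name, scope, and the choice of
flock() are my assumptions, not anything BackupPC does):

  #include <sys/file.h>   /* flock() */
  #include <fcntl.h>      /* open(), O_CREAT */
  #include <unistd.h>     /* link(), close() */

  /* One link() becomes open+flock+link+flock+close: the extra system
     calls being counted above, plus a lock file that has to live
     somewhere on disk. */
  int link_locked(const char *lockpath, const char *from, const char *to)
  {
      int ret = -1;
      int lockfd = open(lockpath, O_CREAT | O_RDWR, 0600);
      if (lockfd < 0)
          return -1;
      if (flock(lockfd, LOCK_EX) == 0) {   /* "lock()" */
          ret = link(from, to);            /* the original operation */
          flock(lockfd, LOCK_UN);          /* "unlock()" */
      }
      close(lockfd);
      return ret;
  }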

>  > > Les - I'm really not sure why you seem so intent on picking apart a
>  > > database approach.
>  > 
>  > I'm not. I'm encouraging you to show that something more than black 
>  > magic is involved. [...]
> 
> I never claimed performance. My claims have been around flexibility,
> extendability, and transportability.

And I'm worried about complexity and robustness:
1. Complexity (administration)
   What additional skills do you need to set up the BackupPC version you are
   imagining and keep it running?
2. Complexity (implementation)
   Who is going to write and, more importantly, debug the code? How do you test
   all the new cases that can go wrong? How do people feel about entrusting
   vital data to a system they no longer have a basic understanding of?
3. Complexity (disaster recovery)
   When everything goes wrong, what can you still do with the data? Currently,
   you can locate a file in the file system (file name mangling is not that
   complicated) - or even, with an FS debugging tool, in an image of an
   unmountable FS - and BackupPC_zcat it to get the contents (see the example
   after this list). Attributes are lost that way, but for regaining the
   contents of a few crucial files, this can work quite well. It could even be
   made to restore the attributes, with only one more requirement (an intact
   attrib file). With a database, can you do anything at all without a
   completely running BackupPC system? What are the exact requirements?
   Database file? Database engine? Accessible pool file system?
4. Robustness, points of failure
   How do you handle losing single files, on-disk corruption of a few files?
   Losing/corrupting many files? Your database?
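
To make point 3 concrete (host name, backup number, and paths are
hypothetical, and the exact layout depends on your installation and
BackupPC version), recovering the contents of /etc/passwd from backup 42
of "somehost" needs nothing but the pool file system and BackupPC_zcat:

  BackupPC_zcat /var/lib/backuppc/pc/somehost/42/f%2f/fetc/fpasswd \
      > passwd.recovered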

> I think all (or nearly all) of my 7 claimed advantages are
> self-evident.

Yes, mostly, though they were claimed in a different thread. I hope everyone
has multiple MUAs open ...

1. I don't see how "platform and filesystem independence" fits together with
   the use of a database, though. You are currently dependent on a POSIX file
   system. How is depending on one of a set of databases any better?

4. How does backing up the database and *a portion of the pool* work? Sure,
   you can make anything fault-tolerant, but are missing files faults of which
   you *want* to be tolerant?
   But yes, backing up the complete pool would be easier, though it's your
   responsibility to get it right (i.e., consistent), and there's probably no
   sane way to check.

5.1. Why is file name mangling a kludge, and in what way is storing file names
     in a database better?
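
For reference, the mangling is simple enough to sketch: per path component
(and share name), prepend "f" and %-escape "%" and "/", so that mangled
names can never collide with the per-directory attrib file. A rough C
equivalent of what BackupPC does in Perl - my sketch, not project code:

  #include <stdio.h>

  /* Mangle one path component: "passwd" -> "fpasswd", "50%" -> "f50%25".
     Every mangled name starts with "f", so "attrib" can never clash. */
  static char *mangle(const char *comp, char *buf, size_t buflen)
  {
      size_t j = 0;
      buf[j++] = 'f';                        /* the "f" prefix */
      for (const char *p = comp; *p != '\0' && j + 4 < buflen; p++) {
          if (*p == '%' || *p == '/')        /* the escaped characters */
              j += snprintf(buf + j, buflen - j, "%%%02x",
                            (unsigned char)*p);
          else
              buf[j++] = *p;
      }
      buf[j] = '\0';
      return buf;
  }

  int main(void)
  {
      char buf[256];
      printf("%s\n", mangle("passwd", buf, sizeof buf));  /* fpasswd */
      printf("%s\n", mangle("50%", buf, sizeof buf));     /* f50%25  */
      return 0;
  }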

5.2. What is non-standard about defining a file format any way you like? It's
     not like compressed pool files would otherwise adhere to a particular
     known file format. But yes, treating compressed and uncompressed files
     alike would be nice.

5.3. I'm not really sure encrypting files *on the server* does much, unless
     you are thinking of a remote storage pool. In particular, you need to be
     able to decrypt files not only for restoration, but also for pooling
     (unless you want an intermediate copy and an extra comparison).
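
To spell that out, a generic deduplication sketch (not BackupPC's actual
code; decrypt_read() is a placeholder for whatever cipher would be
involved). The point is only that the comparison needs plaintext on both
sides, so an encrypted pool copy forces a decryption pass per candidate:

  #include <stdbool.h>
  #include <stdio.h>
  #include <string.h>

  /* Placeholder: yield up to n plaintext bytes from an encrypted pool
     file. Here it just reads; a real cipher would decrypt as well. */
  static size_t decrypt_read(FILE *poolf, unsigned char *buf, size_t n)
  {
      return fread(buf, 1, n, poolf);
  }

  /* Can the incoming (plaintext) file be pooled against this pool file? */
  static bool pool_match(FILE *incoming, FILE *poolf)
  {
      unsigned char a[8192], b[8192];
      size_t na, nb;
      do {
          na = fread(a, 1, sizeof a, incoming);
          nb = decrypt_read(poolf, b, sizeof b);
          if (na != nb || memcmp(a, b, na) != 0)
              return false;
      } while (na == sizeof a);
      return true;
  }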

5.5. Configuration stored in the database? Is that supposed to be an
     advantage?

6. If you mean access controlled by the database (different database users),
   I don't really see why you are worried about access to the *meta data* when
   the actual contents remain readable (you're not saying that it being such a
   huge amount of data is a security feature, are you?).
   If you mean that a database will make it easier to implement file level
   access control, I honestly don't see how.

7. How so? If you are less concerned about how much space you use, you can
   store things in a way that makes them faster to access. But I still think
   you are mistaken that multiple attrib files would need to be read. I've
   had to read so much discussion on this today that I won't check the code
   now, but I'd reason that for attrib file pooling to make any sense, the
   default would be an attrib file identical to that of the reference backup
   if no files in the directory were changed.
   Or, put differently, if BackupPC *did* need to scan multiple attrib files,
   your delete-file-from-backups script would only ever need to modify one
   attrib file for any file it deletes, right? ;-)

> Plus, I don't want my backup system to be
> filesystem dependent because I might have other reasons for picking
> other filesystems or my OS of the future (or of today) might not even
> support the filesystem features required.

The same arguments hold against incorporating a database.

> I think good system design calls for abstracting the backup software from
> the underlying filesystem.

Well, the only thing you are abstracting away is hard links, which are POSIX
standard. I wouldn't be surprised if there were other POSIX dependencies.
BackupPC currently makes no other assumptions about the file system, does it?
Well, file size maybe - you need a file system capable of storing large enough
files. And long enough paths. I look forward to the introduction of
$Conf{PathSeparator} ...

Regards,
Holger
