2009-08-31 15:49:19
Subject: Re: [BackupPC-users] Problems with hardlink-based backups...
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Mon, 31 Aug 2009 15:44:59 -0400
Les Mikesell wrote at about 12:35:49 -0500 on Monday, August 31, 2009:
 > Jeffrey J. Kosowsky wrote:
 > >
 > >  > It's almost as if you guys haven't heard of filesystem-specific dump 
 > >  > utilities.  For such utils (vxdump, ufsdump, zfs send/receive, etc.) the 
 > >  > number of hardlinks isn't a problem.  You can do both full and 
 > >  > incremental dumps, even across separate machines.  This isn't a problem 
 > >  > that needs solving.
 > > 
 > > I think you are missing some key points.
 > > First, why should a program require its own separate filesystem? This
 > > seems to me like an unnatural and kludgey type of requirement.
 > 
 > It doesn't, unless you hit some limit on the filesystem you use.  The 
 > usual limit is how many times and how far you have to move the disk head 
 > since that's the slow operation.
 > 
 > > I see lots of advantage in keeping the database portion relatively
 > > small, fast, replicable, and moveable. Then you can keep and
 > > distribute the files themselves wherever you want them, spread across
 > > one or more separate filesystems. The database portion is then
 > > optimized for what a database does best, and the file-storing
 > > portion is optimized for what a filesystem does best. And both
 > > parts are easily moveable and replicable, and neither is dependent
 > > on or limited by hardlinks or other filesystem-dependent
 > > functionality.
 > 
 > But the parts aren't independent.  How do you propose keeping them in 
 > sync or fixing them when they inevitably differ?

Perhaps analogously to the way BackupPC_Nightly now makes sure that the
pool is in sync.

More generally, we would need to consider two things:
1. Identify the normal ways in which the two could get out of sync,
   and then address each of those cases.

2. If necessary, create a repair tool (similar to the one I created for
   the current system) for cases where something breaks in non-standard
   ways (e.g., due to crashes, disk failures, etc.).

I guess I can't answer your question without knowing what use cases
you are worried about.
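For concreteness, here is the kind of nightly check I have in mind, as a
minimal sketch only. It assumes a hypothetical SQLite table named `files`
with a `digest` column, and a flat pool directory whose files are named by
digest; none of this is BackupPC's actual layout.

```python
# Hypothetical sketch of a BackupPC_Nightly-style consistency pass for a
# split design: file metadata in an SQLite database, pooled file contents
# on disk named by content digest. Table and layout are assumptions.
import os
import sqlite3

def find_inconsistencies(db_path, pool_dir):
    """Return (orphaned_rows, orphaned_pool_files).

    orphaned_rows: digests recorded in the DB with no pool file behind them.
    orphaned_pool_files: pool files that no DB row references.
    """
    conn = sqlite3.connect(db_path)
    db_digests = {row[0] for row in conn.execute("SELECT digest FROM files")}
    conn.close()
    pool_digests = set(os.listdir(pool_dir))  # flat pool for simplicity
    return sorted(db_digests - pool_digests), sorted(pool_digests - db_digests)
```

A nightly pass could then delete the orphaned pool files and flag the
orphaned rows for repair, much as BackupPC_Nightly prunes the pool today.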

 > 
 > > Don't get me wrong - BackupPC is great, and hardlinks are a great
 > > kludge that at first glance gets you something for nothing. I'm just
 > > saying that hardlinks, while "easy", bring some longer-term
 > > limitations, and that there comes a time when it may be worth
 > > investing in going beyond them.
 > > 
 > > Personally I would like to see BackupPC evolve to combine the pooling
 > > functionality, leveraging of rsync, and relative simplicity of the
 > > existing BackupPC with the expandability, portability, and flexibility
 > > of database-based systems like Bacula. I believe that the
 > > combination of a database to store the file attributes and metadata
 > > together with a filesystem to store the pool would be an ideal hybrid.
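To make the hybrid concrete, here is one possible shape for the metadata
side, using SQLite. The table and column names are my own invention for
illustration, not an actual BackupPC proposal.

```python
# Illustrative sketch only: a minimal metadata schema for the hybrid
# design described above. Names and columns are assumptions.
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS backups (
    backup_id INTEGER PRIMARY KEY,
    host      TEXT NOT NULL,
    started   TEXT NOT NULL              -- ISO-8601 timestamp
);
CREATE TABLE IF NOT EXISTS files (
    backup_id INTEGER NOT NULL REFERENCES backups(backup_id),
    path      TEXT NOT NULL,             -- path within the backup
    digest    TEXT NOT NULL,             -- content hash naming the pool file
    mode      INTEGER,
    uid       INTEGER,
    gid       INTEGER,
    mtime     INTEGER,
    size      INTEGER
);
CREATE INDEX IF NOT EXISTS files_by_digest ON files (digest);
"""

def open_metadata_db(path):
    """Create (or open) the metadata database with the schema above."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

A pool file becomes reclaimable when no row in `files` references its
digest, which is the role the hardlink count plays today.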
 > 
 > But someone has yet to establish that this would be faster if you don't 
 > add the requirement of putting the sql tables on different drives - and 
 > weren't you just saying that applications shouldn't have requirements 
 > like that?
 > 
 > >  > 
 > >  > For anyone thinking that working with giant multi-gigabyte BLOBs in a 
 > >  > database is the right way to go, I suggest you actually attempt it 
 > >  > yourself and see what happens.  I'm backing up my HD video production 
 > >  > rig with BackupPC, and although such a machine (Windows, 16T of storage, 
 > >  > most video files are at least 50G in size) is outside of the intent of 
 > >  > BackupPC, it actually works.  If BackupPC were to rely on an SQL 
 > >  > database, it would greatly shrink the potential userbase.
 > > 
 > > You are attacking a straw man. No one has ever suggested
 > > "multi-gigabyte BLOBs in a database." The database would only consist
 > > of the filenames, links, attrib data, and other backup-related
 > > metadata. I would imagine in most cases this would be at most a couple
 > > of gigabytes, even assuming you have millions of files in your pool.
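As a rough sanity check on that estimate (the per-row figure is purely my
assumption, not a measurement):

```python
# Back-of-envelope check of the "couple of gigabytes" claim above. The
# per-file row size is an assumption: a path of ~200 bytes, a digest,
# and a handful of integer attributes.
BYTES_PER_ROW = 300
N_FILES = 5_000_000  # "millions of files in your pool"

def metadata_size_gb(n_files=N_FILES, bytes_per_row=BYTES_PER_ROW):
    """Estimated metadata database size in gigabytes."""
    return n_files * bytes_per_row / 10**9
```

Five million files at ~300 bytes each comes to about 1.5 GB of metadata,
consistent with the estimate above.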
 > 
 > If you don't put them in the database, you can't enforce atomic 
 > operations - something you get in the filesystem for the price of a seek 
 > over to the inode to bump the link count with the entry locked.
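The link-count property Les describes can be seen directly. This is a small
demo, not BackupPC code; it needs a filesystem that supports hardlinks, and
the file names are made up.

```python
# Demonstrates the atomicity described above: link(2) either creates the
# new name and bumps the inode's reference count, or fails leaving
# everything unchanged. There is no separate counter that can drift out
# of sync with the data.
import os

def share_pool_file(pool_path, backup_path):
    """Add a backup-tree name for an existing pool file via hardlink."""
    os.link(pool_path, backup_path)      # atomic at the filesystem level
    return os.stat(pool_path).st_nlink   # link count after the bump
```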
 > 
 > -- 
 >    Les Mikesell
 >     lesmikesell AT gmail DOT com
 > 
 > 
 > 

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
