Subject: Re: [BackupPC-users] Backing up a BackupPC server
From: Les Mikesell <les AT futuresource DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Tue, 02 Jun 2009 13:16:05 -0500

Jeffrey J. Kosowsky wrote:
> 
>  > Do you actually have any experience with large scale databases?  I think 
>  > most installations that come anywhere near the size and activity of a 
>  > typical backuppc setup would require a highly experienced DBA to 
>  > configure and would have to be spread across many disks to have adequate 
>  > performance.
> 
> I am by no means a database expert, but I think you are way
> overstating the complexity issues.

I've worked with lots of filesystems and a few databases - and had many 
more problems with the databases.  For example, they are not at all 
happy or forgiving if you run out of underlying filesystem space.  And 
it's not clear how to fix them if they are corrupted by a crash. When 
you are dealing with backups you want them to work regardless of other 
problems - the time you need them is precisely when you have a bunch of 
other problems.

> While the initial design would
> certainly need someone with experience, I don't know why each
> implementation would require a "highly experienced DBA" or why it
> "would have to be spread across many disks" any more than a standard
> BackupPC implementation. Modern databases are written to hide a lot of
> the complexity of optimization.

Modern filesystems optimize file access because they know the related 
structures (directories, inodes, free space list).  Databases don't know 
what you are going to put in them or how the pieces relate.  They can 
be tuned for any particular workload, but that tuning isn't inherent.

> Plus the database is large only in the
> sense of having lots of table entries but is otherwise not
> particularly complex, nor do you have to deal with multiple
> simultaneous access queries, which is usually the major bottleneck
> requiring optimization and performance tuning.

Multiple concurrent writes are the hard part, something backuppc will be 
doing all night long.
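
To make that concrete, here's a rough sketch of the contention problem, 
with SQLite only standing in for "a database" and all file and table 
names made up.  Server databases handle concurrent writers far better, 
but somebody still has to configure and watch that machinery:

import sqlite3
import threading

DB = "/tmp/contention-demo.db"   # hypothetical path

def writer(name):
    # One connection per thread; timeout=0 surfaces lock contention
    # immediately instead of silently retrying.
    conn = sqlite3.connect(DB, timeout=0)
    busy = 0
    for i in range(1000):
        try:
            with conn:  # each insert runs as its own transaction
                conn.execute("INSERT INTO files VALUES (?, ?)", (name, i))
        except sqlite3.OperationalError:  # "database is locked"
            busy += 1
    conn.close()
    print(f"{name}: {busy} of 1000 writes hit lock contention")

setup = sqlite3.connect(DB)
setup.execute("CREATE TABLE IF NOT EXISTS files (host TEXT, n INTEGER)")
setup.commit()
setup.close()

# Four "backup hosts" writing metadata at once, as backuppc would.
threads = [threading.Thread(target=writer, args=(f"host{i}",))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

The point isn't that SQLite is the right comparison; it's that 
concurrent writes force locking and retry logic on somebody, and with 
a filesystem the OS already does that work for you.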

> Similarly the queries
> will in general be very simple and easily keyed relative to other
> real-world databases. Remember size != difficulty or complexity.

Backups are a mostly-write operation, hopefully.

>  > When you get down to the real issues, normal operation has 
>  > a bottleneck with disk head motion which a database isn't going to do 
>  > any better without someone knowing how to tune it across multiple disks. 
> 
> This seems like a red herring. The disk head motion issue applies
> whether the data is stored in a database or in a combination of a
> filesystem + attrib files.

Sort of, but the OS, filesystem and buffer cache have years of design 
optimization for their specific purpose and they are pretty good at it. 
And unless the database uses the raw device it can only add overhead 
to the underlying filesystem access.

> If anything, storage in a single database
> would be more efficient than having to find and individually load (and
> unpack) multiple attrib files since the database storage can be
> optimized to some degree automagically while even attrib files that
> are logically "sequential" could be scattered all over the disk
> leading to inefficient head movement.

This is the sort of thing where you need to produce evidence.  I'd 
expect the attrib files to be generally optimized with respect to the 
locations of the relevant directories that you will be accessing at the 
same time because the filesystem knows about these locations when 
allocating the space, whereas a database on top of a filesystem has no 
idea of where the disk head will be going next.
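
And producing that evidence wouldn't be hard.  A rough micro-benchmark, 
with hypothetical paths and SQLite again standing in for "a database", 
might look like the sketch below.  Note that after populating both 
stores everything sits in the buffer cache, so you'd want to drop 
caches (or reboot) between the two timings before believing anything 
about head movement:

import os
import sqlite3
import time

N = 5000
BASE = "/tmp/attrib-bench"       # hypothetical scratch directory
os.makedirs(BASE, exist_ok=True)
payload = os.urandom(512)        # stand-in for one attrib file's contents

# Populate both stores with identical data.
for i in range(N):
    with open(f"{BASE}/attrib-{i}", "wb") as f:
        f.write(payload)
conn = sqlite3.connect(f"{BASE}/attrib.db")
conn.execute("CREATE TABLE IF NOT EXISTS attrib "
             "(id INTEGER PRIMARY KEY, data BLOB)")
conn.executemany("INSERT OR REPLACE INTO attrib VALUES (?, ?)",
                 ((i, payload) for i in range(N)))
conn.commit()

# Timing 1: read N small files, as BackupPC does with attrib files.
t0 = time.time()
for i in range(N):
    with open(f"{BASE}/attrib-{i}", "rb") as f:
        f.read()
t_files = time.time() - t0

# Timing 2: read the same bytes back out of one database file.
t0 = time.time()
for (blob,) in conn.execute("SELECT data FROM attrib"):
    pass
t_db = time.time() - t0
conn.close()

print(f"{N} small files: {t_files:.3f}s   single db: {t_db:.3f}s")

Run cold, on the same spindle, that would at least turn this argument 
into numbers.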

> Also, the database could be
> stored on one disk and the pool on another but this would be difficult
> if not impossible to do on BackupPC where the pool, the links, and the
> attrib files are all on the same filesystem.

Agreed - if you have a skilled DBA to arrange this.  It's not going to 
happen out of the box.

 >  >     Also, while some databases do offer remote replication, it isn't 
>  > magic either and keeping it working isn't a common skill.
>  > 
> 
> Again a red herring. Just having the ability to temporarily "throttle"
> BackupPC, leaving the database in a consistent state, would allow one to
> simply copy (e.g., rsync) the database and the pool to a backup
> device. This copy would be much faster than today's BackupPC because
> you wouldn't have the hard link issue. Remote replication would be
> even better but not necessary to solve the common issue of copying the
> pool raised by so many people on this list.

There's only a small difference in scale here (and it's not obvious 
which direction) between rsync'ing a raw database file and rsync'ing an 
image copy of a filesystem.  There's probably not much of a practical 
difference.
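
Either way, the mechanics are simple once the writers are quiesced; the 
cost is in the data, not the commands.  A sketch of the "throttle, 
copy, resume" idea, with every path and host name hypothetical:

import subprocess

POOL = "/var/lib/backuppc/"          # hypothetical pool location
DEST = "backup-host:/mnt/replica/"   # hypothetical rsync destination

# Stop the daemon so nothing is mid-write during the copy.
subprocess.run(["/etc/init.d/backuppc", "stop"], check=True)
try:
    # -a preserves metadata; -H preserves hard links, which is the
    # expensive part of copying a BackupPC pool (a single database
    # file wouldn't need it, but it wouldn't rsync much faster either).
    subprocess.run(["rsync", "-aH", "--delete", POOL, DEST], check=True)
finally:
    subprocess.run(["/etc/init.d/backuppc", "start"], check=True)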

-- 
   Les Mikesell
    lesmikesell AT gmail DOT com

