Subject: Re: [BackupPC-users] Problems with hardlink-based backups...
From: dan <dandenson AT gmail DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Wed, 2 Sep 2009 20:21:59 -0600

> You seem to have the illusion that sql can magically avoid the head motions that
> make backuppc slow while still getting the same things on the same disks.  While
> it is possible to tune most sql servers to put different tables on different
> drives, there's a fair chance that in a default configuration it will perform
> worse than using links.


I don't think so.  MySQL, like any database, is designed to handle small chunks of data very efficiently.  It will cache and reorder writes much better than a filesystem can for pieces of data that are consistently the same size.  Remember, I don't think for one second that the files themselves should be stored in the database.  We are talking about the difference between head movement over a large disk vs. head movement within a tiny file (relative to the cpool).  We are also talking about database entries that are just attributes: filename, hash, disk location, dates.  Not big data.
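To make that concrete, here is a rough sketch of the kind of metadata table I have in mind (the table and column names are hypothetical, not anything BackupPC has today):

    -- Fixed-size attribute rows only; file contents stay in the pool on disk.
    CREATE TABLE file_metadata (
        id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        host       VARCHAR(64)   NOT NULL,  -- client the file came from
        backup_num INT           NOT NULL,  -- which backup run
        path       VARCHAR(1024) NOT NULL,  -- directory within the backup
        name       VARCHAR(255)  NOT NULL,  -- file name
        hash       CHAR(32)      NOT NULL,  -- pool checksum of the contents
        pool_path  VARCHAR(1024) NOT NULL,  -- where the data lives on disk
        mtime      DATETIME      NOT NULL,  -- file modification time
        INDEX (hash),                       -- pooling/dedup lookups
        INDEX (host, backup_num)            -- per-backup listings
    );

Every row is small and roughly the same size, which is exactly the workload a database engine is built to cache and reorder.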
 
> I'm not sure being platform-limited matters that much.  Why wouldn't I be able
> to install OpenSolaris/zfs on anything where I'd be likely to run a database?

I guess we looked at this from different angles.  What if you are a linux admin and not a solaris admin?  Or, moreover, what if you think Sun is not a terribly honest company and might switch teams?  I guess there are plenty of political/religious/whatever reasons to stay away from any one platform.  I also think linux is more stable and performant than solaris, and that btrfs may be as good or a better solution.

 
> >     - Allows for more granular security and access controls to backups
> >
> > How about much much easier PHP work getting backup info and a faster
> > backuppc interface only having to hit the database for all its info and
> > not having to touch the filesystem.

> I don't have a lot of use for backuppc info other than knowing that it completed.

I like to access the data quickly.  My servers have so much data that the CGI interface cannot be used to pull it.

 
> > One of the biggest concerns with backuppc that is constantly discussed
> > on this list is syncing the backup data between two or more servers.
> > Simply reducing the file count by eliminating the hardlinks would allow
> > rsync to be used reliably and effectively.  SQL replication can keep
> > metadata updated constantly and a watchdog that monitors the SQL for
> > changes could keep the filesystems that store data synced easily as
> > well.

> Maybe, but again it won't do what you want by default.  Most sql replication
> schemes work more or less in real time which probably isn't what you want at all.


Because MySQL master->slave or master-master replication works from the binary (transaction) log, you don't have to have the slave online.  When you start it up, it will sync up from the log.
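As a sketch of what that looks like in practice (the hostname, credentials, and log coordinates here are all made up), the slave is just pointed at the master's binary log and replays whatever it missed:

    -- On the slave, after loading a snapshot of the master:
    CHANGE MASTER TO
        MASTER_HOST     = 'backuppc-master.example.com',
        MASTER_USER     = 'repl',
        MASTER_PASSWORD = 'secret',
        MASTER_LOG_FILE = 'mysql-bin.000042',  -- coordinates taken from the master
        MASTER_LOG_POS  = 98;
    START SLAVE;
    SHOW SLAVE STATUS;  -- Seconds_Behind_Master drops as the log is replayed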
 
> >  Once the metadata and config moves to a database, so many things
> > become very easy.

> No, they just become different. You now have to use database-specific tools to
> touch anything and if the file contents aren't included in the database you've
> made it impossible to do atomic operations.

I would say they become easier.  Simply speaking, you can do a select on the database for files ending in *.xls or *.doc with a date more recent than last Friday, show the filepath and filename, show only the newest copy of each file, and break it down by host/backup/date/etc.  You can then execute a script that takes the filepath and filename, copies the file out of the pool, names it properly, and applies permissions or an ACL to it, and you can do all of this in PHP, in the browser or on the command line, with a single piece of code.
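Against the hypothetical file_metadata table sketched earlier, that query could look something like this (the literal date stands in for "last Friday"):

    -- Newest copy of each .xls/.doc file changed since last Friday, per host:
    SELECT host, path, name, MAX(mtime) AS newest
    FROM file_metadata
    WHERE (name LIKE '%.xls' OR name LIKE '%.doc')
      AND mtime > '2009-08-28'
    GROUP BY host, path, name
    ORDER BY host, newest DESC;

The equivalent walk over a hardlinked pool tree would touch millions of inodes.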


> > A single backuppc server could handle many more
> > concurrent backups because multiple data storage devices can separate IO
> > and relieve the pressure on the IO system of the OS.

> That has yet to be proven.

I can't argue that directly, but we can make SOME assumptions that will hold true.  Writing 2 files to 2 different disks will be faster than writing 2 files to one disk, all things being equal.  Separating the most IO-intensive tasks onto devices that are better at IO will improve performance in direct IO (this is true now if you use backuppc on fast SSD drives, it's just too expensive in $$).  There are a lot of unknowns.  How much performance would be gained by pushing metadata off to a separate disk?  Would that improvement show up in a real backup, or can it only be realized in synthetic benchmarks?  I have zero doubt that some parts will be significantly faster, but will concede that those parts may only make up 5% of the total backup time, which could save a whopping 3%.  I feel that this would be significantly faster because eliminating the writing of hardlinks will save head travel time, which binds up IO.
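For example, MySQL can already split a MyISAM table's data and index files onto separate spindles when the table is created, which is exactly the kind of IO separation I mean (the table name and paths here are hypothetical):

    -- Rows on one disk, the index btree on another (MyISAM table options):
    CREATE TABLE io_split_test (
        id   INT NOT NULL PRIMARY KEY,
        hash CHAR(32) NOT NULL,
        INDEX (hash)
    ) ENGINE = MyISAM
      DATA DIRECTORY  = '/mnt/disk1/mysql-data',
      INDEX DIRECTORY = '/mnt/disk2/mysql-index';

Whether that wins anything for a backuppc-style workload is the open question, but the knob is already there.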
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/