BackupPC-users

Re: [BackupPC-users] Problems with hardlink-based backups...

2009-08-23 12:35:39
Subject: Re: [BackupPC-users] Problems with hardlink-based backups...
From: dan <dandenson AT gmail DOT com>
To: mstowe AT chicago.us.mensa DOT org, "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Sun, 23 Aug 2009 10:30:17 -0600
Speed.  BackupPC is constrained by I/O performance, and one bottleneck is that the storage volume must be a single filesystem because of hardlinks.  It has been measured a number of times on this mailing list that I/O is the major bottleneck for BackupPC.  Getting faster hardware certainly helps, but the reliance on a single filesystem for all data is a performance bottleneck as well as an irritation when upgrading storage: you either need to add additional RAID arrays (as expanding a RAID is not generally an option) or just use JBOD with LVM or something similar.  Neither is ideal.

My solution is to break the backup scheme into smaller chunks and have a number of BackupPC servers, each handling a set number of clients.  The issues here are complexity, as I need to admin a number of servers, and the loss of file de-duping across servers.  In my organization, like many others, each client has many files that are absolutely identical to those on other clients.  4 backup machines means that a massive amount of data is duplicated 4 times PLUS whatever redundancy is in the RAID.

A hybrid platform can use the filesystem's strengths and a database's strengths and not have most of the weaknesses of either.


My example was a simplistic one.  Sure, MD5 can have some collisions, so either use MD5+SHA1 or just use SHA2.  You would need to store a few more pieces of data, but I think it would be hard to argue against MySQL being many orders of magnitude faster at finding data than a filesystem, just as it is hard to argue against a filesystem being many times faster at simply storing files, and even faster at storing large files.
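To make the idea concrete, here is a minimal sketch of the lookup side, not a proposal for actual BackupPC code.  It assumes SHA-256 as the SHA2 variant and uses Python's built-in sqlite3 as a stand-in for MySQL; the table name "pool" and the functions file_digest, open_index, lookup and register are hypothetical names made up for the illustration.

```python
import hashlib
import sqlite3


def file_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def open_index() -> sqlite3.Connection:
    """Create an in-memory index; sqlite3 stands in for MySQL here."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE pool ("
        " digest TEXT PRIMARY KEY,"   # indexed, so lookups avoid directory scans
        " size   INTEGER NOT NULL,"
        " path   TEXT NOT NULL)"      # where the single stored copy lives
    )
    return db


def lookup(db: sqlite3.Connection, digest: str):
    """Return the stored path for a digest, or None if it is not pooled yet."""
    row = db.execute(
        "SELECT path FROM pool WHERE digest = ?", (digest,)
    ).fetchone()
    return row[0] if row else None


def register(db: sqlite3.Connection, data: bytes, path: str) -> str:
    """Record a file in the index, or return the path of the existing copy."""
    digest = file_digest(data)
    existing = lookup(db, digest)
    if existing is not None:
        return existing               # duplicate content: reuse the pooled copy
    db.execute(
        "INSERT INTO pool (digest, size, path) VALUES (?, ?, ?)",
        (digest, len(data), path),
    )
    return path
```

The database only answers "have I seen this content, and where is it"; the file data itself still lives on the filesystem, which is the split I am arguing for.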

Other benefits of the hybrid system are that the files can be on different volumes than the database.  In fact, because you store each file's location on disk in the database, you could store files on many different disks, with no issues with hardlinks.  Because of this, you could put two BackupPC machines together in a cluster, and each instance of BackupPC would look at the same database (or at replicated data in its own database) and be able to do online replication of the file store on other servers.  They could automatically duplicate these files on their own local file store, and because there are not millions of hardlinks to worry about, rsync can actually be useful in syncing up file stores to other BackupPC machines.  Sure, you will still have a lot of files, but you will have far fewer files for rsync to track.  rsync can handle a lot of files.  With BackupPC today, rsync actually has to track every instance of every file from each host and each backup number, plus the pool.  Without the hardlink pooling, rsync would only have to see each file once.
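As a rough sketch of what that could look like (again an illustration under my own assumptions, not existing BackupPC behaviour): each unique file is written once to one of several volumes, and every host/backup-number/filename is just a row in the database pointing at it.  The table names "blobs" and "refs", the functions open_catalog, store and blob_files, and the rule for picking a volume from the digest are all hypothetical; sqlite3 again stands in for MySQL.

```python
import hashlib
import os
import sqlite3

SCHEMA = """
CREATE TABLE blobs (
    digest  TEXT PRIMARY KEY,
    volume  TEXT NOT NULL,     -- which disk holds the single copy
    relpath TEXT NOT NULL      -- path of the copy relative to that volume
);
CREATE TABLE refs (
    host       TEXT,
    backup_num INTEGER,
    name       TEXT,
    digest     TEXT REFERENCES blobs(digest),
    PRIMARY KEY (host, backup_num, name)
);
"""


def open_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog; a real setup would use a shared database."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def store(db, volumes, host, backup_num, name, data: bytes) -> str:
    """Store one backed-up file: write the content once, reference it many times."""
    digest = hashlib.sha256(data).hexdigest()
    row = db.execute(
        "SELECT volume FROM blobs WHERE digest = ?", (digest,)
    ).fetchone()
    if row is None:
        # Assumed placement rule: spread new content over volumes by digest.
        volume = volumes[int(digest[:8], 16) % len(volumes)]
        relpath = os.path.join(digest[:2], digest)
        full = os.path.join(volume, relpath)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as fh:
            fh.write(data)
        db.execute("INSERT INTO blobs VALUES (?, ?, ?)", (digest, volume, relpath))
    # A database row replaces the hardlink that BackupPC would create.
    db.execute(
        "INSERT INTO refs VALUES (?, ?, ?, ?)", (host, backup_num, name, digest)
    )
    return digest


def blob_files(volumes):
    """List every file physically present on the volumes (what rsync would see)."""
    found = []
    for volume in volumes:
        for root, _dirs, files in os.walk(volume):
            found.extend(os.path.join(root, f) for f in files)
    return found
```

If three hosts each back up the same file twice, the catalog holds six rows in refs but blob_files() finds only one file on disk, so a tool like rsync copying the volumes to another server only has that one file to look at.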

