Subject: Re: [BackupPC-users] How big is your backup?
From: dan <dandenson AT gmail DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Thu, 31 Dec 2009 21:04:14 -0700
On Thu, Dec 31, 2009 at 8:36 AM, Peter Vratny <usenet2006 AT server1 DOT at> wrote:
> mark k wrote:
>> Agreed sas drives are the way to go, just built a backup server with
>> 10 300gb sas running in a raid 50, going to hopefully replace 2 backup
>> servers that were using SATA storage.
>
> This is just a question of price. We are currently running 3
> Backup-Servers, which is way cheaper than building one with SAS drives.


I agree.  SAS drives at 15,000 RPM are very nice but VERY expensive.  Servers and disks rarely scale perfectly with more MHz, more RAM, or more disks.  In my opinion there is a point of diminishing returns where you should just get a second server.

I think we are nearing the need for a major change in how this data is managed.  Disks are not getting faster at the rate that data is growing, and networks are also not able to keep up with the ever-growing need to store data.  In-line deduplication is looking pretty promising, obviously as a way to save space, but even more as a way to reduce I/O.

With ZFS supporting deduplication on *solaris platforms, and Chris Mason planning deduplication in btrfs for Linux ( https://ldn.linuxfoundation.org/blog-entry/a-conversation-with-chris-mason-btrfs-next-generation-file-system-linux ), this could really be a solution to today's backup needs.

With block-level, in-line deduplication in the filesystem, consider the following.

When a file is transferred, a block is only written if it is unique.  If the block is the same as another block already in the filesystem, a pointer to that existing block is written instead of the whole block.  With disk caching this can radically reduce I/O when a file is very similar to another file already on the filesystem.  This is all done essentially with hashes of each block.  The hash tables would be very large, but they would be sorted and indexed for quick lookup, since each entry is just a hash and a pointer and the hash is of a known length, which makes indexing pretty fast.  Some fancy algorithms could be used to identify when a file has a very low deduplication rate, so the dedupe step could be skipped and the file tagged for deduplication later.
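To make the idea concrete, here is a minimal Python sketch of that kind of block-level, hash-indexed dedup store.  The 4 KB block size, SHA-256, and the in-memory dictionaries are assumptions for illustration only; real filesystems like ZFS and btrfs do this on-disk and far more efficiently.

import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for the sketch

class DedupStore:
    """Toy in-line deduplicating block store: only unique blocks are written."""
    def __init__(self):
        self.blocks = {}      # hash -> block data (stands in for on-disk blocks)
        self.file_index = {}  # filename -> list of block hashes (the "pointers")

    def write_file(self, name, data):
        pointers = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Only store the block if this hash has not been seen before;
            # otherwise just record a pointer to the existing block.
            if digest not in self.blocks:
                self.blocks[digest] = block
            pointers.append(digest)
        self.file_index[name] = pointers

    def read_file(self, name):
        # Reassemble the file by following the per-file block pointers.
        return b"".join(self.blocks[h] for h in self.file_index[name])

if __name__ == "__main__":
    store = DedupStore()
    payload = b"A" * 8192 + b"B" * 4096
    store.write_file("copy1", payload)
    store.write_file("copy2", payload)          # identical file: no new blocks written
    assert store.read_file("copy2") == payload
    print(len(store.blocks), "unique blocks stored for 2 files")  # 2, not 6

The point of the sketch is that each incoming block costs a single hash computation plus one lookup in the index, which is why keeping that index sorted/indexed (rather than scanned) matters so much for in-line performance.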

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/