BackupPC-users

Re: [BackupPC-users] Setting up a new BackupPC server

2009-09-17 01:44:07
Subject: Re: [BackupPC-users] Setting up a new BackupPC server
From: Stephen Vaughan <stephenvaughan AT gmail DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Thu, 17 Sep 2009 15:39:28 +1000
We're building a new backuppc server at the moment as well. The box uses 8x 300 GB 10k SAS drives, and the decision between raid5/6 and raid10 is difficult. Right now raid10 gives us 1.1 TB, which is 12-18 months before we reach capacity, while raid5 gives us roughly 2100 GB. With raid5 I'm fairly confident the server will handle backing up our current storage, which is around 550-600 GB. However, we're a hosting provider, so our storage usage grows every day; we could be backing up double that amount in 12 months. So the question is: will raid5 still be fast enough to cope with backing up that much data, or do we need raid10?
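The capacity figures above follow from simple arithmetic on the 8x 300 GB set. A minimal sketch (illustrative only; real usable space is a bit lower after filesystem overhead and GB-vs-GiB conversion):

```python
# Usable capacity for an 8x 300 GB array under the RAID levels discussed.
def usable_gb(drives, size_gb, level):
    if level == "raid10":
        return drives // 2 * size_gb      # half the drives are mirrors
    if level == "raid5":
        return (drives - 1) * size_gb     # one drive's worth of parity
    if level == "raid6":
        return (drives - 2) * size_gb     # two drives' worth of parity
    raise ValueError(level)

for level in ("raid10", "raid5", "raid6"):
    print(level, usable_gb(8, 300, level), "GB")
# raid10 -> 1200 GB (~1.1 TiB), raid5 -> 2100 GB, raid6 -> 1800 GB
```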

I'm going to test backing up our current storage on both raid5 and raid10 as a comparison, and then we'll decide between raid5/6 and raid10 based on performance.

On Thu, Sep 17, 2009 at 9:55 AM, dan <dandenson AT gmail DOT com> wrote:
> I would argue that file system stability is paramount for backups and
> performance is really a distant second.  who cares how fast the system

If you can't perform your backups in a 24-hour period, performance
becomes more important.

If you back up data just to lose it, performance is never even a factor. Performance needs can be measured: I need to get this much done in this amount of time. Reliability is a need. 99% reliability is the equivalent of saying it's OK to lose data 1 out of every 100 days. So performance is almost completely irrelevant if you don't have reliable storage. Of course it makes sense to identify your data-safety needs and then buy/build to perform as well as possible for your workload while meeting those needs. From some experimenting of my own, I don't think JFS is an ideal filesystem on *linux*. So even if JFS is very fast, I don't trust it with my data, so it won't get used. I would choose XFS for second place because it is reliable. Stability and security first, then performance, in this specific environment.

On my desktop I use ext4 and formerly used reiserfs. ext4 is young, and reiserfs doesn't like to get killed hard, but I keep backups of my data and the loss of that system is a minor inconvenience. My backup server MUST be stable, and I wouldn't dare use ext4 (yet) or reiserfs on it. ext3 may be slow, but it is bulletproof.
 
> will be substantially slower than raid10 or raid1 that have no parity
> calculation.

He is using a hardware RAID card.
By hardware raid I really mean a raid controller with a dedicated processor. I don't know the 3ware model or its capabilities; 3ware makes some cards that are little more than software raid with a physical interface. I'm thinking of a higher-end Adaptec or LSI card that has a processor doing the parity work instead of handing it off to the CPU.


> That being said, if raid5(or6) is fast enough for you it is a mature and
> stable option and a good choice, but certainly comes with a performance
> penalty.

You make it sound like RAID-5 is incapable of saturating drive
bandwidth.  I haven't seen this on any modern (2-yr-old or newer)
machine with more than one CPU.  And he's using hardware RAID so the
point is moot anyway.
Sure, raid5 can saturate the bus. With backuppc specifically, latency is the issue, not bandwidth. The latency added by the parity calculation negatively affects IO performance. SSDs are the best example: they can have lower, sometimes much lower, bandwidth than a hard disk, but the drastically lower latency translates directly to massively improved IO. backuppc lives on hardlinks, which are tiny writes where bandwidth is of little importance and IO performance carries the day. Bandwidth is important, but backuppc really likes IO performance.
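The "tiny writes" claim is easy to see for yourself: hardlink creation moves almost no data, so throughput is entirely a function of per-operation latency. A minimal sketch (the directory layout and counts are made up for illustration, not BackupPC's actual pool structure):

```python
# Time a burst of os.link() calls, the metadata operation BackupPC's pool
# is built on. Content size is irrelevant; per-op latency dominates.
import os
import tempfile
import time

def hardlinks_per_second(n=5000):
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "pool_file")
        with open(src, "wb") as f:
            f.write(b"x")                 # 1 byte: bandwidth plays no part
        start = time.perf_counter()
        for i in range(n):
            os.link(src, os.path.join(d, f"backup_{i}"))
        return n / (time.perf_counter() - start)

print(f"{hardlinks_per_second():.0f} links/sec")
```

Run this on the candidate raid5 and raid10 arrays and compare the rates; the difference is the latency penalty, independent of sequential bandwidth.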

 

> examples:
> raid10 with 6 drives in a raid0(r1-1+2, r1-3+4, r1-5+6), is 3 active
> spindles because the other three are mirrors but has a worst case safety
> of just 1 drive
> raid10 in raid0(raid1-1+2+3, raid1-4+5+6) is just 2 spindles but is more
> resilient because you can lose 2 or more drives and keep the array up.

I think you have it backwards.  A stripe of three mirrors (your first
example) means that you could lose up to three drives and still have
data (as long as you're not losing both drives in a mirror).

By worst case I mean you could lose just one drive; best case is 3 drives. If you lose drives 1 and 2, the array is dead.
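The worst-case/best-case distinction can be checked by brute force: a RAID10 array survives as long as every mirror set keeps at least one live drive. A small sketch enumerating failure combinations for the two 6-drive layouts discussed above:

```python
# Guaranteed-survivable failure count for the two 6-drive RAID10 layouts.
from itertools import combinations

def survives(mirrors, failed):
    # Array is up iff every mirror set still has a live drive.
    return all(set(m) - set(failed) for m in mirrors)

def worst_case(mirrors):
    # Largest k such that ALL k-drive failure combinations are survivable.
    drives = [d for m in mirrors for d in m]
    k = 0
    while all(survives(mirrors, f) for f in combinations(drives, k + 1)):
        k += 1
    return k

three_mirrors = [(1, 2), (3, 4), (5, 6)]   # stripe of three 2-way mirrors
two_mirrors   = [(1, 2, 3), (4, 5, 6)]     # stripe of two 3-way mirrors

print(worst_case(three_mirrors))  # 1: losing both halves of one pair is fatal
print(worst_case(two_mirrors))    # 2: any two failures are survivable
```

The three-mirror layout can survive up to 3 failures in the best case (one per pair), but only 1 is guaranteed, which matches both readings in the thread.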
 
> These are round numbers, kind of a rule of thumb.  6 volumes is about
> where raid5 actually catches up.  with a 4 drive set the raid5 penalty
> brings is to 2 active spindles and has a large 33% latency penalty
> because the array has to wait for all writes to complete while a raid10
> is 2 active spindles without a latency hit.

I think you're overestimating the performance penalty of latency.  This
is worked around in most systems with caching (including a large amount
of RAM on hardware controller cards), and only for writes.
I would agree with you on a lot of workloads, but not backuppc. Creating hardlinks at the scale backuppc does is very IO intensive. For bandwidth-intensive servers or desktops, raid5 (or better, 6) is likely very close in overall speed and offers more storage for the money/drive count. backuppc is a very unusual workload because of the hardlinks, and that is why latency matters so much more here than in the vast majority of other workloads.
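The latency penalty being argued about here is usually expressed as the classic small-write penalty: a raid5 small write costs four disk operations (read old data, read old parity, write data, write parity) versus two for raid10 (one write per mirror half). A back-of-envelope sketch, assuming a hypothetical ~150 IOPS per 10k SAS drive:

```python
# Rough small-write IOPS estimate using the textbook RAID write penalty.
# The per-drive IOPS figure is an assumption, not a measured number, and
# controller caching (as noted above) can hide much of this in practice.
def small_write_iops(drives, per_drive_iops, level):
    total = drives * per_drive_iops
    penalty = {"raid10": 2, "raid5": 4, "raid6": 6}[level]
    return total // penalty

for level in ("raid10", "raid5", "raid6"):
    print(level, small_write_iops(8, 150, level), "small-write IOPS")
# raid10 -> 600, raid5 -> 300, raid6 -> 200
```

For a hardlink-heavy workload like backuppc, that factor-of-two gap in random-write IOPS is the whole argument for raid10.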



------------------------------------------------------------------------------
Come build with us! The BlackBerry® Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9-12, 2009. Register now!
http://p.sf.net/sfu/devconf
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/




--
Best Regards,
Stephen