Subject: Re: [BackupPC-users] serial or parallel backups
From: John Rouillard <rouilj-backuppc AT renesys DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Sun, 26 Jun 2011 21:33:42 +0000
On Sun, Jun 26, 2011 at 03:48:20PM -0500, Chris Baker wrote:
> I have been wondering about this for a while. Am I better off having
> backups run parallel or in series?
> 
> By running in series, I mean one backup runs at a time. When it finishes,
> another one starts.
> 
> By running in parallel, I mean that several backups run at once. It seems
> that when backups have to fight over bandwidth, they all end up running
> much more slowly. I have it set up to run four backups at once.
> 
> A server that rarely runs more than one backup has achieved throughput as
> high as 24.13 MB/sec. However, on the server with four backups each one
> maxes out at only 5.71 MB/sec. Bottom line: the four added together still
> don't match the throughput of the single backup.
> 
> What does everyone here think?

It depends on a few things. I use rsync almost exclusively. So YMMV
with other backup mechanisms. 
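
Just to put a number on your own example first, here's a quick back of
the envelope check (python; the 24.13 and 5.71 MB/sec figures are from
your message, the rest is just arithmetic):

  # Aggregate throughput of the four parallel backups vs. the single one.
  single_stream = 24.13   # MB/s, server that runs one backup at a time
  per_stream = 5.71       # MB/s, each of the four concurrent backups
  streams = 4

  aggregate = per_stream * streams
  print(f"aggregate: {aggregate:.2f} MB/s")                      # 22.84 MB/s
  print(f"loss vs serial: {(1 - aggregate/single_stream):.0%}")  # ~5%

So the four together are only about 5% behind the single stream, which
may be a reasonable trade if it shortens the overall backup window.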

What is your i/o subsystem? I have a striped array (raid 0) over two
raid 6 arrays with 7 drives in each array, so effectively I have 10
spindles. With this I can handle more i/o load than if I only had one
drive.
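
For the record, the 10 spindle figure is just parity accounting, e.g.
in python:

  # Effective data spindles in a raid 0 stripe over two 7-drive raid 6
  # arrays. Raid 6 gives up two drives' worth of capacity per array to
  # parity, leaving 5 data spindles each.
  drives, parity, arrays = 7, 2, 2
  print((drives - parity) * arrays)   # 10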

I would expect the throughput of multiple servers backing up in
parallel to be less than the total i/o bandwidth of the disk, since
each write will end up moving the disk heads to a different location,
dropping the effective bandwidth of the disk.

However, if you have a raid controller with a battery-backed cache,
you may find that you actually get more bandwidth, up to the point
where your backups saturate the cache; after that the backups are
waiting for the disk heads to move and write data. Also, if you have
the memory, increasing the readahead helps: when the first block of a
file is requested, a sequential read pulls in multiple blocks, so they
are already cached in memory and an rsync read of the next block
doesn't have to wait for another i/o cycle to disk (and more head
movement) to fulfill the request.
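
On linux the readahead is a per device tunable. A minimal sketch in
python (the sda device name and the 4096 KB value are assumptions;
tune them for your pool device and available memory):

  from pathlib import Path

  # Per-device readahead, exposed in KB via sysfs on linux.
  ra = Path("/sys/block/sda/queue/read_ahead_kb")

  print("current:", ra.read_text().strip(), "KB")
  ra.write_text("4096")   # needs root; same as `blockdev --setra 8192 /dev/sda`
  print("new:", ra.read_text().strip(), "KB")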

In my case I also have a lot of systems backing up across the WAN
that are bandwidth limited to 64 KB/sec. So even if I only had one
backup disk, I could easily handle more than one backup running in
parallel.
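
To put a number on that (python again; the 30 MB/sec sustained write
rate is an assumption for illustration, the 64 KB/sec cap is real in
my setup):

  # How many 64 KB/s WAN clients it takes to saturate one backup disk.
  client_kb_s = 64    # per-client WAN bandwidth cap
  disk_mb_s = 30      # assumed sustained write bandwidth of one disk

  print(disk_mb_s * 1024 // client_kb_s)   # 480 clients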

Also the backup process itself goes through both read and write
cycles. If you are doing an incremental and have the inode info cached
in memory you have effectively 0 bandwidth use on the server while the
client is furiously scanning its disks looking for new data. This
allows you to have another backup writing new data to your backup disk
without any contention.

So running multiple backups can let you consume i/o bandwidth that
would otherwise be wasted while processing is primarily happening on
the client (note there is no way to actually control this).
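
As a toy illustration of why that works (python; the phase lengths and
the one tick stagger are made-up numbers, since as noted you can't
actually control the phasing):

  # Two staggered backups, each alternating a client-side scan phase
  # (0 units of server i/o) and a server write phase (1 unit).
  def backup(stagger):
      return [0] * stagger + [0, 0, 1, 0, 1, 1]   # scan/write timeline

  a, b = backup(0), backup(1)
  for t in range(max(len(a), len(b))):
      load = (a[t] if t < len(a) else 0) + (b[t] if t < len(b) else 0)
      print(f"tick {t}: server i/o load {load}")

Most ticks the combined load is 0 or 1: while one client is scanning,
the other has the disk to itself.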

If you always do full backups (requiring a full disk read of all
files with rsync), and your backups always start at exactly the same
time and stay synchronized, you will have a very different performance
curve from a mixed set of incrementals and fulls that are staggered in
their i/o pattern over the span of the backup window.

When I first set up backuppc I ran some speed tests varying:

  amount of input bandwidth (number of clients, and a mix of clients
    using different network bandwidth limits, 64KB/s -> 5MB/s)

  type of back end (raid 6, raid 1/0, raid 0 (for testing only))

and I tried to minimize the total backup time needed per test set of
hosts (I had 40 hosts total). I found that raid 0 across the disks was
the fastest (no surprise there), and I was able to handle 20 or so of
the higher-bandwidth hosts running in parallel, most likely because
the backups started at the same time but finished at different times,
at which point new backups were started. So the mix of running backups
and their read/write i/o load would shift as backups finished.

Currently I run 10 backups in parallel with 132 hosts being backed up
nightly, and I only have one or two still running in the morning when
we have heavy data churn (> 1/2 TB).

My pool reports: 1258 full backups of total size 61013.99GB, so I
have a pretty small average full backup size (about 48.5GB per full).

-- 
                                -- rouilj

John Rouillard       System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111
