Subject: Re: [BackupPC-users] backuppc slow rsync speeds
From: John Rouillard <rouilj-backuppc AT renesys DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Mon, 17 Sep 2012 18:05:28 +0000
On Mon, Sep 17, 2012 at 12:54:35PM -0400, Timothy J Massey wrote:
> No matter the size of the system, I seem to top out at about 50GB/hour for 
> full backups.  Here is a perfectly typical example:
> 
> Full Backup:  769.3 minutes for 675677.3MB of data.  That works out to be 
> 878MB/min, or about 15MB/s.  For a system with an array that can move 
> 200MB/s, and a network system that can move at least 70MB/s.

My last full backup was 2559463.2 MB and ran 306.9 minutes, which, if
I am doing my math right, is about 138 MB/s. Its I/O overlapped with
9 other backups in various stages.
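Sanity-checking that arithmetic:

   $ echo '2559463.2 / 306.9 / 60' | bc -l
   138.99...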

My backup storage is a 1U Linux box exporting its disks over Gig-E as
an iSCSI target, running RAID 5 IIRC, with an LSI hardware RAID
controller and BBU write cache, I think.
 
> The server side, though, shows something completely different.  Here is a 
> few lines from dstat:
> 
> ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
> usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
>  33   2  64   1   0   0|  22M   47k|   0     0 |   0     0 |1711   402
>  43   3  49   6   0   0|  40M  188k|  35k 1504B|   0     0 |2253   632
>  45   4  49   1   0   1|  50M   36k|  38k 1056B|   0     0 |2660   909
>  46   4  50   0   0   0|  46M    0 |  55k 1754B|   0     0 |2540   622
>  45   4  50   1   0   0|  45M   12k| 120B  314B|   0     0 |2494   708
>  43   3  50   3   0   0|  42M    0 |  77k 1584B|   0     0 |2613   958
>  41   4  47   8   0   0|  50M  268k| 449B  356B|   0     0 |2333   704
>  46   3  50   1   0   0|  42M   36k|  26k 1122B|   0     0 |2583   771
>  45   4  50   1   0   0|  40M    0 |  30k  726B|   0     0 |2499   681
> 
> It looks like everything is under-utilized.  For example, I'm getting a 
> measly 40-50MB of read performance from my array of four drives, and 
> *nothing* is going out over the network.

A 4-drive JBOD/RAID 0, RAID 1/0, RAID 5, or RAID 6? I'll assume RAID 5.

> My physical drive and network 
> lights echo this:  they are *not* busy.  My interrupts are certainly 
> manageable and context switches are very low.  Even my CPU numbers look 
> tremendous:  nearly no time in wait, and about 50% CPU idle!
> 
> Notice what top shows us:
> 
> top - 13:21:27 up 49 min,  1 user,  load average: 2.07, 1.85, 1.67
> Tasks: 167 total,   2 running, 165 sleeping,   0 stopped,   0 zombie
> Cpu(s): 43.7%us,  3.6%sy,  0.0%ni, 50.5%id,  2.1%wa,  0.0%hi,  0.1%si, 
> 0.0%st
> Mem:   3924444k total,  3774644k used,   149800k free,     9640k buffers
> Swap:        0k total,        0k used,        0k free,  3239600k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  1731 backuppc  20   0  357m 209m 1192 R 95.1  5.5  35:58.08 BackupPC_dump
>  1679 backuppc  20   0  360m 211m 1596 D 92.1  5.5  32:54.18 BackupPC_dump
> 
> 
> My load average is 2, and you can see those two processes:  two instances 
> of BackupPC_dump.  *Each* of them are using 100% of the CPU given to them, 
> but they're both using the *same* CPU (core), which is why I have 50% 
> idle!

Can you check that with top's f then J option (IIRC)? I don't see the
P column in your output, which would tell us which CPU each process is
running on.
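If the top column is awkward to get at, ps can show the same thing
(assuming procps ps; psr is the processor the process last ran on):

   # Show PID, last-run CPU, %CPU and command for the dump processes:
   ps -o pid,psr,pcpu,comm -C BackupPC_dump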

Also, I thought (but may be wrong) that %CPU is time spent on a CPU
divided by the number of seconds in the sample interval, measured
against a single CPU. So if they really were sharing a single CPU, the
two of them together couldn't add up to more than 100%, i.e. they
couldn't both be above 50% in the %CPU column, and yours show 95.1%
and 92.1%. I agree that the overall CPU figure of 50% on a dual-core
system is calculated as:

   (cumulative CPU seconds of every running process over the sample
    period) / (sample period * number of processors)

(If that is right, 95.1% + 92.1% is ~1.87 CPU-seconds per second; on
2 cores that would be ~93% busy overall, so the ~47% busy / 50% idle
in your Cpu(s) line hints at 4 logical CPUs, e.g. hyperthreading.)

AFAIK BackupPC is not threaded, so the two BackupPC processes you see
there are talking to each other via IPC and should be running on
different cores/processors (although Linux's scheduler does leave
something to be desired in keeping processes on the same core; I have
seen two processes swap cores for no discernible reason).

> Mark Coetser, can you see what top shows for the CPU utilization for your
> system while doing a backup?  Don't just look at the single "idle" or 
> "user" numbers:  look at each BackupPC process as well, and let us know 
> what they are--and how many physical (and hyper-threaded) cores you have. 
> Additional info can be found in /proc/cpuinfo if you don't know the 
> answers.

Also, type f then J in top to add the P (processor) column.
 
> To everyone:  is there a way to get Perl to allow each of these items to 
> run on *different* processes?  From my quick Google it seems that the 
> processes must be forked using Perl modules designed for this purpose.  At 
> the moment, this is beyond my capability.  Am I missing an easier way to 
> do this?

As I said above, I would expect them to be on different cores. There
is a scheduling wackiness in Linux that can cause this sort of
bottlenecking for large data sets. Can you check what your kernel
tunable vm.zone_reclaim_mode is set to? (Feel free to Google for
examples of the issues it causes with MySQL and PostgreSQL.)
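Something along these lines (values other than 0 enable zone reclaim,
which can stall allocations on NUMA boxes):

   # Check the current setting:
   sysctl vm.zone_reclaim_mode
   # Try disabling it (takes effect immediately; not persistent):
   sysctl -w vm.zone_reclaim_mode=0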

Also, when you see these BackupPC processes pegged on the CPUs, can
you strace them and see what they are doing?
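For example (with <pid> taken from top; -c prints a syscall summary
when you interrupt it):

   # Summarize syscall counts and times for a busy dump process:
   strace -c -p <pid>
   # Or watch individual reads/writes with per-call timings:
   strace -T -e trace=read,write -p <pid>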

I have always assumed the following sequence of events:

   1) connect to the client and send the paths to be backed up.

   2) the client calculates basic data about the files (last-modified
      timestamp, size, etc.)

   3) this is sent to the server, and the server grinds through
      the tree looking for the corresponding data from the reference
      backup. This is basically a bunch of data reads from disk that
      are essentially random as it walks the file tree and gathers
      metadata, etc. It becomes even more random if there is
      another backup going on (or a link from a prior backup, etc.)

   4) BackupPC starts requesting data for particular files and the
      block-read algorithm kicks in. Again, these may be random reads
      if another backup is going on. If you have cached checksums,
      this may be a single read (just grabbing the block-checksum
      list, which is smaller than the file), but for any reasonably
      sized file it is a number of block reads as it works through
      the file.

   5) data gets transferred and written to the NEW directory if a file
      doesn't match the copy in the pool. Again, this tends to make
      the I/O random, since the disk moves the head, reads a block,
      moves the head, writes a block, reads the next block, etc.
      (although kernel/disk/filesystem readahead should help a lot
      here.)

So in step 2 the server does nothing. In step 3 you may be CPU bound,
as I expect there to be little data compared to step 4. In steps 4/5 I
expect you would oscillate between being I/O bound and CPU bound,
since you either have to read the data and decompress it (when you
can't use precached checksums) or compress the data to write it to
disk.
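If you have sysstat installed, pidstat makes that oscillation easy to
see, e.g.:

   # Per-process CPU (-u) and disk I/O (-d) every 5 seconds:
   pidstat -u -d -p <pid> 5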

> And one more request:  for those of you out there using rsync, can you 
> give me some examples where you are getting faster numbers?  Let's say, 
> full backups of 100GB hosts in roughly 30-35 minutes, or 500GB hosts in 
> two or three hours?  That's about four times faster than what I'm seeing, 
> and would work out to be 50-60MB/s, which seems like a much more realistic 
> speed.  If you are seeing such speed, can you give us an idea of your 
> hardware configuration, as well as an idea of the CPU utilization you're 
> seeing during the backups?  Also, are you using compression or checksum 
> caching?  If you need help collecting this info, I'd be happy to help you.

The primary BackupPC server at the moment is a machine that also
provides file storage services: 24 cores, 48 GB of memory (1600 MHz).
But I had much the same numbers running on a 4-core, 32 GB (or maybe
16) machine with an attached SCSI disk array, RAID 6 with 14 or 15
spindles (Dell MD1000). Disk is iSCSI using ext4 with noatime.

BackupPC is configured with cached checksums.
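For reference (from memory, so double-check the docs for your BackupPC
version), checksum caching is enabled by adding rsync's checksum-seed
option to the rsync argument lists in config.pl:

   # Enable rsync checksum caching; 32761 is the magic seed value
   # that BackupPC's File::RsyncP recognizes for cached checksums.
   push @{$Conf{RsyncArgs}},        '--checksum-seed=32761';
   push @{$Conf{RsyncRestoreArgs}}, '--checksum-seed=32761';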

Local clients (we also back up systems over the WAN) are connected
over Gig-E.

When you see high CPU, can you use strace to see what you have for I/O
reads/writes? Also, have you tried setting this up with a
non-compressed pool as an experiment, to see if compression is what is
killing you?
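For that experiment, something like this in config.pl should do it
(newly backed-up files land in the pool uncompressed; existing pool
files stay as they are):

   # 0 disables compression for new pool files; the usual default is
   # 3 when Compress::Zlib is available.
   $Conf{CompressLevel} = 0;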

-- 
                                -- rouilj

John Rouillard       System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111
