Re: [BackupPC-users] backuppc slow rsync speeds
2012-09-17 12:59:48
Mark Coetser <mark AT tux-edo.co DOT za> wrote on 09/17/2012 03:08:49 AM:
> Hi
>
> backuppc 3.1.0-9.1
> rsync 3.0.7-2
>
> OK, I have a fairly decent spec backup server with 2 gigabit e1000 NICs
> bonded together and running in bond mode 0, all working 100%. If I run
> plain rsync between the backup server and a backup client, both connected
> on gigabit LAN, I can get sync speeds of +/- 300mbit/s, but using backuppc
> and rsync the max speed I get is 20mbit and the backup is taking
> forever. Currently I have a full backup that's been running for 3461:23
> minutes, whereas the normal rsync would have taken a few hours to complete.
>
> The data is users' maildirs, about 2.6TB, and I am not using rsync
> over ssh; I have the rsync daemon running on the client and have set up
> the .pl as follows.
I have several very similar configurations. Here's an example:
Atom D510 (1.66GHz x 2 Cores)
4GB RAM
CentOS 6 64-bit
4 x 2TB Seagate SATA drives in RAID-6 configuration
I get almost 200MB/s transfer rate from this array...
2 x Intel e1000 NICs in bonded mode.
In the past, the biggest server I backed up was around 1TB. Personally,
I prefer to keep each server image under 1TB if I can help it. Everything
is easier that way: not just file-level backups with BackupPC but
image-level backups as well, and there's less downtime (or less time with
noticeable slowdown if it stays up) when having to take such images.
With servers <1TB, rsync-based BackupPC full backups are slow, but get
done in a reasonable amount of time: 8-12 hours, and I can live with that.
It's even somewhat beneficial: if I start a backup in the middle of the
day, it doesn't noticeably hammer the client I'm backing up.
(Lemons, lemonade... :) )
However, I have recently inherited a server that is >3TB, and 97% full,
too! Backups of that system take 3.5 *days* to complete. I *can't* live
with that. I need better performance.
I was about to write a very similar e-mail to yours! So maybe we can work
on this together. All of your configuration looks pretty straightforward
to me (except the mounts: I'm not sure why you have them if you're using
rsyncd). Mine is quite similar.
No matter the size of the system, I seem to top out at about 50GB/hour
for full backups. Here is a perfectly typical example:

Full backup: 769.3 minutes for 675677.3MB of data. That works out to
878MB/min, or about 15MB/s. That's for a system with an array that can
move 200MB/s and a network that can move at least 70MB/s.
Now, let's look at the "big" server:

Full backup: 5502.8 minutes for 2434613.6MB of data. That's even worse:
442MB/min. And 5502.8 minutes is three and a half *DAYS*.
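As a quick sanity check on those rates, the arithmetic can be reproduced
directly in the shell (the figures are the ones quoted above; awk is just
doing the division):

```shell
# Throughput sanity check using the figures quoted above.
small=$(awk 'BEGIN { printf "%.0f", 675677.3 / 769.3 }')    # "normal" host, MB/min
big=$(awk 'BEGIN { printf "%.0f", 2434613.6 / 5502.8 }')    # 3TB host, MB/min
echo "small host: ${small} MB/min (~$((small / 60)) MB/s)"  # 878 MB/min, ~14 MB/s
echo "big host:   ${big} MB/min (~$((big / 60)) MB/s)"      # 442 MB/min, ~7 MB/s
```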
First, a quick look at the client will show that we can eliminate it
completely. I have checked the performance of several clients while a
backup is running. The client is not CPU, I/O, or memory bound whatsoever.
Here is a typical example: a Windows Server 2008 machine. Task Manager
shows minimal everything: between 0% and 20% CPU usage (with most time
below 5%), and more than 1GB of 2GB RAM free (with 1300MB of cached
memory). Network utilization is absolutely flatlined! A quick sanity check
of the server's physical drive lights shows that drive activity comes in
brief fits and starts. This system is *clearly* not being taxed. By the
way, this contrasts with the beginning of the backup, when rsync is
building the file list: the rsync daemon's CPU usage bounces around with
peaks over 70%, and the drives blink constantly during this process--so
the server is perfectly capable of doing something when it's asked to!
The server side, though, shows something completely different. Here are a
few lines from dstat:
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
 33   2  64   1   0   0|  22M   47k|   0     0 |   0     0 |1711   402
 43   3  49   6   0   0|  40M  188k|  35k 1504B|   0     0 |2253   632
 45   4  49   1   0   1|  50M   36k|  38k 1056B|   0     0 |2660   909
 46   4  50   0   0   0|  46M    0 |  55k 1754B|   0     0 |2540   622
 45   4  50   1   0   0|  45M   12k| 120B  314B|   0     0 |2494   708
 43   3  50   3   0   0|  42M    0 |  77k 1584B|   0     0 |2613   958
 41   4  47   8   0   0|  50M  268k| 449B  356B|   0     0 |2333   704
 46   3  50   1   0   0|  42M   36k|  26k 1122B|   0     0 |2583   771
 45   4  50   1   0   0|  40M    0 |  30k  726B|   0     0 |2499   681
It looks like everything is under-utilized. For example, I'm getting a
measly 40-50MB/s of read performance from my array of four drives, and
*nothing* is going out over the network. My physical drive and network
lights echo this: they are *not* busy. My interrupts are certainly
manageable and context switches are very low. Even my CPU numbers look
tremendous: nearly no time in wait, and about 50% CPU idle!
Ah, but there's a problem with that. This is a dual-core system. Any time
you see a dual-core system stuck at 50% CPU utilization, you can bet big
that a single process is using 100% of a single core while the other core
sits idle. That's exactly what's happening here.
Notice what top shows us:
top - 13:21:27 up 49 min,  1 user,  load average: 2.07, 1.85, 1.67
Tasks: 167 total,   2 running, 165 sleeping,   0 stopped,   0 zombie
Cpu(s): 43.7%us,  3.6%sy,  0.0%ni, 50.5%id,  2.1%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   3924444k total,  3774644k used,   149800k free,     9640k buffers
Swap:        0k total,        0k used,        0k free,  3239600k cached

  PID USER      PR  NI  VIRT  RES   SHR S %CPU %MEM    TIME+  COMMAND
 1731 backuppc  20   0  357m 209m  1192 R 95.1  5.5  35:58.08 BackupPC_dump
 1679 backuppc  20   0  360m 211m  1596 D 92.1  5.5  32:54.18 BackupPC_dump
My load average is 2, and you can see those two processes: two instances
of BackupPC_dump. *Each* of them is using 100% of the CPU given to it, but
they're both using the *same* CPU (core), which is why I have 50% idle!
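A minimal sketch for checking that claim directly, assuming a Linux box
with a procps-style ps: the PSR column reports the core a process last ran
on, so two PIDs showing the same PSR value are sharing a core.

```shell
# PSR = the core each process last ran on. If both BackupPC_dump PIDs
# show the same PSR, they really are fighting over one core.
# ("|| true" because nothing matches on a box that isn't mid-backup.)
ps -o pid,psr,pcpu,comm -C BackupPC_dump || true

# Demo that works anywhere: which core is the current shell on?
ps -o pid,psr,comm -p $$
```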
Mark Coetser, can you see what top shows for the CPU utilization on your
system while doing a backup? Don't just look at the single "idle" or
"user" numbers: look at each BackupPC process as well, and let us know
what they are--and how many physical (and hyper-threaded) cores you have.
Additional info can be found in /proc/cpuinfo if you don't know the
answers.
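For anyone unsure how to read /proc/cpuinfo, a quick sketch (Linux only):

```shell
# Logical CPUs (includes hyper-threaded siblings):
logical=$(grep -c '^processor' /proc/cpuinfo)
# Physical cores per package, where the kernel exposes it:
physical=$(awk -F: '/^cpu cores/ { gsub(/ /, "", $2); print $2; exit }' /proc/cpuinfo)
echo "logical CPUs: ${logical}, cores per package: ${physical:-unknown}"
```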
To everyone: is there a way to get Perl to allow each of these items to
run on *different* processors (cores)? From my quick Google search it
seems that the processes must be forked using Perl modules designed for
this purpose. At the moment, this is beyond my capability. Am I missing
an easier way to do this?
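One possible workaround that avoids touching the Perl at all (an
assumption on my part, not something I've verified against BackupPC): use
taskset from util-linux to pin the already-running dump processes to
different cores. pin_to_core is a made-up helper name.

```shell
# Workaround sketch, assuming util-linux taskset is installed: pin each
# BackupPC_dump process to its own core so they stop competing for one.
pin_to_core() {
    core="$1"; pid="$2"
    taskset -cp "$core" "$pid" || echo "could not pin PID $pid"
}

# The PIDs below are the two BackupPC_dump processes from the top output;
# substitute your own:
#   pin_to_core 0 1731
#   pin_to_core 1 1679
pin_to_core 0 $$   # harmless demo: pin the current shell to core 0
```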
And one more request: for those of you out there using rsync, can you give
me some examples where you are getting faster numbers? Let's say, full
backups of 100GB hosts in roughly 30-35 minutes, or 500GB hosts in two or
three hours? That's about four times faster than what I'm seeing, and
would work out to 50-60MB/s, which seems like a much more realistic speed.
If you are seeing such speeds, can you give us an idea of your hardware
configuration, as well as the CPU utilization you're seeing during the
backups? Also, are you using compression or checksum caching? If you need
help collecting this info, I'd be happy to help.
To cover a couple of other frequently suggested items, here's what I've
examined to improve this:

Yes, I have noatime. From fstab:

UUID=<snipped> /data ext4 defaults,noatime 1 2
Noatime only makes a difference when you are I/O bound--which
ideally a BackupPC server would be. In my case, it made very little
difference. I'm not I/O bound.
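If anyone wants to double-check their own mount, here is a small sketch
(assumes util-linux findmnt; check_noatime is a made-up helper name and
/data is just my mount point):

```shell
# Verify that the pool filesystem really mounted with noatime.
check_noatime() {
    opts=$(findmnt -no OPTIONS "$1" 2>/dev/null)
    case "$opts" in
        *noatime*) echo "noatime active on $1" ;;
        *)         echo "noatime NOT active on $1 (options: ${opts:-not mounted})" ;;
    esac
}
check_noatime /data   # "/data" is the mount point from the fstab line above
```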
I am using EXT4. I have gotten very similar performance with EXT3. I have
not tried XFS or JFS, but would *really* prefer to keep my backups on the
extremely well-known and well-supported EXT series.
I am using compression on this BackupPC server. Obviously, this may
contribute to the CPU consumption. My old servers did not have
compression, but had terrible VIA C3 single-core processors, and their
backup performance was quite similar. I figured with the Atom D510 I'd be
OK with compression. But maybe not. I'll see if I can do some testing with
some smaller hosts without compression and see what happens.
As for checksum caching: as I mentioned, I think the protection you get
from leaving it off is very valuable. But I look forward to seeing the
performance others are getting, to find out what this protection is
costing me.
Thank you very much for your help!
Timothy J. Massey
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/