BackupPC-users

Re: [BackupPC-users] Slow transfer via rsync?

2015-09-28 16:05:20
Subject: Re: [BackupPC-users] Slow transfer via rsync?
From: Christian Völker <chrischan AT knebb DOT de>
To: backuppc-users AT lists.sourceforge DOT net
Date: Mon, 28 Sep 2015 22:02:26 +0200
Hi Holger& all,

>> [...]
>> Why is it still so slow?
> it's been mentioned before, though not explicitly. What is the RTT over the
> VPN (i.e. between BackupPC server and client host)?

So here are several outputs:
============================================================================
[root@bu ~]# cat /etc/redhat-release
CentOS Linux release 7.1.1503 (Core)
[root@bu ~]# uname -r
3.10.0-229.14.1.el7.x86_64
[root@bu ~]# rsync --version
rsync  version 3.0.9  protocol version 30
============================================================================
[root@bu ~]# ps ax |grep infra2
21811 ?        S      1:41 /usr/bin/perl
/usr/share/BackupPC/bin/BackupPC_dump infra2.domain.de
21814 ?        S      0:01 /usr/bin/ssh -l backup infra2.domain.de
/usr/bin/sudo /usr/bin/rsync --server --sender --numeric-ids --perms
--owner --group -D --links --hard-links --times --block-size=2048
--recursive -x --checksum-seed=32761 --ignore-times . /srv/
21835 ?        S      0:16 /usr/bin/perl
/usr/share/BackupPC/bin/BackupPC_dump infra2.domain.de
============================================================================
traceroute to 192.168.1.3 (192.168.1.3), 30 hops max, 60 byte packets
 1  10.12.0.1 (10.12.0.1)  344.955 ms  355.805 ms  355.818 ms
 2  infra2 (192.168.1.3)  359.722 ms  359.737 ms  364.983 ms
============================================================================
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
64 bytes from 192.168.1.3: icmp_seq=1 ttl=63 time=256 ms
64 bytes from 192.168.1.3: icmp_seq=2 ttl=63 time=345 ms
64 bytes from 192.168.1.3: icmp_seq=3 ttl=63 time=374 ms
64 bytes from 192.168.1.3: icmp_seq=4 ttl=63 time=424 ms
============================================================================
top on the server while a backup to this client is running
%Cpu(s): 1,8 us,  0,6 sy,  0,0 ni, 95,6 id,  0,9 wa,  0,0 hi,  0,1 si, 
0,0 st
KiB Mem : 24515984 total,   189184 free,  1219728 used, 23107072 buff/cache
KiB Swap: 12582908 total, 12257544 free,   325364 used. 23017216 avail Mem
============================================================================
tcpdump -i tun0 host 192.168.1.3
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tun0, link-type RAW (Raw IP), capture size 65535 bytes
21:25:46.084584 IP infra2.ssh > bu.54020: Flags [.], seq
3849499977:3849501317, ack 3346469010, win 3755, options [nop,nop,TS val
844038020 ecr 275484056], length 1340
21:25:46.084604 IP bu.54020 > infra2.ssh: Flags [.], ack 4294965956, win
450, options [nop,nop,TS val 275484429 ecr 844037551,nop,nop,sack 1
{0:1340}], length 0
21:25:46.433176 IP infra2.ssh > bu.54020: Flags [.], seq 4294965956:0,
ack 1, win 3755, options [nop,nop,TS val 844038396 ecr 275484429],
length 1340
21:25:46.433206 IP bu.54020 > infra2.ssh: Flags [.], ack 1340, win 441,
options [nop,nop,TS val 275484778 ecr 844038396], length 0
21:25:46.735195 IP infra2.ssh > bu.54020: Flags [.], seq 1340:2680, ack
1, win 3755, options [nop,nop,TS val 844038742 ecr 275484778], length 1340
21:25:46.735216 IP bu.54020 > infra2.ssh: Flags [.], ack 2680, win 450,
options [nop,nop,TS val 275485080 ecr 844038742], length 0
21:25:47.025229 IP infra2.ssh > bu.54020: Flags [.], seq 2680:4020, ack
1, win 3755, options [nop,nop,TS val 844039046 ecr 275485080], length 1340
21:25:47.025249 IP bu.54020 > infra2.ssh: Flags [.], ack 4020, win 450,
options [nop,nop,TS val 275485370 ecr 844039046], length 0
21:25:47.051452 IP infra2.ssh > bu.54020: Flags [.], seq 4020:5360, ack
1, win 3755, options [nop,nop,TS val 844039046 ecr 275485080], length 1340
21:25:47.051471 IP bu.54020 > infra2.ssh: Flags [.], ack 5360, win 450,
options [nop,nop,TS val 275485396 ecr 844039046], length 0
21:25:47.361984 IP infra2.ssh > bu.54020: Flags [.], seq 5360:6700, ack
1, win 3755, options [nop,nop,TS val 844039337 ecr 275485370], length 1340
21:25:47.362017 IP bu.54020 > infra2.ssh: Flags [.], ack 6700, win 450,
options [nop,nop,TS val 275485706 ecr 844039337], length 0
21:25:47.390806 IP infra2.ssh > bu.54020: Flags [.], seq 6700:8040, ack
1, win 3755, options [nop,nop,TS val 844039337 ecr 275485370], length 1340
21:25:47.390826 IP bu.54020 > infra2.ssh: Flags [.], ack 8040, win 450,
options [nop,nop,TS val 275485735 ecr 844039337], length 0
21:25:47.420871 IP infra2.ssh > bu.54020: Flags [.], seq 8040:9380, ack
1, win 3755, options [nop,nop,TS val 844039361 ecr 275485396], length 1340
21:25:47.420890 IP bu.54020 > infra2.ssh: Flags [.], ack 9380, win 450,
options [nop,nop,TS val 275485765 ecr 844039361], length 0
^C
16 packets captured
16 packets received by filter
0 packets dropped by kernel
============================================================================
I am monitoring the server through SNMP and I can see the tun0 interface
is used by 120kb/s inbound and 4k/s outbound. Outbound spiked to 130kb/s
for the first 15minutes of the backup to this client. Besides of the
backup there is nothing else going on on the connection as you can see
on the tcpdump output.


Now the client side:
============================================================================
[root@infra2 ]# cat /etc/redhat-release
CentOS release 5.11 (Final)
[root@infra2 ]# uname -r
2.6.18-400.1.1.el5
[root@infra2 ~]# rsync --version
rsync  version 3.0.6  protocol version 30
============================================================================
ps ax |grep rsync
 8909 ?        Ss     1:12 /usr/bin/rsync --server --sender
--numeric-ids --perms --owner --group -D --links --hard-links --times
--block-size=2048 --one-file-system --recursive --checksum-seed=32761
--ignore-times . /srv/
15059 ?        Ss     0:00 /usr/bin/sudo /usr/bin/rsync --server
--sender --numeric-ids --perms --owner --group -D --links --hard-links
--times --block-size=2048 --recursive -x --checksum-seed=32761
--ignore-times . /srv/
15083 ?        S      0:47 /usr/bin/rsync --server --sender
--numeric-ids --perms --owner --group -D --links --hard-links --times
--block-size=2048 --recursive -x --checksum-seed=32761 --ignore-times .
/srv/
============================================================================
[root@infra2 ~]# traceroute 10.128.0.6
traceroute to 10.12.0.6 (10.12.0.6), 30 hops max, 40 byte packets
 1  router2.domain.de (192.168.1.254)  0.143 ms  0.174 ms  0.159 ms
 2  10.12.0.6 (10.12.0.6)  346.234 ms  346.222 ms  356.531 ms
============================================================================
Cpu(s):  0.3%us,  0.0%sy,  0.0%ni, 99.0%id,  0.7%wa,  0.0%hi,  0.0%si, 
0.0%st
Mem:   2075016k total,  2022548k used,    52468k free,    55160k buffers
Swap:   524280k total,       96k used,   524184k free,  1848188k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15057 backup    15   0 20300 9976 1032 S  0.3  0.5   0:01.85 sshd
15083 root      19   0  8476 4000  672 S  0.3  0.2   0:47.95 rsync
15448 root      15   0  2476 1060  816 R  0.3  0.1   0:00.01 top
    1 root      15   0  2172  616  536 S  0.0  0.0   0:03.52 init
============================================================================

> How many files are we talking about?
Well, there are some small files, but nothing too small.  ls -lR| wc
gives 47,453 files count for the 27GB of data.

>  Is the firewall blocking (any) ICMP traffic between the two endpoints? 
No. As you can see in the above commands it appears everything if fine.

> A share "/srv" suggests this probably isn't
> Windoze ;-). Is there anything special about this client host (kernel version,
> distribution release, ...)? 
Server is a freshly set up CentOS 7/64 while the client is a Centos 5.11
(see above).

> Can you use tcpdump to look at the characteristics
> of the connection (e.g. is there a constant (slow) exchange, or are there
> hangs, timeouts, retransmissions)?
See above- there is an exchange but I am unsure how to read the output
of tcpdump in detail.
> Is your OpenVPN connection over TCP or UDP? Tunnelling TCP over TCP might give
> such problems, depending on the characteristics of the underlying network
> (e.g. significant packet loss/congestion).
>
UDP for OpenVPN:
============================================================================
[root@bu ~]# cat /etc/openvpn/router2-udp-1194-bu-config.conf
dev tun
persist-tun
persist-key
cipher AES-128-CBC
auth SHA1
tls-client
client
resolv-retry infinite
remote ip.v4.add.ress 1194 udp
lport 0
verify-x509-name "client.domain.de" name
pkcs12 router2-udp-1194-bu.p12
tls-auth router2-udp-1194-bu-tls.key 1
ns-cert-type server
#comp-lzo adaptive
============================================================================

> All of that said, the *first* backup of a host is usually special in that all
> data not already in the pool (which might in this case mean all data) needs to
> be compressed. 
Prior to be compressed it should have been transferred. Does BackuPC
compress "on-the-fly" or after all data is transferred?  However- CPU
cycles are close to zero (see above).

> The *second* backup may add rsync checksums. All of this might
> show up as CPU usage, I/O waits, etc., but it might also just add some more
> latency to the network latency. 
See above. No such usage.

> If your first backup has been running for 24
> hours, that is merely twice the time a quick best case estimate gives. I'd be
> annoyed, but I'd give it some more time before starting to worry.
I wouldn't worry if I at least one of the parameters (IO, CPU, memory,
network) would show some usage. But none of them really does.

>  Granted, the
> failing backups on the old host *are* something to worry about. ClientTimeout?
On the old BackupPC server: $Conf{ClientTimeout} = 720000;
New set up server:
[root@bu ~]#  cd /etc/BackupPC/
[root@bu BackupPC]#  grep -i clienttimeout config.pl
$Conf{ClientTimeout} = 72000;
        ClientTimeout             => 1,

> Also, the network usage you are seeing is way too low and seems to indicate
> that you will reach the point when worrying is warranted ;-).
That is what I meant: All is idling, there are no peaks. Why is it not
using at least on ressource?
Backup still hasn't finished yet....

Any ideas?

Christian

------------------------------------------------------------------------------
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/