Subject: [BackupPC-users] tar vs. rsync benchmark and limitation test results
From: John Goerzen <jgoerzen AT complete DOT org>
To: backuppc-users AT lists.sourceforge DOT net
Date: Thu, 24 Feb 2011 21:34:54 -0600
As you may have seen, I have recently observed that rsync performance 
in BackupPC is surprisingly bad in two cases: one involving very large 
files, and the other involving incrementals of level > 1.

I have been avoiding tar for general-purpose use due to the warning at 
[1], which states that files extracted from archives won't be backed 
up, since they will have timestamps in the past.  From reading the GNU 
tar manual, however, it appears that --newer, which is used by BackupPC, 
inspects mtime and ctime and shouldn't have that problem.  (It may still 
have the problem of not noticing deletions or renames until the next 
full backup, but this isn't a data loss scenario, and is fairly common 
among backup systems so I am used to dealing with it.)
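To check my reading of the manual, here is a minimal shell sketch (my own example, not part of the benchmark) showing the difference between GNU tar's --newer, which compares both mtime and ctime, and --newer-mtime, which compares mtime only.  A freshly extracted file has an old mtime but a fresh ctime:

```shell
set -e
work=$(mktemp -d)
cd "$work"
mkdir src
# Old mtime, but the ctime is "now" (setting the timestamp updates ctime).
touch -d '2008-01-01' src/extracted
cutoff=$(date -d yesterday +%Y-%m-%d)
# mtime-only comparison skips the file...
tar -cf by-mtime.tar --newer-mtime="$cutoff" src
# ...but --newer (mtime OR ctime) picks it up:
tar -cf by-ctime.tar --newer="$cutoff" src
tar -tf by-ctime.tar | grep extracted
```

If your GNU tar behaves like mine, by-ctime.tar contains src/extracted while by-mtime.tar does not, which is exactly why BackupPC's use of --newer should catch extracted archives.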

I, therefore, did some benchmarking of BackupPC data with rsync and tar, 
and also did some testing to validate whether the limitations listed 
were valid.  I'll summarize what I found here.

TEST SETUP
----------

The tests were run with compression level 3 on a Core 2 Duo 6420 running 
64-bit Debian.  I backed up /usr and /etc/backuppc.  /usr contained 
21,356 directories and 205,342 files representing 7.1GB of data.  The 
disk was a generic SATA workstation disk with ext4.  The benchmarks were 
entirely self-contained within the machine to eliminate any impact of 
ssh encryption or network traffic.  No disk-based encryption was used. 
Read and write caches were flushed between each run.
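For reference, flushing the caches between runs was done with the usual Linux mechanism; a sketch (the exact commands are not recorded in my notes):

```shell
# Flush dirty pages to disk first, then drop the read caches.
sync
if [ -w /proc/sys/vm/drop_caches ]; then
    # Drop page cache, dentries and inodes (requires root).
    echo 3 > /proc/sys/vm/drop_caches
else
    echo "need root to drop caches" >&2
fi
```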

No content changed on /usr between these tests.  A config.pl file or two 
changed in /etc/backuppc but that was it.  If you see increasing times 
for incremental backups, it's due to BackupPC, not to changing source 
data.  I observed that first full backups can take different amounts of 
time than subsequent ones, so I made a point of running multiple 
successive backups.

/var/lib/backuppc was completely wiped between the rsync and the tar tests.

RSYNC RESULTS
-------------

Initial full backup: 20.2 minutes
Next full backup   : 24.0 minutes
Incremental level 1: 2.9 minutes
...  (ran several level 1s to test)
Incremental level 1: 4.5 minutes
Incremental level 1: 4.8 minutes

At this point, I enabled rsync checksum caching and ran some more backups.
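(For reference, checksum caching is enabled in config.pl along these lines -- a sketch of the BackupPC 3.x settings; consult the documentation for your version:)

```perl
# Add a fixed checksum seed so File::RsyncP caches block and file
# digests in the pool files (do the same for $Conf{RsyncRestoreArgs}).
$Conf{RsyncArgs} = [
    # ... the existing arguments ...
    '--checksum-seed=32761',
];

# Fraction of cached checksums re-verified against file contents on
# each full, as a guard against silent pool corruption.
$Conf{RsyncCsumCacheVerifyProb} = 0.01;
```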

Full backup        : 32.5 minutes
Full backup        : 22.7 minutes
Full backup        : 22.6 minutes
Incremental level 1: 4.5 minutes
...
Incremental level 4: 6.6 minutes
Incremental level 5: 9.4 minutes

TAR RESULTS
-----------

Initial full backup: 16.6 minutes
Incremental level 1: 2.2 minutes
...
Incremental level 4: 2.4 minutes
Incremental level 5: 2.2 minutes
Full backup        : 25.9 minutes

TAR LIMITATION TESTING
----------------------

After performing the benchmarks, I created a directory /usr/local/test. 
In that directory, I created root and jgoerzen directories.  Into each 
of those, I untar'd and unzipped example archive files containing files 
added in 2008 or before.  I did this unpacking once as root and again as 
my usual user account.  I then ran an incremental backup with tar.

According to the limitations page, the unpacked files should not have 
been backed up.  In fact, they were properly detected and backed up, 
which is good.

Next, I used mv to rename a file to a different name in the same 
directory and then ran another incremental.

In this case, BackupPC noticed the file under its new name and backed it 
up.  It did not notice that the old file had gone away, which is 
expected behavior for tar incrementals.  ls -lc confirmed that mv 
changed the ctime on the file.
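A quick way to reproduce that observation outside of BackupPC (my own sketch): on Linux, rename() updates the renamed file's ctime, which is why --newer sees the new name as changed while the old name's disappearance goes unnoticed until the next full.

```shell
set -e
work=$(mktemp -d)
touch -d '2008-01-01' "$work/oldname"
ctime_before=$(stat -c %Z "$work/oldname")   # ctime, seconds since epoch
sleep 2                                       # make sure the clock advances
mv "$work/oldname" "$work/newname"
ctime_after=$(stat -c %Z "$work/newname")
echo "ctime before: $ctime_before  after: $ctime_after"
```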

ANALYSIS
--------

The problem that prompted this was incrementals taking very long on slow 
disks with rsync.  My data here shows that a level 5 incremental takes 
more than twice as long as a level 1 with rsync.  Although the 
difference here was measured in minutes, if the level 1 is measured in 
hours, then the difference is also measured in hours.

Somewhat surprising was that rsync checksum caching provided only a 
marginal benefit (a reduction from 24.0 to 22.6 minutes for a full 
backup).  It is possible that the data set in question here (vast 
numbers of small files) is not well suited to showing the benefit of 
checksum caching.

The initial full backup with tar was 18% faster than with rsync.  After 
checksum caching was enabled for rsync, however, subsequent tar fulls 
were 14% slower than rsync fulls -- the only big surprise in this for me.
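(The percentages follow from the timing tables above; I quoted them loosely rounded.  The arithmetic:)

```shell
# Ratios computed from the RSYNC RESULTS and TAR RESULTS sections
# above (times in minutes).
awk 'BEGIN {
    printf "initial full:    tar %.1f%% faster than rsync\n",        (20.2 - 16.6) / 20.2 * 100
    printf "subsequent full: tar %.1f%% slower than cached rsync\n", (25.9 - 22.6) / 22.6 * 100
}'
```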

More to the point, incrementals showed little variation between runs 
with tar, while with rsync they grew steadily longer with each 
subsequent run.

The files in this test case did not demonstrate the other pathological 
problem with BackupPC's rsync algorithm, that of taking 10+ hours to 
back up a changed 25GB file.  Had such a file been involved, the tar 
backup would certainly have been many orders of magnitude faster than rsync.

RECOMMENDATIONS
---------------

For backups across a LAN, it looks like:

  * tar permits a lower overall execution time, since there is no 
performance penalty for a deep incremental chain such as levels [1, 2, 
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13].  With rsync, this becomes 
unbearably slow, and more frequent full backups will be required.

  * A downside of using tar is that deletions will not be detected 
until the next full backup run.

  * rsync fulls with checksum caching enabled may sometimes be faster 
than tar fulls, but rsync fulls and incrementals will still likely be 
very much slower if very large files with changes are involved.

  * The rsync backend for BackupPC is probably not useful unless 
Internet backups or small backup sets to fast disks are involved.

  * The "limitations" of the tar backend have been exaggerated, at least 
for backing up Linux systems using POSIX-compliant filesystems with 
GNU tar.  (vfat under Linux may still exhibit the documented 
limitations, for instance.)

One other point to make is that a long-standing bug in the CGI does not 
permit one to restore from a host backed up with tar to one backed up 
with rsync, which I did observe in testing. [2]

What do you all think?  Does this all make sense?

Does it point to any issues in BackupPC that are easily fixable?

-- John

[1] 
http://backuppc.sourceforge.net/faq/limitations.html#incremental_backups_might_not_be_accurate

[2] http://www.adsm.org/lists/html/BackupPC-users/2010-06/msg00070.html
