As you may have seen, I have recently observed that rsync performance
in BackupPC is surprisingly bad in two cases: one involving very large
files, and the other involving incrementals of level > 1.
I have been avoiding tar for general-purpose use due to the warning at
[1], which comments that files extracted from archives won't be backed
up since they will have timestamps in the past. From reading the GNU
tar manual, however, it appears that --newer, which is used by BackupPC,
inspects mtime and ctime and shouldn't have that problem. (It may still
have the problem of not noticing deletions or renames until the next
full backup, but this isn't a data loss scenario, and is fairly common
among backup systems so I am used to dealing with it.)
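The ctime behavior of --newer is easy to check directly. Below is a
minimal sketch (not BackupPC's actual tar invocation) that backdates a
file's mtime, as if it had just been extracted from an old archive, and
shows that GNU tar's --newer still picks it up because the ctime is
current:

```shell
# Sketch: GNU tar's --newer (-N) compares both mtime and ctime.
workdir=$(mktemp -d)
cd "$workdir"
mkdir data
echo hello > data/restored.txt
# Backdate the mtime, as if the file were extracted from a 2008 archive;
# the ctime remains "now" because the inode was just created.
touch -m -d '2008-01-01' data/restored.txt
# The file is still included, since its ctime is newer than the cutoff:
tar -cf incr.tar --newer '1 hour ago' data
tar -tf incr.tar
```

With --newer-mtime, which compares mtime only, the backdated file would
be skipped -- which is presumably the scenario the limitations page is
warning about.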
I therefore did some benchmarking of BackupPC data with rsync and tar,
and also ran some tests to check whether the documented limitations
actually apply. I'll summarize what I found here.
TEST SETUP
----------
The tests were run with compression level 3 on a Core 2 Duo 6420 running
64-bit Debian. I backed up /usr and /etc/backuppc. /usr contained
21,356 directories and 205,342 files representing 7.1GB of data. The
disk was a generic SATA workstation disk with ext4. The benchmarks were
entirely self-contained within the machine to eliminate any impact of
ssh encryption or network traffic. No disk-based encryption was used.
Read and write caches were flushed between each run.
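For reference, a common way to get a cold cache between runs on Linux is
to drop the kernel's page cache; the sketch below wraps the technique in
a function (this is a standard approach, not necessarily the exact
commands used for these tests, and it requires root to invoke):

```shell
drop_caches() {
    # Requires root: flush dirty pages to disk, then drop the page
    # cache, dentries and inodes so the next run starts cold.
    sync
    echo 3 > /proc/sys/vm/drop_caches
}
```

Calling drop_caches between each benchmark run ensures no backup
benefits from data already resident in memory.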
No content changed on /usr between these tests. A config.pl file or two
changed in /etc/backuppc but that was it. If you see increasing times
for incremental backups, it's due to BackupPC, not to changing source
data. I observed that a first full backup can take a different amount
of time than subsequent ones, so I made a point of running multiple
successive backups.
/var/lib/backuppc was completely wiped between the rsync and the tar tests.
RSYNC RESULTS
-------------
Initial full backup: 20.2 minutes
Next full backup : 24.0 minutes
Incremental level 1: 2.9 minutes
... (ran several level 1s to test)
Incremental level 1: 4.5 minutes
Incremental level 1: 4.8 minutes
At this point, I enabled rsync checksum caching and ran some more backups.
Full backup : 32.5 minutes
Full backup : 22.7 minutes
Full backup : 22.6 minutes
Incremental level 1: 4.5 minutes
...
Incremental level 4: 6.6 minutes
Incremental level 5: 9.4 minutes
TAR RESULTS
-----------
Initial full backup: 16.6 minutes
Incremental level 1: 2.2 minutes
...
Incremental level 4: 2.4 minutes
Incremental level 5: 2.2 minutes
Full backup : 25.9 minutes
TAR LIMITATION TESTING
----------------------
After performing the benchmarks, I created a directory /usr/local/test.
In that directory, I created root and jgoerzen directories. Into each
of those, I untar'd and unzipped example archive files containing files
added in 2008 or before. I did this unpacking once as root and again as
my usual user account. I then ran an incremental backup with tar.
According to the limitations page, the unpacked files should not have
been backed up. In fact, they were properly detected and backed up,
which is good.
Next, I used mv to rename a file to a different name in the same
directory and then ran another incremental.
In this case, BackupPC noticed the file under its new name and backed
it up. It did not notice that the old name had gone away, which is more
or less as expected. ls -lc confirmed that mv changed the ctime on the
file.
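The ctime effect of the rename is easy to reproduce outside BackupPC.
This sketch uses GNU stat's %Z (ctime as epoch seconds) instead of
ls -lc so the timestamps can be compared directly; the file names are
made up for the example:

```shell
# Sketch: renaming a file updates its ctime, even though the contents
# and mtime are untouched. This is why tar's --newer still catches it.
dir=$(mktemp -d)
echo data > "$dir/oldname"
before=$(stat -c %Z "$dir/oldname")   # ctime before the rename
sleep 2                               # let the clock tick past the old ctime
mv "$dir/oldname" "$dir/newname"      # rename in place; data and mtime unchanged
after=$(stat -c %Z "$dir/newname")    # ctime after the rename
echo "ctime before=$before after=$after"
```

The new ctime is later than the old one, so the renamed file shows up
in the next incremental; nothing records that the old name is gone.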
ANALYSIS
--------
The problem that prompted this was incrementals taking very long on slow
disks with rsync. My data here shows that a level 5 incremental takes
more than twice as long as a level 1 with rsync. Although the
difference here was measured in minutes, if the level 1 is measured in
hours, then the difference is also measured in hours.
Somewhat surprising was that rsync checksum caching provided only a
marginal benefit (a reduction from 24.0 to 22.6 minutes for a full
backup). It is possible that the data set in question here (vast
numbers of small files) is not well suited to displaying the benefit of
checksum caching.
The initial full backup with tar was 18% faster than with rsync. On
the other hand, subsequent tar fulls (25.9 minutes) were 14% slower
than rsync fulls with checksum caching enabled (22.6 minutes) -- the
only big surprise in this to me.
More to the point, incrementals showed little variation between runs
with tar, while they grew steadily longer with rsync on each
subsequent run.
The files in this test case did not demonstrate the other pathological
problem with BackupPC's rsync algorithm, that of taking 10+ hours to
back up a changed 25GB file. Had such a file been involved, the tar
backup would certainly have been many orders of magnitude faster than rsync.
RECOMMENDATIONS
---------------
For backups across a LAN, it looks like:
* tar permits a lower overall execution time, since there is no
performance penalty for a long incremental sequence such as [1, 2, 3,
4, 5, 6, 7, 8, 9, 10, 11, 12, 13]. With rsync, such a sequence becomes
unbearably slow and more frequent full backups will be required.
* A downside of using tar is that deletions will not be detected
until the next full backup run.
* rsync fulls with checksum caching enabled may sometimes be faster
than tar fulls, but rsync fulls and incrementals will still likely be
very much slower if very large files with changes are involved.
* The rsync backend for BackupPC is probably not useful unless
Internet backups or small backup sets to fast disks are involved.
* The "limitations" of the tar backend have been exaggerated, at least
for backing up Linux systems using POSIX-compliant filesystems with
GNU tar. (vfat under Linux may still exhibit the documented
limitations, for instance.)
One other point to make is that a long-standing bug in the CGI prevents
restoring files from a host backed up with tar to one backed up with
rsync, which I observed in testing. [2]
What do you all think? Does this all make sense?
Does it point to any issues in BackupPC that are easily fixable?
-- John
[1]
http://backuppc.sourceforge.net/faq/limitations.html#incremental_backups_might_not_be_accurate
[2] http://www.adsm.org/lists/html/BackupPC-users/2010-06/msg00070.html
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/