Subject: Re: [BackupPC-users] Why is my rsync so much slower to do an incremental backup than tar over ssh?
From: Holger Parplies <wbppc AT parplies DOT de>
To: Carl Wilhelm Soderstrom <chrome AT real-time DOT com>
Date: Wed, 15 Jul 2009 16:21:19 +0200

Hi,

Carl Wilhelm Soderstrom wrote on 2009-07-15 08:10:19 -0500 [Re: 
[BackupPC-users] Why is my rsync so much slower to do an incremental backup 
than tar over ssh?]:
> On 07/15 08:13 , gimili wrote:
> > Why is my rsync so much slower to do an incremental backup than tar?
> 
> Because rsync makes checksums of all the files, instead of just checking the
> timestamp.

that is actually not true for incremental backups, unless you have changed
RsyncArgs to include --ignore-times (which you should *not*).
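
For reference, here is roughly what the stock setting looks like in
config.pl (the path and the exact defaults vary between BackupPC versions
and distributions; this is an approximation for 3.x), just so it's clear
what to check for:

    # /etc/backuppc/config.pl -- approximate 3.x defaults
    $Conf{RsyncArgs} = [
        '--numeric-ids', '--perms', '--owner', '--group', '-D',
        '--links', '--hard-links', '--times', '--block-size=2048',
        '--recursive',
        # '--ignore-times',  # leave this OUT; it forces a full
        #                    # checksum comparison of every file
    ];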

> > Have I made an error?
> 
> No, tar really is faster than rsync in some cases.

While that is true, incremental backups should not be taking 5 hours
(compared to a 7-hour full backup). Something is going wrong, I just
couldn't guess what, and I'm not even sure what additional information to
ask for, aside from the usual: relevant config file settings (everything
related to rsync or tar, including BackupFilesOnly/BackupFilesExclude),
details on your setup that you left out (network, type of file system on
both sides, number of files in the backup set ...). One other thought: what
does your server status page say on the matter of hash chains and their
length?

> > Are there any serious problems with using tar?
> 
> - tar does not catch files which have changed, but have timestamps which say
> they are not changed. If the timestamp gets set to some point prior to the
> last reference backup, the file won't be backed up.

This most notably includes files that have been moved or renamed,
especially if they were moved from a place that was not backed up to one
that is (because then they will not even be in the backup under their old
name). Another common example is unpacked zip file contents with old dates.

Also, incremental tar backups cannot detect deleted files (while
incremental rsync backups will), so these will continue to exist in your
backups until the next full backup, meaning the incrementals don't
accurately reflect the state of your file system. This may or may not be
important for you.

> If tar works better for you in your environment, due to network bandwidth,
> processor power available, memory availability, etc; that's why it's an
> option. :)

Thankfully, Jon Forrest recently reminded us that you can [probably] get the
best of both worlds by using the '-W' (or '--whole-file') option to rsync.
This makes rsync transfer the whole file (like tar) instead of attempting to
speed up the transfer with the (expensive) rsync algorithm. If network
bandwidth is *not* your limiting factor, you should be able to cut down CPU
usage (and possibly disk I/O, presuming rsync needs to read the files twice -
once for calculating checksums and once for retrieving content to transfer)
and still get the benefits of rsync's file list comparison. I'm just not
positive that File::RsyncP implements this option, but I'm going to test it as
soon as I find some time.
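
If you want to experiment with that, the change would look something like
this in config.pl or the per-host pc/hostname.pl (again: untested with
File::RsyncP, so verify against an XferLOG before relying on it):

    # keep the existing arguments and add '--whole-file';
    # only sensible when network bandwidth is not the bottleneck
    push @{$Conf{RsyncArgs}}, '--whole-file';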

> As a noteworthy data point, when making an initial copy of files (not using
> backuppc, just plain tar or rsync); tar is 2x-4x faster than rsync,
> presumably due to all of rsync's calculating overhead.

I find this somewhat surprising, because an initial copy has no remote files
to compare to. Also, I'm quite sure that this depends on your data set (file
sizes, file counts), computer configurations (extremely slow sender vs. fast
receiver?) and network (high bandwidth). But I suppose tar *can be* 2-4 times
faster than rsync *in some cases*.

> Rsync wins when making subsequent copies that don't require a large
> percentage of the data to be transferred.

That is really the point of using rsync (note: this, again, depends on
other factors, probably mainly network speed). One thing to note in the
context of BackupPC: if you switch XferMethod from tar to rsync, you won't
get any benefit until *after the first full rsync backup*, because, due to
a difference in attrib file formats, the rsync method *won't* match
unchanged files against the reference tar backup. So you can't do an
initial full tar backup for the speed advantage (if there really is one)
and then switch to rsync.
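
For completeness, the switch itself is a single setting in the per-host
config ('hostname' being a placeholder); just remember that the first
rsync backup after it will cost you roughly a full transfer:

    # pc/hostname.pl -- per-host override of the transfer method
    $Conf{XferMethod} = 'rsync';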

Another BackupPC-specific note is that rsync Xfer will save interrupted full
backups as partials and restart them later (saving you the network transfers
that have already been done), while tar will always need to rerun the full
backup from the start, because it has no notion of resuming an interrupted
transfer.

So, to sum it up, there are numerous advantages to rsync as XferMethod,
especially if network bandwidth is a precious resource (compared to CPU
cycles and disk I/O), and rsync should normally be nowhere near as slow as
you are experiencing. If tar incrementals take 10 minutes, rsync
incrementals should probably be somewhere in the range of 10-30 minutes,
with a typical value below 15 minutes (I'm just guessing, but that's what
I'd expect). It's just a matter of figuring out why your rsync
incrementals are taking so long.

Your relevant XferLOG files might give some clues. Unchanged files should
not be mentioned at all for incremental backups and should be listed as
"same" for full (rsync!) backups. Directories are always listed. The sizes
of the XferLOG files should clearly indicate which backups were full and
which were incremental (meaning your XferLogLevel should be at 1 ;-). Can
you find anything obvious in there? (*)

Regards,
Holger

(*) sudo -v
    sudo /usr/share/backuppc/bin/BackupPC_zcat \
        /var/lib/backuppc/pc/hostname/XferLOG.X.z | less
    (replace the paths to match your installation).
