BackupPC-users

Re: [BackupPC-users] Rsynv vs. tar, full vs. incremental

2011-08-26 06:49:14
Subject: Re: [BackupPC-users] Rsynv vs. tar, full vs. incremental
From: Pavel Hofman <pavel.hofman AT ivitera DOT com>
To: Holger Parplies <wbppc AT parplies DOT de>
Date: Fri, 26 Aug 2011 12:46:37 +0200
On 31.5.2011 21:57, Holger Parplies wrote:
> Hi,
> 
> Pavel Hofman wrote on 2011-05-31 15:24:56 +0200 [[BackupPC-users] Rsynv vs. 
> tar, full vs. incremental]:
>> Incremental backup of a linux machine using tar (i.e. only files newer
>> than...) is several times faster than using rsync.
> 
> that could be because it is missing files that rsync catches. Or perhaps I
> should rather say: yes, tar is probably more efficient, but it is less exact
> than rsync, because it only has one single timestamp to go by, whereas rsync
> has a full file list with attributes for all files. One very real consequence
> is that tar *cannot* detect deleted files in incremental backups while rsync
> will.
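
To make the point above concrete, here is a minimal Python sketch. It is not
how tar, rsync or BackupPC are implemented; the function names (snapshot,
newer_than, diff_against_list, demo) are made up for illustration. It only
contrasts "select files newer than one timestamp" with "compare against the
previous file list", and shows that only the second can report a deletion.

```python
import os
import tempfile


def snapshot(root):
    """Return {relative_path: (size, mtime)} for every regular file under root."""
    listing = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            listing[os.path.relpath(full, root)] = (st.st_size, st.st_mtime)
    return listing


def newer_than(root, reference_time):
    """Timestamp-only selection: files modified after reference_time."""
    current = snapshot(root)
    return sorted(p for p, (_size, mtime) in current.items() if mtime > reference_time)


def diff_against_list(root, previous):
    """File-list comparison: return (changed_or_new, deleted) relative to previous."""
    current = snapshot(root)
    changed = sorted(p for p, attrs in current.items() if previous.get(p) != attrs)
    deleted = sorted(p for p in previous if p not in current)
    return changed, deleted


def demo():
    """Modify one file and delete another, then run both selection strategies."""
    with tempfile.TemporaryDirectory() as root:
        for name in ("a.txt", "b.txt"):
            path = os.path.join(root, name)
            with open(path, "w") as handle:
                handle.write("old")
            os.utime(path, (1000, 1000))  # pretend both files are old
        previous = snapshot(root)         # file list kept from the last backup
        last_backup = 2000                # the single timestamp kept instead

        changed_file = os.path.join(root, "a.txt")
        with open(changed_file, "w") as handle:
            handle.write("new contents")
        os.utime(changed_file, (3000, 3000))
        os.remove(os.path.join(root, "b.txt"))

        by_timestamp = newer_than(root, last_backup)
        changed, deleted = diff_against_list(root, previous)
    return by_timestamp, changed, deleted
```

Calling demo() shows that both strategies pick up the modified a.txt, but
only the list comparison has anything to say about b.txt, because a deleted
file has no timestamp left to examine.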
> 
> My understanding is that the concept of incremental backups, way back in times
> where we did backups to tapes, was introduced simply to make daily backups
> feasible at all. Something along the lines of "it's not great, but it's the
> best we can do, and it's good enough to be worthwhile".
> 
> Nowadays, "incremental" backups still have their benefits, but we really need
> to shake the habit of making compromises for no better reason than that we
> haven't yet realized that there is an alternative.
> 
> If you determine that incremental tar backups are good enough for you (e.g.
> because the cases it doesn't catch don't happen in your backup set), or that
> your server load forces you to make a compromise, then that's fine. But if
> it's only "tar is faster than rsync and faster is better", then you should
> ask yourself why you are doing backups at all ("no backups" is an even faster
> option).
> 
>> On the other hand, a full backup using tar transfers a huge amount of data
>> over the network, far more than the efficient rsync.
> 
> There are also other factors to consider like CPU usage. Where exactly is your
> bottleneck?
> 
>> Is there a way to use rsync for full backup and tar for the incremental
>> runs?
> 
> No. Actually, *the other way around*, it would make sense: full backups with
> tar (probably faster than rsync over a fast local network - depending on your
> backup set) and incremental backups with rsync (almost as exact as a full
> backup).
> 
>> I do not even know whether the two transfer modes produce
>> mutually compatible data in the pool.
> 
> No. There is (or was?) a slight difference in the attribute files, leading to
> retransmission of all files on the first rsync run after a tar run (because
> RsyncP "thinks" the file type has changed from <something> to plain file).
> The rest is, of course, compatible. It would be a shame if pooling didn't
> work between tar and rsync backups, wouldn't it? :)

Hi Holger,

Sorry for the few months' delay in replying :) I have been fighting this
issue and still do not see a solution.

I guess the main problem is that tar cannot resume after a network glitch,
while rsync takes too much time and RAM on our servers, which hold a few
million files each (maildirs, development trees, etc.).

Perhaps if the transfer were not so sensitive to network interruptions,
tar would be just fine. Our cable internet is VERY fast (100 Mbps down
with no fair usage policy (FUP) limit), but there are short interruptions
at night (mostly up to a minute). These often break full tar backups
before they can finish, rendering them useless. Our backups easily take
tens of hours.

Do you have any experience with tuning the network layer, or any other
suggestion? In theory, a VPN could help (in fact, OpenVPN is already
active), but it would require running tar over netcat, without an
additional layer of SSH. Otherwise the overhead of SSH on top of the
VPN's own encryption would make the process useless again.

Thanks a lot for any suggestions.

Pavel.
