Subject: Re: [BackupPC-users] Linux backups with rsync vs tar
From: Holger Parplies <wbppc AT parplies DOT de>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Sun, 4 Sep 2011 02:31:48 +0200

Hi,

Timothy J Massey wrote on 2011-09-02 10:43:37 -0400 [Re: [BackupPC-users] Linux 
backups with rsync vs tar]:
> charlesboyo <backuppc-forum AT backupcentral DOT com> wrote on 08/31/2011 
> 05:53:43 AM:
> [...]
> > Thus I have reason to suspect the rsync overhead as being guilty.

for the record, I've just (finally!) switched from tar to rsync for a data
server, and this significantly increased run times of backups. Incremental
backups are taking about three times as long as they used to (if memory
serves correctly). So, yes, rsync does have a significant overhead. That is
not surprising.

I should add that in my case the server is backing up local file systems (to
an iSCSI disk). The effect should be less significant if client and server are
not the same machine (though it is a quad-core and the disk sets are
independent).

> > Note that I have disabled hard links,

???
What is that supposed to mean? You removed the rsync "-H" option?
When you're talking about the pool, "disabling hard links" sounds rather
troubling ;-).

> > implemented checksum caching, 
> > increased the block size to 512 KB and enabled --whole-file, to no avail.

I don't think File::RsyncP supports changing block size (and probably not
--whole-file either).
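
(For what it's worth, checksum caching itself is configured through the rsync
argument lists in config.pl; roughly like this, if memory serves:)

    # config.pl (or per-host config), BackupPC 3.x style - from memory:
    $Conf{RsyncArgs} = [
        # ... the stock arguments ...
        '--checksum-seed=32761',            # enables checksum caching
    ];
    $Conf{RsyncRestoreArgs} = [
        # ... the stock arguments ...
        '--checksum-seed=32761',
    ];
    # Fraction of cached checksums that get re-verified against the pool:
    $Conf{RsyncCsumCacheVerifyProb} = 0.01;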

> > 1. since over 90% of the files change every day and "incremental" 
> > backups involve transferring the whole file to the BackupPC server, 
> > won't it make better sense to just run a full backup every day?

There is probably not much difference either way.

> [...]
> You may find that trading CPU performance for network performance may not 
> be a good trade in your case.

That is true, but there are other reasons for using rsync rather than tar.
One is exactness of backups: tar incrementals don't reflect deleted files, for
instance. Though, in *this* specific case, that may not make much difference
(presuming I'm correct in assuming that your mbox files are never deleted,
renamed, extracted from tar-/zip-files, etc., or at least that missing such a
change until the next full backup is unproblematic).

> The number one question I have is:  is this really a problem?  If you have 
> a backup window that allows this, I would not worry about it.  If you do 
> *not*, then rsync might not be for you.

That's exactly the point. In my case, it is *not* a problem, so I prefer more
accurate backups, even if the fulls take all of the night. Thank you for
reminding me to shift the full run to the weekend, which I'll do right now :-).
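
(For anyone wondering how to "shift" a full: BackupPC schedules the next full
FullPeriod days after the previous one, so the usual approach, as far as I
know, is to force one full on the desired day - e.g. via the CGI's "Start Full
Backup" button - and leave FullPeriod just under a week. A rough config.pl
sketch:)

    # config.pl sketch - my understanding of the usual approach:
    $Conf{FullPeriod} = 6.97;   # just under 7 days, so fulls stay anchored
                                # to the weekday of the last (forced) full
    $Conf{IncrPeriod} = 0.97;   # daily incrementals in between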

> 2) Les' point about the format of the files (one monolithic file for each 
> mailbox vs. one file per e-mail) is dead on.  That allows 99% of the files 
> to remain untouched once they're backed up *once*.  That will *vastly* 
> reduce the backup times.

... and pool storage requirements, and rsync will handle small files much
better. Sadly enough, there is still enough braindead software around that
doesn't support maildir format, even in the Unix world. Open Source probably
means that I should start hacking the $#@%volution sources ...

> > 2. from Pavel's questions, he observed that BackupPC is unable to 
> > recover from interrupted tar transfer. Such interruptions simply 
> > cannot happen in my case. Should I switch to tar?

Your situation is completely different from his - you're on a local network,
aren't you? You don't need the bandwidth reductions you gain from rsync - tar
should work fine for you.
*But* you should consider whether (incremental) tar backups will be
sufficiently accurate. Since you are transferring almost everything anyway,
you could even run only full tar backups.

For the sake of completeness, I should mention that full backups always
rebuild the entire tree (in BackupPC storage). In the general case, this
can raise storage requirements (for directory entries and duplicates due to
exceeding HardLinkMax), but in your case I wouldn't expect much difference.
However, with tar, backup exactness would benefit.
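
(HardLinkMax is just a config.pl setting; for reference, this is roughly what
it looks like, with what I believe is the default value:)

    # config.pl - default value, if I recall correctly:
    $Conf{HardLinkMax} = 31999;   # max hard links per pool file; once a file
                                  # reaches this, further identical copies get
                                  # a duplicate pool entry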

> > And in the 
> > unlikely event that the transfer does get interrupted, what 
> > mechanisms do I need to implement to resume/recover from the failure?
> 
> To repeat another response:  restart the backup...

To expand on that: BackupPC retries the backup automatically at the next
wakeup, so you don't really need to do anything (apart from having a
reasonable WakeupSchedule).
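
(WakeupSchedule is simply a list of hours in config.pl; something close to the
stock default is usually fine - if I remember the default correctly:)

    # config.pl - the stock schedule wakes the server once an hour;
    # the first entry is also when the nightly pool cleanup runs:
    $Conf{WakeupSchedule} = [1..23];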

> > 3. What is the recommended process for switching from rsync to tar -

Change $Conf{XferMethod} to 'tar' :-). Add/rename the other settings as
needed.
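
A rough sketch of the per-host override - the tar command line is, from
memory, just the stock ssh-based default, so adjust user/host/paths to your
setup:

    # In the host's per-PC config file (or config.pl) - roughly:
    $Conf{XferMethod}   = 'tar';
    $Conf{TarShareName} = ['/'];        # whatever RsyncShareName used to be
    $Conf{TarClientCmd} = '$sshPath -q -x -n -l root $host'
                        . ' env LC_ALL=C $tarPath -c -v -f - -C $shareName+'
                        . ' --totals';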

> > since the format/attributes are reportedly incompatible?

They're only slightly different. The only "problem" is that when switching
*from tar to rsync* (which you're not doing), rsync will re-transfer
everything, because it appears to have changed from <unknown file type> to
<plain file>. The only time this really matters is when you import a local
clone of a remote file system via the tar XferMethod, intending to save
yourself the bandwidth of the first full rsync. That won't work (without
patching the
attrib files ;-). Pooling is just fine, though. And tar doesn't base any
decisions on the attrib file contents, just on the time stamp (for incremental
backups; full backups transfer everything anyway).

Actually, I was aware of this and had intended to force my first backup after
changing from tar to rsync to be a full. Of course, I forgot. Later on, my log
files reminded me: the next few backups (up to the regularly scheduled full)
had taken extremely long, because they had all re-transferred everything that
appeared to have changed since the previous full backup - a tar backup. In the
meantime, the next full backup (rsync) had taken care of things for me, and
incremental backup times were back to normal (for a new definition of
"normal", though).

I'm not sure what a direct rsync restore of a backup done with tar would do,
but, again, that's not your case (nor is it mine, because I don't do direct
restores).

> > I would like to preserve existing compressed backups as much as possible.

You shouldn't notice any difference.

> I do not believe that files transferred by rsync will pool 
> with files transferred by tar (due to the attribute issue you mention); 

They would, if the content matched. The *attribute files* wouldn't pool even
with matching content. However, if anything in the directory changes (like a
timestamp or the size of a file ...), then the attrib file will differ anyway.

> As an aside, BackupPC (well, the pooling) buys you virtually *nothing* in 
> your application.  With a fast enough network connection, rsync buys 
> everyone almost *nothing*, too.  You are using two tools that have very 
> distinct advantages, but you're using them in an environment that largely 
> ignores their advantages.

Don't forget the exactness aspect, though. tar incrementals have one single
timestamp to go by, rsync incrementals have full file lists of both sides to
compare. That buys "everyone" better incremental backups :).
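
(You can see that single timestamp right in the default tar arguments - from
memory, an incremental just passes --newer with the start time of the
reference backup, and that really is all the information tar gets:)

    # config.pl defaults, if I remember them correctly:
    $Conf{TarFullArgs} = '$fileList+';
    $Conf{TarIncrArgs} = '--newer=$incrDate+ $fileList+';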

> [...] Every single one of my backup servers is based on BackupPC, and all
> but maybe 2 shares are backed up using rsync. [...]
> I vastly prefer consistency over performance.  But I can live with 8 hour
> backup windows.

To put it differently, the point of backups is not to be fast, it's to secure
data. The quality of a backup is not influenced by its speed (if it is, you
should be using snapshots), so you can't sensibly trade accuracy for speed.
Of course, that only holds as long as the backup completes within the backup
window.

> That's the fun of being the Administrator! :)

Exactly :).

Regards,
Holger
