Subject: Re: [BackupPC-users] BackupPC File::RsyncP issues
From: Holger Parplies <wbppc AT parplies DOT de>
To: Jim Leonard <trixter AT oldskool DOT org>
Date: Wed, 19 Aug 2009 02:14:31 +0200
Hi,

Jim Leonard wrote on 2009-08-18 17:00:05 -0500 [[BackupPC-users] BackupPC 
File::RsyncP issues]:
> First off, I'm a happy user of BackupPC; I'm only posting because I have 
> an architecture question resulting in bad performance that I'm hoping 
> someone can answer.
> [...]
> With smb, which used smbclient to do the transfers, I was seeing 
> transfer speeds of 40-65MB/s over a gigabit network -- with rsync-based 
> backups, I am seeing about 6MB/s, ten times slower.

first of all, where are you seeing these figures, and what are you measuring?
The primary purpose of the rsync protocol is to save network bandwidth. So if,
for example, you are transferring only one tenth the amount of data for a full
backup, and that takes the same time as with SMB, your network throughput will
be only one tenth as high. That is not a problem, but rather a feature, and it
indicates that network bandwidth is not, in fact, your bottleneck. There are
other good reasons to use rsync just the same. And, yes, I read your mail in
the other thread, but it's still not obvious what you are actually observing
and how you are interpreting it.
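
To make that concrete, a back-of-the-envelope sketch (the numbers below are
made up for illustration, not taken from your logs):

  # Hypothetical: both backups take the same wall-clock time, but rsync
  # only has to send the ~10% of the data that actually changed.
  use strict;
  use warnings;

  my $elapsed    = 1500;             # 25 minutes for either backup
  my $smb_sent   = 90 * 1024**3;     # SMB re-sends every byte of the share
  my $rsync_sent =  9 * 1024**3;     # rsync sends only the changed blocks

  printf "SMB   network throughput: %5.1f MB/s\n", $smb_sent   / $elapsed / 1024**2;
  printf "rsync network throughput: %5.1f MB/s\n", $rsync_sent / $elapsed / 1024**2;
  # -> roughly 61 MB/s vs. 6 MB/s, for two backups of identical duration

A lower MB/s figure on its own doesn't tell you whether the backup actually
got slower.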

Secondly, what are you comparing? Due to a "feature" in how the rsync
XferMethod interprets attrib files, the first backup after switching from a
non-rsync XferMethod to rsync (more precisely, every backup up to and
including the first full) will re-transfer all data (which would make the
backup slow, but not low-bandwidth). In any case, you should run at least one
full rsync backup (per host) before starting measurements.
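
(If you want to force a full right away rather than wait for the schedule,
running something like

  BackupPC_dump -f -v yourhost

as the backuppc user should do it; -f forces a full, -v is verbose. The exact
path to BackupPC_dump depends on your installation.)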

Have you got very large growing files (or, more to the point, large
*changing* files) in your backup? They could also be part of the explanation
(one that lies outside File::RsyncP, by the way).

> I profiled File::RsyncP which is what BackupPC_dump appears to be using, 
> and found this troubling report after a profile time of one day:
> 
> time elapsed (wall):   86034.3727
> time running program:  85959.5328  (99.91%)
> time profiling (est.): 74.7665  (0.09%)
> 
> %Time    Sec.      #calls   sec/call  F  name
> 83.30 71605.7838   913708   0.078368  ?  File::RsyncP::pollChild
> 15.98 13737.1191      261  52.632640     File::RsyncP::writeFlush
>   0.21  176.3028    121432   0.001452     File::RsyncP::getData
> (snip)
> 
> As you can see, pollChild is called a ridiculously large number of 
> times, which is eating up nearly 70% of the CPU time trying to do a 
> backup.

Did you look at the code, or are you inferring that the number is ridiculous
from the name of the function? I don't know enough about the rsync protocol
(yet) to say for sure if the number of calls could be reduced and how, but
the calls to pollChild() seem to make sense to me.
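
For anyone reading along: as far as I understand it, File::RsyncP spawns the
remote rsync (via ssh or an rsyncd connection) as a child process and shuttles
protocol data over pipes, so a poll-style routine naturally runs once per
chunk of data, not once per file. Very roughly this kind of loop (a generic
illustration only, not the actual File::RsyncP code):

  use strict;
  use warnings;
  use IO::Select;

  # Generic sketch of a select()-based poll over a child's output pipe.
  # The point: such a routine is called once per chunk of protocol data,
  # so a large call count is not surprising by itself.
  sub poll_child_once {
      my ($fh, $timeout) = @_;
      my $sel = IO::Select->new($fh);
      return undef unless $sel->can_read($timeout);   # nothing readable yet
      my $n = sysread($fh, my $buf, 65536);           # read one chunk
      return $n ? $buf : undef;                       # undef on EOF or error
  }

  # Toy driver: read chunks from a child process until EOF.
  open(my $child, '-|', 'cat', '/etc/services') or die "can't start child: $!";
  while (defined(my $chunk = poll_child_once($child, 1.0))) {
      printf "got %d bytes\n", length($chunk);        # hand off to protocol code here
  }
  close($child);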

What strikes *me* as unreasonable is the 261 calls to writeFlush() taking an
average of 52.6 seconds. Or maybe there was a wrap-around in the counter?

You should also note that not all of the work is done inside File::RsyncP, so
it's not 70% of the backup time spent there.

Don't get me wrong. I'm not saying that it wouldn't be good to significantly
improve BackupPC performance, provided it can be done within the framework of
how BackupPC works (or could be made to work).

> This is extremely inefficient and completely explains why my 
> backups are taking so long over rsync

Does it? Please share the explanation ...

> So, my questions are:
> 
> - Is there a reason BackupPC needs to emulate rsync through File::RsyncP 
> instead of just using rsync itself?

Yes. Craig wouldn't have gone to the trouble of implementing File::RsyncP for
BackupPC if there wasn't, would he? (You are aware that Craig is also the
author of BackupPC, aren't you? ;-)

How would you propose using rsync to update a compressed, deduplicated pool
with a separate directory for each backup, mangled file names, and file
attributes stored separately?
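
For illustration, roughly what one backup looks like on disk (simplified
sketch; assume a share "/home" and the compressed pool):

  __TOPDIR__/pc/somehost/123/f%2fhome/fuser/fsomefile   # mangled path, one tree per backup
  __TOPDIR__/pc/somehost/123/f%2fhome/fuser/attrib      # owner/mode/mtime of files in this dir
  __TOPDIR__/cpool/a/1/b/a1b2c3...                      # compressed pool file, hardlinked from pc/

A stock rsync could not maintain that layout by itself; File::RsyncP sits in
the middle and translates between the rsync wire protocol and this on-disk
structure.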

> - If not, is anyone maintaining File::RsyncP who can optimize that code 
> and/or redesign it?

If there is no reason to use it, someone should optimize it? ;-)

I believe Craig is researching other alternatives (a FUSE FS to handle
compression and deduplication, so BackupPC could, in fact, use native rsync).
If that proves unviable, upgrading File::RsyncP to protocol version 30 would
probably be next. But File::RsyncP is open source, so you're free to optimize
it yourself :-). If I find any time at all, I'll take a closer look at the
matter, but that's pretty much an "if (0)" ...

Regards,
Holger
