BackupPC-users

Re: [BackupPC-users] Newbie setup questions

2011-03-11 12:30:04
Subject: Re: [BackupPC-users] Newbie setup questions
From: Cesar Kawar <kawarmc AT gmail DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Fri, 11 Mar 2011 18:27:34 +0100
On 11/03/2011, at 14:59, Jeffrey J. Kosowsky wrote:

> Cesar Kawar wrote at about 10:08:10 +0100 on Friday, March 11, 2011:
>> 
>> 
>> On 11/03/2011, at 08:04, hansbkk AT gmail DOT com wrote:
>> 
>>> On Fri, Mar 11, 2011 at 10:56 AM, Rob Poe <rob AT poeweb DOT com> wrote:
>>>> I'm using RSYNC to do backups of 2 BPC servers.  It works swimmingly, you 
>>>> plug the USB drive into the BPC server, it auto-mounts, emails that it's 
>>>> starting, does an RSYNC dump (with delete), flushes the buffers, dismounts 
>>>> and emails.
>>> 
>>> Sounds great Rob, would you be willing to post the script?
>>> 
>>> Rsync'ing is all fine and good until your hardlinked filesystem (I
>>> don't know the proper term for it, as opposed to "the pool") gets "too
>>> big". It's a RAM issue, and an unavoidable consequence of rsync's
>>> architecture. I'm not faulting rsync, mind you; the kind of filesystem
>>> that BPC (and rdiff/rsnapshot etc.) builds over time is a pretty extreme
>>> outlier case.
>>> 
>> That is not a problem anymore with the latest versions of rsync. I've been 
>> using this technique for a year now with a cpool of almost 1 TB with no 
>> problems.
>> 
>> Don't expect it to run on a Celeron machine, as it requires a big processor. 
>> Rsyncing 1 TB of compressed hardlinked data to a new filesystem is a very 
>> CPU-intensive task. But it does not leak memory as before. You can rely on 
>> rsync to maintain a USB disk for off-site backups.
> 
> I think rsync uses little if any cpu -- after all, it doesn't do much
> other than do delta file comparisons and some md4/md5
> checksums. All much more rate-limited by network bandwidth and disk
> i/o.

Not at all. Essentially, rsync was designed for exactly the opposite goals 
from the ones you mentioned. rsync is bandwidth friendly, but it is very CPU 
expensive. The amount of memory needed matters much less than the CPU 
needed. Again, from the rsync FAQ page:

    "Rsync needs about 100 bytes to store all the relevant information for 
    one file, so (for example) a run with 800,000 files would consume about 
    80M of memory. -H and --delete increase the memory usage further."
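
As a rough check of that FAQ figure, the arithmetic can be sketched in a few lines of Python. The 100 bytes per file comes from the quote above. The function name and the second file count are only illustrative, and the extra cost of -H and --delete is left out because the FAQ gives no number for it.

```python
BYTES_PER_FILE = 100  # per-file figure quoted from the rsync FAQ

def estimated_file_list_bytes(num_files, bytes_per_file=BYTES_PER_FILE):
    """Rough estimate of the memory rsync needs to hold its file list."""
    return num_files * bytes_per_file

# 800,000 is the FAQ's example; 10,000,000 is a hypothetical larger pool.
for n in (800_000, 10_000_000):
    megabytes = estimated_file_list_bytes(n) / 1e6
    print(f"{n:>10} files -> about {megabytes:.0f} MB")
```

Even the hypothetical ten-million-file case stays around a gigabyte by this estimate, which is why I say memory is not the main constraint.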

My Firefox needs about double that memory just to open www.google.com. 
I know that figure is "only" for processing 800,000 files, but with version 
3.0.0 and later, it doesn't load all the files at once. With a 512 MB computer 
you'll be fine. But in the particular installation I was talking about before 
(1 TB of data comprising 1 year of history, which means a really big number 
of hardlinks per file), the syncing process takes almost 100% CPU on an Intel 
Xeon quad core for about 2 hours.
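
To get a feel for how many hardlinked files a pool like that holds before syncing it, something like the following Python sketch could walk the tree and count them. The function name is made up for this example, and it only counts entries; it says nothing about how rsync tracks hardlinks internally.

```python
import os

def count_hardlinked(root):
    """Return (file entries seen, distinct inodes that have more than one link)."""
    entries = 0
    multi_link_inodes = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            entries += 1
            if st.st_nlink > 1:
                # The same inode shows up once per link; keep it only once.
                multi_link_inodes.add((st.st_dev, st.st_ino))
    return entries, len(multi_link_inodes)
```

You would call it as count_hardlinked("/var/lib/backuppc"), where that path is only an assumption about where the BackupPC data lives on your system. Expect it to take a while on a big pool.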

rsync is a really CPU-expensive process. You can always use checksum caching 
for the md5 checksums, but I wouldn't recommend that for an off-site 
replicated backup. Caching introduces a small probability of losing data, and 
that technique is already used when doing a normal BackupPC backup with the 
rsync transfer method. So if you then resync that data to another drive, disk 
or filesystem of any kind, your probability of losing data compounds on top 
of the original one.
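
To put a rough number on that, assume each stage that trusts cached checksums independently misses a changed file with some small probability p. That p is a made-up figure here, not something I measured, and independence is also an assumption. The chance that at least one of n stages misses is then 1 - (1 - p)^n, which for small p is close to n * p:

```python
def prob_any_miss(p, stages):
    """Probability that at least one of `stages` independent stages misses a change."""
    return 1.0 - (1.0 - p) ** stages

p = 1e-6  # hypothetical per-stage miss probability, for illustration only
for stages in (1, 2):
    print(f"{stages} stage(s): {prob_any_miss(p, stages):.3e}")
```

So under these assumptions, adding a second cached stage roughly doubles a risk that was small to begin with.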

Not recommended, I think. I prefer to spend a little more money on the machine 
once and not have surprises later on when the big boss asks you to recover his 
files...

I don't have graphs, but the amount of memory available in any recent computer 
is more than enough for rsync. Disk I/O is somewhat important, and disk 
bandwidth is a constraint, but CPU speed is the most important thing in my 
tests.
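
If someone wants numbers instead of my impressions, a minimal way to compare CPU time against wall-clock time for one run could look like this Python sketch (Unix only). The function name is my own, and it only sees CPU time of child processes on the local machine that have been waited for.

```python
import resource
import subprocess
import time

def measure(cmd):
    """Run cmd and return (wall_seconds, cpu_seconds) for it and its children."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # User plus system CPU time accumulated by the finished child processes.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu
```

For example, measure(["rsync", "-aH", "--delete", "/var/lib/backuppc/", "/mnt/usb/backuppc/"]), where both paths are hypothetical. CPU time close to (or above, on several cores) the wall time would point to a CPU-bound run, while a much lower figure would suggest the run is mostly waiting on disk or network.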


> 
> I was under the impression that the slowdown is due to the need to
> build (and check) lists of hardlinks, which is memory
> constrained. Maybe when the list gets really long, cpu power is needed
> to build/sort/lookup the list, but I would think that if rsync were
> written well, this again would not be the rate-limiting issue.
> 
> Would be interesting for someone to graph performance vs. amount of
> memory and vs. cpu power/speed.
> 
> ------------------------------------------------------------------------------
> Colocation vs. Managed Hosting
> A question and answer guide to determining the best fit
> for your organization - today and in the future.
> http://p.sf.net/sfu/internap-sfd2d
> _______________________________________________
> BackupPC-users mailing list
> BackupPC-users AT lists.sourceforge DOT net
> List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
> Wiki:    http://backuppc.wiki.sourceforge.net
> Project: http://backuppc.sourceforge.net/

