BackupPC-users

Re: [BackupPC-users] 8.030.000, Too much files to backup ?

2011-12-16 09:02:28
Subject: Re: [BackupPC-users] 8.030.000, Too much files to backup ?
From: Tim Fletcher <tim AT night-shade.org DOT uk>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Fri, 16 Dec 2011 14:01:03 +0000
On Fri, 2011-12-16 at 07:33 -0600, Les Mikesell wrote:
> On Fri, Dec 16, 2011 at 4:49 AM, Jean Spirat <jean.spirat AT squirk DOT org> 
> wrote:

> > To my understanding, rsync has always seemed to be the more efficient
> > of the two, but I never challenged this "fact" ;p
> 
> Rsync working natively is very efficient, but think about what it has
> to do in your case.   It will have to read the entire file across nfs
> just so rsync can compare contents and decide not to copy the content
> that already exists in your backup.
> 
> > I will have a look at tar and see if I can work with it.
> 
> I'd try rsync over ssh first, at least if most of the files do not
> change between runs.   If you don't have enough ram to hold the
> directory listing or if there are changes to a large number of files
> per run, tar might be faster.

The real issue with rsync is the memory needed to hold the file list,
which here has 8 million entries. The first thing rsync does is walk the
tree, comparing each file against the already backed-up copy to see
whether its timestamp has changed. This puts memory and disk load on
both the backup server and the client being backed up. Tar, by contrast,
just walks the directory tree and transfers everything newer than a
timestamp that BackupPC passes to it.
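
As a rough illustration of that tar-style selection (a minimal sketch
only, not BackupPC's or tar's actual code; the function name and the
cutoff parameter are made up for this example), the walk can be a
single streaming pass that never holds the whole file list in memory:

```python
import os


def files_newer_than(root, cutoff):
    """Yield paths under root whose mtime is later than cutoff.

    cutoff is a hypothetical stand-in for the timestamp of the previous
    backup, given in seconds since the epoch.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat avoids following symlinks out of the tree
                if os.lstat(path).st_mtime > cutoff:
                    yield path
            except OSError:
                # The file vanished between listing and stat; skip it
                continue
```

Because it is a generator, each path is handed on as soon as it is
found. The trade-off is that a check based only on modification time
cannot notice files that were deleted or renamed since the last run.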

This costs some extra network bandwidth but massively reduces the disk
I/O and memory needed on both the BackupPC client and server.

The server that I am backing up, with ~7 million files, takes on the
order of 6000 minutes to back up with rsync; the bulk of that time is
spent by rsync building the tree of files to transfer. The same server
takes about 2500 minutes with tar because of its simpler way of finding
files.
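
To put those two figures side by side (plain arithmetic on the
approximate numbers quoted above, nothing measured beyond them; the
variable names are only for this example):

```python
RSYNC_MINUTES = 6000  # approximate rsync backup time quoted above
TAR_MINUTES = 2500    # approximate tar backup time quoted above

rsync_hours = RSYNC_MINUTES / 60                      # 100 hours
tar_hours = TAR_MINUTES / 60                          # roughly 41.7 hours
speedup = RSYNC_MINUTES / TAR_MINUTES                 # 2.4 times faster
saved_hours = (RSYNC_MINUTES - TAR_MINUTES) / 60      # roughly 58.3 hours

print(f"rsync: {rsync_hours:.1f} h, tar: {tar_hours:.1f} h")
print(f"tar is about {speedup:.1f}x faster, saving {saved_hours:.1f} h")
```

So on this server the rsync run is a bit over four days and the tar run
a bit under two.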

Overall rsync makes better backups because it finds moved and deleted
files and is far more efficient with network bandwidth, but if you
understand the drawbacks and need the filesystem efficiency of tar, then
tar is still an excellent backup tool.

-- 
Tim Fletcher <tim AT night-shade.org DOT uk>

