2009-11-24 22:43:46
Subject: Re: [BackupPC-users] Backing up many, many files over a medium speed link - kills CPU and fails halfway. Any help?
From: Chris Bennett <chris AT ceegeebee DOT com>
To: GB <pseudo AT gmail DOT com>
Date: Wed, 25 Nov 2009 14:11:26 +1030
Hi,

> Thanks for the reply. The data is, in fact, "all time" in the sense that it
> goes back years, but it's sorted by filename, rather than date; it's
> essentially equivalent to how BackupPC stores data in cpool/, i.e. the first
> 3 characters of the filename will generate 3 levels of subdirectories. The
> best I was able to do, to date, was to make 10 shares, 1-9, and back up 10
> separate backup trees. But that was before, when I had about 100k files... I
> tried this recently, and seem to have made it go under. So I guess I'd need
> to make TWO levels of shares, so 1/0-1/9, 2/0-2/9, etc. Then, maybe, once I
> go through the full loop, it'll be easier to perform future incrementals
> since the delta will be small.

Yeah, I've been able to archive large pools of files that have aged,
so that BackupPC doesn't have to consider such a large file list.  I'm
not too sure about the mechanics of BackupPC and its overhead, e.g. how
much work BackupPC does for a full versus an incremental backup, or how
much memory is consumed per file it considers.  I expect someone else
can answer these kinds of questions more succinctly and help you build
a more scalable configuration.
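For the two-level split proposed in the quoted message (1/0-1/9, 2/0-2/9, etc.), the share list gets long enough that generating it is less error-prone than typing it. This is a minimal sketch, assuming the rsync transfer method (so shares go in `$Conf{RsyncShareName}`); `share_list` is a hypothetical helper and the data root you pass it is your own.

```shell
#!/bin/sh
# Hypothetical helper: print a BackupPC $Conf{RsyncShareName} line that splits
# ROOT into two levels of shares, ROOT/a/b for every a and b in CHARS.
# Assumes XferMethod is rsync; CHARS defaults to the digits 0-9.
share_list() {
    root=$1
    chars=${2:-"0 1 2 3 4 5 6 7 8 9"}
    printf '%s' '$Conf{RsyncShareName} = ['
    sep=''
    for a in $chars; do        # first-level directory name
        for b in $chars; do    # second-level directory name
            printf "%s'%s/%s/%s'" "$sep" "$root" "$a" "$b"
            sep=', '
        done
    done
    printf '];\n'
}
```

`share_list /srv/data` (a made-up path) prints one line with 100 shares, /srv/data/0/0 through /srv/data/9/9, ready to paste into the host's config file. If the directories are named with hex characters the way cpool/ is, pass "0 1 2 3 4 5 6 7 8 9 a b c d e f" as the second argument (256 shares).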

> My BackupPC box doesn't swap too much, it doesn't behave like it's under
> massive load at all; but then again, I think my IO subsystem (Dell Perc6 +
> 4x WD Greens in RAID5) hopefully outperforms the speed of the link+any
> overhead :) I haven't tried stracing rsync on the remote server. Any
> suggestions on how to use it? I've never tried it before.

First, get the PID of the rsync process running on the machine that
holds the data (the source).

Then run something like
  # -s3000: print up to 3000 characters of each string argument (default 32)
  strace -p <pid> -s3000
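If several rsync processes are running, picking the right PID is the fiddly part. A sketch, assuming BackupPC's rsync-over-ssh transfer, where the client-side process shows up in ps as `rsync --server --sender ...` (check this with ps on your own host); `sender_pids` and `trace_sender` are hypothetical names, and the syscall filter is an optional addition to the command above.

```shell
#!/bin/sh
# Hypothetical helpers; assumes the client-side rsync started by BackupPC
# appears in ps as "rsync --server --sender ...".

# Read "PID ARGS..." lines (the format of `ps -eo pid=,args=`) on stdin and
# print the PID of every rsync process running in --server --sender mode.
sender_pids() {
    awk '$2 ~ /(^|\/)rsync$/ && / --server( |$)/ && / --sender( |$)/ { print $1 }'
}

# Attach strace to one PID:
#   -s3000  print up to 3000 characters of each string argument
#   -tt     prefix each line with the wall-clock time (microseconds)
#   -e trace=file,read,close
#           only show syscalls that take a file name (open, stat, lstat, ...)
#           plus read and close, i.e. the per-file cycle rsync goes through
trace_sender() {
    strace -p "$1" -s3000 -tt -e trace=file,read,close
}
```

On the client, as root: `pid=$(ps -eo pid=,args= | sender_pids | head -n 1)`, then `trace_sender "$pid"`; Ctrl-C detaches strace and leaves rsync running.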

This shows the open/stat/read/close cycle rsync goes through for each
file it copies.  Normally I'd expect it to cycle faster than you can
read the output; in cases where I've seen heavy swap activity, though,
you'll see bursts of the cycle followed by pauses.
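To spot that "bursts followed by pauses" pattern without staring at the terminal, the trace can be saved with timestamps and scanned for gaps afterwards. A sketch, assuming the trace came from `strace -tt` on a single process (so each line starts with HH:MM:SS.microseconds); `report_gaps` is a hypothetical name, and the threshold is whatever counts as a pause for you.

```shell
#!/bin/sh
# Hypothetical helper: read `strace -tt` output on stdin and report every
# gap between consecutive syscall lines longer than THRESHOLD seconds
# (default 1). Lines without a leading HH:MM:SS.usec timestamp are skipped.
# A gap that spans midnight gives a negative difference and is not reported.
report_gaps() {
    awk -v limit="${1:-1}" '
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]+$/ {
            split($1, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
            if (seen && now - prev > limit)
                printf "%.3f s pause before: %s\n", now - prev, $0
            prev = now
            seen = 1
        }'
}
```

Capture with `strace -p <pid> -tt -o /tmp/rsync.trace` (Ctrl-C to stop), then run `report_gaps 1 < /tmp/rsync.trace`. Because -tt stamps each syscall as it starts, a reported gap covers the previous call plus any user-space work after it, so a read() stalled on swap shows up as a pause before the next line.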

Similarly, running:
  vmstat 1

in another console and watching the si/so columns (memory swapped
in/out per second) tells you whether swap is being heavily used.  The
bi/bo columns (blocks in/out) show overall disk I/O, swap traffic
included.
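Rather than eyeballing every line, a small filter can print only the samples in which the machine is swapping. A sketch, assuming the default Linux procps `vmstat` layout (r b swpd free buff cache si so bi bo ...); other vmstat implementations order their columns differently, so check the header on your system. `swap_samples` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read vmstat output on stdin and print the si/so/bi/bo
# values of every sample where memory is being swapped in or out.
# Assumes procps column order: si=$7, so=$8, bi=$9, bo=$10.
# Header lines are skipped because their 7th field is not a number.
swap_samples() {
    awk '$7 ~ /^[0-9]+$/ && ($7 > 0 || $8 > 0) {
        printf "si=%s so=%s bi=%s bo=%s\n", $7, $8, $9, $10
        fflush()   # show each sample as soon as it arrives
    }'
}
```

Usage: `vmstat -n 1 | swap_samples`, where -n prints the header only once. Some awk builds buffer piped input, so lines may arrive in small bursts rather than exactly once per second.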

Good luck and let me know if you find a good solution to your problem.

Regards,

Chris Bennett
cgb

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/