Subject: Re: [BackupPC-users] first copy on slow line
From: Holger Parplies <wbppc AT parplies DOT de>
To: xavier.crespin AT xlmedia DOT fr, "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Tue, 16 Sep 2014 20:26:43 +0200
Hi,

Xavier Crespin wrote on 2014-09-15 22:36:41 +0200 [Re: [BackupPC-users] first 
copy on slow line]:
> [...]
> Anyone correct me if i'm wrong...

ok, here goes :).

> > On 2014-09-15 22:27, Evaristo Calatravita wrote:
> > I'm testing to backup various 2TB-filesystems 
> > with relativelly lower daily changes (about 10-30Mb)
> > 
> > The problem is that the line between these hosts and backuppc server is 
> > terribly slow :-(, but I have the possibility of make initial backup 
> > moving info in a harddisk or similar.

That means you will be using rsync(d). You will at some point be annoyed that
full backups take very long, even though they don't transfer much data. That
is because both ends need to read all of the data and compute checksums to
make sure everything is still ok at both ends and to catch changes that
incremental backups might, in rare circumstances, have missed.

You can speed this up by enabling checksum caching (search config.pl for
"checksum" for details). Note that the first two full backups will *not*
benefit from this setting. Don't expect a speedup before the third full
backup.
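
For BackupPC 3.x that boils down to something like the following in config.pl
(from memory - check the "checksum" comments in your config.pl for the exact
details, and keep whatever other arguments you already have in those lists):

    $Conf{RsyncArgs} = [
        # ... keep all the arguments you already have here ...
        '--checksum-seed=32761',      # the magic seed that enables caching
    ];
    $Conf{RsyncRestoreArgs} = [
        # ... same here ...
        '--checksum-seed=32761',
    ];
    # fraction of cached checksums that get re-verified against the file
    # data on each full (0.01 is the default, I believe)
    $Conf{RsyncCsumCacheVerifyProb} = 0.01;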

> > The question is: there is some function or hack to copy first backup 
> > manually to the backuppc server?

Function: no, hack: yes. There was a wiki page describing this in some detail
back in the days when there was a wiki, but SourceForge seems to be replacing
the wiki software faster than we are filling (or even migrating) it ...

If you are going to use *rsyncd*, things are particularly simple: set up an
rsyncd module with the same name and configuration as on the real target, but
point the path at wherever you mount your hard disk with the initial copy.
In the BackupPC configuration, set $Conf{ClientNameAlias} to the name of the
local machine (or "localhost" if it's the BackupPC server itself). Do the
initial backup. Remove $Conf{ClientNameAlias} (or change its value as needed)
and everything should be ready to back up the remote client.
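
Sketched out (module name, path and user are placeholders here - use whatever
the real client's rsyncd.conf actually has):

    # rsyncd.conf on the machine holding the disk copy (e.g. the BackupPC
    # server itself) - same module name and auth settings as the real client:
    [data]
        path = /mnt/seed-disk/data
        read only = yes
        auth users = backuppc
        secrets file = /etc/rsyncd.secrets

    # per-host config in BackupPC, for the initial backup only:
    $Conf{ClientNameAlias} = 'localhost';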

For *rsync*, things are more complicated. In a nutshell, you need to keep the
value of RsyncShareName *exactly* as it will be later on, i.e. you need to
mount the copy of your remote file system at the same place locally. If that
is not possible, you might experiment with chroot. If you're really desperate,
you could try using tar XferMethod for the initial backup with a command
crafted to use the share name you need to hit the location where your copy is
(i.e. something like 'tar -c -v -f - -C /my/local/mount/point/$shareName ...'),
but you might have to patch the BackupPC attrib files for this to work as you
want (there was a bug in some versions of BackupPC causing retransmission of
unchanged files after a switch from tar to rsync XferMethod; I'm not sure
if/when it was fixed).
Your best course of action is to mount the local copy in the correct place.
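For example, if the remote share you back up is /srv/data and the copy sits on
an external disk, something along these lines (device name and paths are
placeholders):

    # make the copy appear under the same path the remote client uses
    mount /dev/sdb1 /mnt/seed-disk
    mkdir -p /srv/data
    mount --bind /mnt/seed-disk/data /srv/data

    # run the initial full backup (with $Conf{ClientNameAlias} pointing at
    # the local machine, as in the rsyncd case above), then clean up:
    umount /srv/data
    umount /mnt/seed-disk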

As a rule of thumb, the first backup of the *remote* host should be a *full
backup*, both to make sure things are working as expected, and to ensure you
have a sensible reference backup for future incrementals.

Think about your backup schedule. With, say, monthly full backups and daily
level 1 incrementals, you will have growing deltas transferred each day
(perhaps 290-870 MB on day 29, considering your 10-30 MB of changes per day),
so that might not work. Look at IncrLevels, but don't overdo it (1..30
probably won't work either). The amount of transferred data is *minimal* for
alternating full and incremental backups, but you might not be able to do
fulls of 2 TB of data that frequently. You'll need to find a compromise that
fits your needs.
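
Just as an illustration (the numbers are examples, not a recommendation), a
weekly-full schedule where each daily incremental is taken relative to the
previous day could look like this in the per-host config:

    $Conf{FullPeriod} = 6.97;                 # a full backup roughly once a week
    $Conf{IncrPeriod} = 0.97;                 # incrementals daily
    $Conf{IncrLevels} = [1, 2, 3, 4, 5, 6];   # each day's delta is relative to
                                              # the previous day's backup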

> the problem you will face is similar to the one i face right now : even 
> if you populate the pool from a fast line, when you will start the first 
> backups on your client host, backuppc will download complete files to 
> compute MD5's and compare it to files in the pool, except maybe if all 
> attributes are absolutely identical (haven't tried that).

What you are describing here is a pre-populated pool (eliminating the need for
compressing the data) without a matching reference backup for the same host,
i.e. you did not back up the data under the same BackupPC host name, or you
changed relevant parts of the backup configuration (*ShareName), or you
switched from tar/smb to rsync XferMethod as described above. Or you are not
using rsync(d) in the first place ;-).

Aside from that,
a) BackupPC doesn't compute the pool file name from the complete contents,
   but rather from (parts of) the first 1 MB and the length of the file (see
   the sketch below).
b) Identical attributes will only make a difference for rsync *incrementals*
   in deciding which files to *look at*. With a reference file in place,
   only checksums will be transferred, not the complete file content. Without
   a reference file in place, the notion "identical attributes" is
   meaningless (identical to what?).
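
To make a) a bit more concrete, here is a rough Perl sketch of the idea. This
is *not* BackupPC's exact hashing code, just an illustration that the digest
covers the file length plus a leading portion of the content rather than the
whole file:

    use Digest::MD5 qw(md5_hex);

    # illustration only: digest over the file length and (up to) the first 1 MB
    sub pool_name_sketch {
        my ($file) = @_;
        my $size = -s $file;
        open(my $fh, '<', $file) or die "open $file: $!";
        binmode($fh);
        read($fh, my $chunk, 1024 * 1024);
        close($fh);
        return md5_hex($size . $chunk);
    }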

> Apparently BPC V4 will have a feature (Rsync checksum caching) wich will 
> compute checksum on the client side, this feature is already available 
> in Alpha.

You are mixing two things up here. rsync checksum caching is a mechanism
available in BackupPC 3.x (and 2.x :) for speeding up full backups.

BackupPC 4.x uses full file MD5 digests for the pool, which will apparently
let BackupPC use a pool file as reference if a match (i.e. candidate for a
file with *exactly identical* content) is found. This would, indeed, avoid
transferring any files already in the pool.
The *client side* is native rsync, though. It *already* calculates checksums.
That is not a feature of BackupPC ;-). The feature of BackupPC 4.x is that
these checksums are now usable for avoiding transfers of known data.

> But the best thing, if you have the opportunity, would be to try on a 
> test setup, well, at least it wouldn't kill.

The important thing to note when testing is that backup duration is *not
necessarily* proportional to the amount of transferred data. Full backups can
take a long time even though almost no data goes over the wire. Be sure to
measure the correct value!

Regards,
Holger
