BackupPC-users

Re: [BackupPC-users] How does BackupPC work?

2009-02-18 12:50:29
From: Adam Goryachev <mailinglists AT websitemanagers.com DOT au>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Thu, 19 Feb 2009 04:48:19 +1100

Tino Schwarze wrote:
> Hi John,
> 
> On Wed, Feb 18, 2009 at 10:58:14AM -0600, John Goerzen wrote:
> 
>> I've been reading docs on BackupPC and I have a few questions about
>> how it works.
>>
>> First off, I gather that it keeps a hardlinked pool of data, so
>> whenever a file changes on any host, on the backup device, it will be
>> hardlinked to a file containing the same data, regardless of the host
>> it came from, right?
>  
> Right.

Mostly right...

If you have a file with identical content stored on two different hosts
(or even two files on the same host):
host1:/var/log/messages
host1:/var/log/kernel.log
Let's assume these two files get the exact same log data...

On the first backup, both files are transferred to the server in full,
so there are (basically) no bandwidth savings...

The next day, both files have changed, but the two new files are still
identical to each other.
The first file is copied to a new file in the backup dir, with rsync
transferring only the changed data.
The second file is handled the same way, so the same changed data is
transferred a second time.
After the backup completes, BackupPC runs through all the new files and
creates a hardlink between the first file and the pool. When it gets to
the second file, it deletes it from the backup dir and replaces it with
a hardlink to the version already in the pool.
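
To make the linking step concrete, here is a rough Python sketch of the
idea. It is not BackupPC's actual code: the function name, the flat pool
directory, and the use of a full SHA-256 digest as the pool file name
are all assumptions made for the illustration.

```python
import hashlib
import os
import tempfile


def link_into_pool(backup_file, pool_dir):
    """Pool one freshly backed-up file; return True if it was a duplicate.

    Hypothetical helper: pool entries are named after a SHA-256 digest of
    the content, which is an assumption made for this sketch.
    """
    with open(backup_file, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    pool_file = os.path.join(pool_dir, digest)
    if os.path.exists(pool_file):
        # Same content is already pooled: drop our copy, link to the pool.
        os.remove(backup_file)
        os.link(pool_file, backup_file)
        return True
    # First time this content is seen: the backup copy becomes the pool entry.
    os.link(backup_file, pool_file)
    return False


def demo():
    """Back up two identical files and pool them one after the other."""
    root = tempfile.mkdtemp()
    pool = os.path.join(root, "pool")
    backup = os.path.join(root, "backup")
    os.makedirs(pool)
    os.makedirs(backup)
    first = os.path.join(backup, "messages")
    second = os.path.join(backup, "kernel.log")
    for path in (first, second):
        with open(path, "w") as fh:
            fh.write("identical log data\n")
    results = [link_into_pool(first, pool), link_into_pool(second, pool)]
    same_inode = os.stat(first).st_ino == os.stat(second).st_ino
    return results, same_inode, os.stat(first).st_nlink
```

In demo(), the first file creates the pool entry and the second is
replaced by a link to it, so both names and the pool entry end up
pointing at a single copy of the data on disk.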

The same applies if the two files are on different hosts. Whenever the
host or path is different, the changed data is transferred once for each
file (or the entire content, for new files), even though only one copy
ends up in the pool.

The worst case is when someone manages to make a copy of their photo
library (or something similarly large) on a remote host...

>> So, given that, I don't really understand why there is a distinction
>> between a full and an incremental backup.  Shouldn't either one take
>> up the same amount of space?  That is, if you've got few changes on
>> the client, then on the server you're mostly just hardlinking things
>> anyway, right?  So why is there a choice?
>  
> The only difference between incremental and full (for rsync!) is that
> 1) all files are completely checksummed, so you detect pool corruption
> 2) you get the whole directory structure for the server (which is used
> as the base for incremental backups) with all hardlinks to pool files
> 
> For an incremental, you only get the directory structure and hardlinks
> to new/modified files to the pool.

Maybe not (1), since there is an option (RsyncCsumCacheVerifyProb, I
believe) which is set to 0.01 by default, so only about 1% of the files
are checked each time.
Basically, an incremental uses less disk IO, CPU, and memory on both
client and server, because it examines the files on the client in less
detail: it compares just size, path, and modification date/time, whereas
a full compares checksums as well.
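
As a rough illustration of that difference (a sketch only, not how
BackupPC or rsync implement it), the Python below compares a file against
what was recorded at the previous backup. The helper names, the
"previous" record layout, and the use of SHA-256 are assumptions made
for the example.

```python
import hashlib
import os
import tempfile


def attributes(path):
    """Cheap check: size and modification time only."""
    st = os.stat(path)
    return (st.st_size, int(st.st_mtime))


def checksum(path):
    """Expensive check: read the whole file and hash it."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def needs_transfer(path, previous, full):
    """Decide whether a file looks changed since the previous backup.

    'previous' is a hypothetical record with the keys 'attrs' and 'digest'.
    """
    if full:
        return checksum(path) != previous["digest"]
    return attributes(path) != previous["attrs"]


def demo():
    """Change a file's content while keeping its size and mtime."""
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "messages")
        with open(path, "w") as fh:
            fh.write("aaaa")
        os.utime(path, (1000, 1000))
        previous = {"attrs": attributes(path), "digest": checksum(path)}
        with open(path, "w") as fh:
            fh.write("bbbb")
        os.utime(path, (1000, 1000))  # put the old timestamp back
        incremental = needs_transfer(path, previous, full=False)
        full = needs_transfer(path, previous, full=True)
        return incremental, full
```

The file in demo() keeps its size and timestamp while its content
changes, so the attribute-only check misses the change and the checksum
check catches it, at the cost of reading the whole file.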

Regards,
Adam
