BackupPC-users

Re: [BackupPC-users] Renaming files causes retransfer?

2011-04-17 10:27:23
From: John Rouillard <rouilj-backuppc AT renesys DOT com>
To: backuppc users list <backuppc-users AT lists.sourceforge DOT net>
Date: Sun, 17 Apr 2011 14:25:24 +0000
On Sun, Apr 17, 2011 at 09:23:07AM +0200, martin f krafft wrote:
> we are facing a policy change requiring people to rename data files
> in a trivial way (replace ':' with '-').
> 
> In terms of backuppc, this means that the files will have to be
> transferred again, completely, right?

Correct.
 
> Or is there a way in which I can prepare the server for this change
> and prevent the completely unnecessary transfer of terabytes of
> data, just so backuppc can find out that the data haven't changed?

I assume you are using rsync as your backup method. If so, I claim you
can prepare the server, but YMMV: not valid in months with a full moon
or on days whose English name ends in y, etc. It requires surgery on
your last valid backup to account for the renaming, and may make that
backup invalid for restoration.

I have had this work a few times (I believe, since the backup time and
bytes transferred were much less than it would take to transfer the
files). I suggest taking 2 full backups just before the rename. The
first captures any data that has changed and can be used to do
restores. The second is what you are going to operate on.

Let's call the top of your BackupPC data dir (where the cpool, pool
and pc directories reside) /backuppc. Let's assume the files are being
renamed on the host client1 in the share /data and the (sub) directory
/set1. You will have a directory:

  /backuppc/pc/client1/<backupnumber>/f%2fdata/fset1

Under there each file/directory will be represented as f<filename> or
f<directoryname>. Change into the /backuppc/pc/client1/<backupnumber>
directory, where backupnumber is the number of your last full backup.
Navigate to the directory that holds a file that is going to be
renamed and change its (mangled) name in that backup. So

  mv f20110204_11:23.dat f20110204_11-23.dat

for example.
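
If there are many files, doing the mv by hand gets old fast. Below is a
rough sketch of how the mangled path and the bulk rename could be
scripted. The function names (mangle_elt, mangle_path, rename_colons)
and the "apply" argument are made up for illustration and are not part
of BackupPC. It assumes the mangling is just an "f" prefix plus %xx
escapes for "%" and "/", and that no file name contains a newline. Try
it on a copy of a small directory first.

```sh
#!/bin/sh
# Sketch only: these helpers are hypothetical, not BackupPC tools.

# Mangle one path element: prefix "f", escape "%" as %25 and "/" as %2f.
# "%" is escaped first so the %2f we add is not escaped again.
# (Newline / carriage-return escaping is left out of this sketch.)
mangle_elt() {
    printf 'f%s\n' "$1" | sed -e 's/%/%25/g' -e 's,/,%2f,g'
}

# Mangle a share name plus a path below it.
# The share is treated as a single element; the path is split on "/".
# Runs in a subshell so the IFS and set -f changes do not leak out.
mangle_path() (
    set -f
    out=$(mangle_elt "$1")
    IFS=/
    for part in $2; do
        if [ -n "$part" ]; then
            out="$out/$(mangle_elt "$part")"
        fi
    done
    printf '%s\n' "$out"
)

# Rename every mangled entry below $1 whose name contains ":" so that
# ":" becomes "-". Without a second argument of "apply" it only prints
# what it would do. -depth handles children before their parent dirs.
rename_colons() {
    find "$1" -depth -name 'f*:*' | while IFS= read -r old; do
        dir=$(dirname "$old")
        base=$(basename "$old")
        new="$dir/$(printf '%s' "$base" | tr ':' '-')"
        if [ "${2:-}" = apply ]; then
            mv -- "$old" "$new"
        else
            printf 'would rename: %s -> %s\n' "$old" "$new"
        fi
    done
}
```

For the example above, mangle_path /data /set1 gives f%2fdata/fset1, so
you would run rename_colons on that directory under
/backuppc/pc/client1/<backupnumber>, look over the dry-run output, and
then run it again with apply as the second argument.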

Once you have done the surgery on the pc tree and the renames have
occurred on client1, run another full backup. What should happen is:

  the rsync full backup will use the last backup (i.e. the full you
     operated on) as its reference backup
  since the file names in the reference backup match the file names
     on client1
  it should do a block comparison rather than transferring the
     file(s) all over again.
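
Since the whole trick depends on the names in the reference backup
matching the names on client1, it may be worth checking that before
kicking off the full. A minimal sketch, assuming you have a plain list
of the new file names from the client (one per line) and that none of
them contain "%" or a newline; missing_in_backup is a made-up helper
name, not a BackupPC command:

```sh
#!/bin/sh
# Sketch only: hypothetical helper, not part of BackupPC.

# Read client file names on stdin and print those that have no
# matching f<name> entry in the backup directory given as $1.
missing_in_backup() {
    while IFS= read -r name; do
        if [ ! -e "$1/f$name" ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

Feed it the output of ls from /data/set1 on client1 and point it at the
matching f%2fdata/fset1 directory in the backup you operated on. Any
name it prints would presumably be transferred in full.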

Since you moved the file to the new name, it will still be linked into
the pool. If you copy the file instead, that will not be the case (and
your storage needs will grow, since this surgery won't cause the
altered backup to be linked into the cpool). Rather than move, I
suppose you could use ln if you wanted to keep both sets of names in
the modified backup.
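
To make the mv/cp/ln difference concrete, here is a small sketch that
adds the new name as a second hard link and lets you check the link
count before and after. The helper names (link_count, add_new_name) are
made up for this example, and I have not run this against a real pool:

```sh
#!/bin/sh
# Sketch only: hypothetical helpers, not part of BackupPC.

# Print the hard-link count of a file (second field of ls -l output).
link_count() {
    ls -ld "$1" | awk '{print $2}'
}

# Add the new mangled name ($2) as another hard link to the old one ($1).
# Both names then point at the same pooled data; nothing is copied.
add_new_name() {
    ln "$1" "$2"
}
```

For example, add_new_name f20110204_11:23.dat f20110204_11-23.dat should
raise the link count by one, a plain mv leaves it unchanged, and a cp
produces a new file with a link count of 1 that is no longer tied to
the pool.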

Note that there is an attrib file in the same directory as your
data. It is a binary file that is needed to restore information like
uid/gid/mode ... when the files are restored. Because you renamed the
files, you won't be able to restore them cleanly from that backup tree
since there is no entry in the attrib file for the renamed
file. (Linking should get around this issue, but I haven't tried it.)

When I did this (a rename of a bunch of hard disk images backed up
across a WAN), I renamed a couple of test files first, did a full
backup, and verified that it didn't move enough data (or take long
enough) to have copied the images again, so I claim it works.

Good luck. Use at your own risk etc.

I hope a supported mechanism to allow this will be in BackupPC version
4, along with some way of easily importing a copy of the data taken by
other means (e.g. in a tarball on a hard drive), as it comes up often
enough with users who have large data sets.

-- 
                                -- rouilj

John Rouillard       System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111
