Subject: Re: [BackupPC-users] Advice on creating duplicate backup server
From: Holger Parplies <wbppc AT parplies DOT de>
To: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
Date: Tue, 9 Dec 2008 04:10:17 +0100
Hi,

Jeffrey J. Kosowsky wrote on 2008-12-08 09:37:16 -0500 [Re: [BackupPC-users] 
Advice on creating duplicate backup server]:
> 
> It just hit me that given the known architecture of the pool and cpool
> directories shouldn't it be possible to come up with a scheme that
> works better than either rsync (which can choke on too many hard
> links) or 'dd' (which has no notion of incremental and requires you
> to resize the filesystem etc.).

yes, that hit someone on the list several years ago (I don't remember the
name, sorry). I implemented the idea he sketched (well, more or less, there's
some work left to make it really useful).

> My thought is as follows:
> 1. First, recurse through the pc directory to create a list of
>    files/paths and the corresponding pool links.
>    Note that finding the pool links can be done in one of several
>    ways:
>    - Method 1: Create a sorted list of pool files (which should be
>      significantly shorter than the list of all files due to the
>      nature of pooling and therefore require less memory than rsync)
>      and then look up the links.

Wrong. You need one entry per inode that points to an arbitrary path (the
first one you copy). Every file(*) is in the pool, meaning a list of all pool
files is exactly what you need. A different way to look at it: every file with
a link count > 1 is a pooled file, and it's these files that cause problems
for rsync & co., not single-link files. (Well, yes, rsync pre-3 needed a
complete list of all files.)

(*) Files that are not in the pool:
    1.) 0-byte files. They take up no file system blocks, so pooling them
        saves only inodes. Not pooling them makes things simpler.
    2.) log files (they get appended to, which would make pooling somewhat
        difficult; besides, what chance is there of a pool hit?), and the
        per-host 'backups' files (including backups.old).
    attrib files are pooled, contrary to popular belief, and that makes
    sense, because they are often identical to the same attrib file from
    the previous backup(s).


The algorithm I implemented is somewhat similar:
1.) Walk pool/, cpool/ and pc/, printing information on the files and
    directories to a file (which will be quite large; by default I put it
    on the destination pool FS, because there should be large amounts of
    space there).
2.) Sort the file with the 'sort' command. The lines in the file are
    designed such that they will be sorted into a meaningful order:
    - directories first, so I can create them and subsequently not worry
      about whether the place I want to copy/link a file to already exists
      or not
    - files next, sorted by inode number, with the (c)pool file preceding its
      pc/ links
      The consequence is that I get all references to one inode on adjacent
      lines. The first time, I copy the file. For the repetitions, I link to
      the first copy. All I need to keep in memory is something like one line
      from the file list, one "previous inode number", one "file name of
      previous inode".
    'sort' handles huge files quite nicely, but it creates large temporary
    files, by default under /tmp (GNU sort honors $TMPDIR, or a directory
    given with -T). You need to make sure you've got the space, but if you're
    copying a multi-GB/TB pool, you probably have. My guess is that the
    necessary amount of space roughly equals the size of the file I'm sorting.
3.) Walk the sorted file, line by line, creating directories, copying files
    (currently with File::Copy::cp, though I plan to change that to PoolWrite,
    so I can add (part of) one pool to an existing second pool, or to
    something that communicates over TCP/IP, so I can copy to a different
    machine) and linking files (with the Perl link() function).
    In theory, a pool could also be compressed or uncompressed on the fly
    (uncompressed for copying to zfs, for instance).
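
For illustration, here is a minimal sketch of what steps 1 and 2 could look
like. It is not the actual script; the paths ($TopDir under /var/lib/backuppc,
a destination file system mounted on /destpool) and the exact line format are
my own assumptions:

    #!/usr/bin/perl
    # Rough sketch of steps 1 and 2: walk pool/, cpool/ and pc/, print one
    # sortable line per directory/file, and let sort(1) group all paths
    # that share an inode.
    use strict;
    use warnings;
    use File::Find;

    my $topdir   = "/var/lib/backuppc";    # assumption: your $TopDir
    my $listfile = "/destpool/copy.list";  # assumption: on the destination FS

    open my $out, '>', $listfile or die "open $listfile: $!";
    find({
        no_chdir => 1,
        wanted   => sub {
            my $ino = (lstat($_))[1];
            if (-d _) {
                print $out "D $File::Find::name\n";   # 'D' sorts before 'F'
            } elsif (-f _) {
                # zero-padded inode => a plain text sort orders numerically;
                # the extra 0/1 field puts the (c)pool path before its pc/ links
                my $in_pc = $File::Find::name =~ m{/pc/} ? 1 : 0;
                printf $out "F %012d %d %s\n", $ino, $in_pc, $File::Find::name;
            }
        },
    }, map { "$topdir/$_" } qw(pool cpool pc));
    close $out or die "close $listfile: $!";

    # step 2: sort(1) does the heavy lifting; -T chooses the temp directory
    $ENV{LC_ALL} = "C";    # byte-wise sort order
    system("sort", "-T", "/destpool/tmp",
           "-o", "$listfile.sorted", $listfile) == 0 or die "sort failed";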


Once again, because people seem to be determined to miss the point: it's *not*
processing by sorted inode numbers in order to save disk seeks that is the
point, it's the fact that the 'link' system call takes two paths

    link $source_path, $dest_path; # to use Perl notation

while the 'stat' system call gives you only an inode number. To link a
filename to a previously copied inode, you need to know the name you copied it
to. A general purpose tool can't know when it will need the information, so it
needs to keep information on all inodes with link count > 1 it has encountered.
You can keep a mapping of inode_number->file_name in memory for a few thousand
files, but not for hundreds of millions. By sorting the list by inode number,
I can be sure that I'll never need the info for one inode again once I've
reached the next inode, so I only have to keep info for one file in memory,
regardless of how many I'm copying. The difficult part is now the 'sort', but,
again, the 'sort' command is good at handling huge files - probably without
limit to the file size.
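
To make that concrete, the copy/link pass over the sorted list might look
something like this; again just a sketch with made-up paths, and without
error recovery or attribute/ownership handling:

    # Sketch of step 3: walk the sorted list, copy the first path seen for
    # each inode, hard-link every further path to that first copy. Only one
    # (inode, destination path) pair needs to be kept in memory.
    use strict;
    use warnings;
    use File::Copy ();

    my ($srcroot, $dstroot) = ("/var/lib/backuppc", "/destpool"); # assumptions
    my ($prev_ino, $prev_dst) = (-1, "");

    open my $in, '<', "/destpool/copy.list.sorted" or die "open: $!";
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^D (.*)$/) {
            (my $dst = $1) =~ s/^\Q$srcroot\E/$dstroot/;
            mkdir $dst;                  # directories come first in the list
        } elsif ($line =~ /^F (\d+) \d (.*)$/) {
            my ($ino, $src) = ($1 + 0, $2);
            (my $dst = $src) =~ s/^\Q$srcroot\E/$dstroot/;
            if ($ino == $prev_ino) {
                # same inode as the previous line: just link to the copy
                link $prev_dst, $dst or die "link $prev_dst -> $dst: $!";
            } else {
                # first occurrence of this inode: copy, remember where it went
                File::Copy::cp($src, $dst) or die "copy $src -> $dst: $!";
                ($prev_ino, $prev_dst) = ($ino, $dst);
            }
        }
    }
    close $in;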


So, what's the problem? Well, I used it once, because I needed to copy a pool.
It seemed to Work For Me (tm), but I'm unsure how to verify the result, aside
from randomly looking at a few files and hoping the 99.99999% I didn't look at
are ok too. It's far from complete. Its usefulness is limited as long as I
can't copy to a remote machine. It only handles the pool/, cpool/ and pc/
directories; the rest needs to be copied by hand. There is debug output for
cases I hope not to encounter but which may be present in other people's
pools. I think I'm still missing a chown() or two.

Let's see if I can find the figures. My pool was 103 GB, 10 million directory
entries pointing to 4 million inodes. Copying from local disk to an iSCSI
target over a shared 100 Mbit network took 10:45 hours. Of this time, 15
minutes were spent examining cpool/ (pool/ was empty), 73 minutes examining
pc/, 165 seconds sorting the file list (1.1 GB) and 9:14 hours copying/linking.
rsync might have worked for this pool, but I didn't test. I would be very
curious how this scales to a 3TB pool though ;-).


Question (to a tar expert named Craig or otherwise):
Is it possible to create a tar stream with this structure (i.e. lots of
directories, then file 1/2/3/123whatever with content, then several links in
different directories under pc/ to this file, then next pool file and so on),
or does a tar need to be sorted by directories?

If it *is* possible, creating a tar stream instead of copying/linking would
not be difficult, and then you could run

    BackupPC_copyPool ... | ssh hostname tar xpf -

(or via netcat, or even store the result on tape). Even merging into an
existing pool could be split off into a BackupPC_mergePool script that takes
a tar stream and does whatever is necessary.

>    - Method 2: Calculate the md5sum file path of the file to determine
>      where it is in the pool. Where necessary, disambiguate among chain
>      duplicates.

That's basically what BackupPC_tarPCCopy does.
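
For reference, here is roughly how a pool digest maps to a pool file name in
the 3.x layout; the helper name is mine, the real routines live in
BackupPC::Lib:

    # Sketch only: the first three hex digits of the digest become the
    # directory levels below (c)pool/.
    sub digest_to_pool_path {
        my ($topdir, $compress, $digest) = @_;   # $digest: 32 hex chars
        my $pool = $compress ? "cpool" : "pool";
        return join("/", $topdir, $pool,
                    substr($digest, 0, 1),
                    substr($digest, 1, 1),
                    substr($digest, 2, 1),
                    $digest);
    }
    # If several distinct files share a digest, the additional pool files get
    # a numeric suffix (_0, _1, ...), which is why chain duplicates still have
    # to be disambiguated by comparing file contents.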

>    - Method 3: Not possible yet but would be possible if the md5sum
>      file paths were appended to compressed backups. This would add very
>      little to the storage but it would allow you to very easily
>      determine the right link. If so then you could just read the link
>      path from the file.

I believe this would speed up BackupPC_tarPCCopy by many orders of magnitude.

> 2. Then rsync *just* the pool -- this should be no problem since by
>    definition there are no hard links within the pool itself
> 
> 3. Finally, run through the list generated in #1 to create the new pc
>    directory by creating the necessary links (and for files with no
>    hard links, just copy/rsync them)

See BackupPC_tarPCCopy.

> The above could also be easily adapted to allow for "incremental" syncing.
> Specifically, in #1, you would use rsync to just generate a list of
> *changed* files in the pc directory. In #2, you would continue to use
> rsync to just sync *changed* pool entries. In #3 you would only act on
> the shortened incremental sync list generated in #1.

While you can limit your pc/ directory traversal to only a subset of all
backups of a host (or all hosts, if you give a start date, for example), I
don't quite see how incrementally syncing the pool would work. Remember that
pool files
with hash collisions may be renumbered. Is this supposed to be limited to
re-creating an identical pool? Even then, renaming a pool file does not affect
the pc/ links to it. Overwriting it with different content does. You would
need to re-establish the correct links for existing backups too, or figure out
how the source pool was changed and replicate the changes to the destination
pool (rm foo_2 foo_3; mv foo_4 foo_2). This can be done, but not with rsync,
as far as I can tell.

> The more I think about it, the more I LIKE the idea of appending the
> md5sums file paths to compressed pool files (Method #3)

Yes, but the question is how often this information is needed. We're going to
a lot of trouble to *eliminate* redundancy. Adding redundancy for a case that
every 100th user will encounter once in his life may not be warranted. Then
again, it's a fixed amount per file and probably not enough to worry about ...

Regards,
Holger
