Subject: Re: [BackupPC-users] Advice on creating duplicate backup server
From: dan <dandenson AT gmail DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Mon, 8 Dec 2008 16:35:10 -0700
You could mess around with LVM snapshots.  I hear you can take an LVM snapshot and copy that over, then restore it onto the backup server's LVM.  I have not tried this myself, but I have seen examples around the net.
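Roughly like this, though again untested and with made-up volume names (it assumes the pool lives on an LV called backuppc in volume group vg0, that the standby server has an LV at least as large, and that nothing has the remote LV mounted while you write to it):

lvcreate --snapshot --size 20G --name backuppc-snap /dev/vg0/backuppc
# image the frozen snapshot straight onto the standby server's LV
dd if=/dev/vg0/backuppc-snap bs=1M | ssh remotehost 'dd of=/dev/vg0/backuppc bs=1M'
lvremove -f /dev/vg0/backuppc-snap

The snapshot gives you a consistent image to copy from without stopping BackupPC, but it is still a full image copy every time.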

Have you tried rsync 3?  It works for me.  I don't quite have 3TB so I can't really advise you at that size, and I'm not sure where the line is on the file count that rsync 3 can't handle.
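For what it's worth, the kind of invocation I mean is just this (the paths are examples; -H is what preserves the pool hard links and is also where all the memory goes):

rsync -aH --delete --numeric-ids /var/lib/backuppc/ remotehost:/var/lib/backuppc/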

ZFS would be ideal for this, but you have to make the leap to a Solaris/OpenSolaris kernel.  ZFS-FUSE is completely non-functional for BackupPC: it will crash as soon as you start hitting the filesystem and the delayed write caching kicks in.  ZFS on FreeBSD is not mature enough and tends to crash under heavy I/O.

With ZFS it works something like this:
http://blogs.sun.com/clive/resource/zfs_repl.ksh

You can send a full ZFS snapshot like:
zfs send pool/fs@snapshotname | ssh remotehost zfs recv -v remotepool/remotefs
or send an incremental afterwards (relative to an earlier snapshot that both sides already have) with:
zfs send -i pool/fs@oldsnapshot pool/fs@newsnapshot | ssh remotehost zfs recv -F -v remotepool/remotefs

Feel free to compress the ssh stream with -C if you like, but first check your bandwidth usage and see whether you are already saturating the link.  If you are not, the compression will just slow you down.
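A daily cycle would look something like this (dataset and snapshot names are made up; the receiving side needs the previous snapshot to still exist for the incremental to apply):

zfs snapshot pool/backuppc@2008-12-08
zfs send -i pool/backuppc@2008-12-07 pool/backuppc@2008-12-08 | ssh remotehost zfs recv -F -v remotepool/backuppc
zfs destroy pool/backuppc@2008-12-07    # keep the newest snapshot as the base for tomorrow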

The real downside here is the switch to Solaris if you are a Linux person.  You could also try Nexenta, which is the OpenSolaris kernel on a Debian/Ubuntu userland, complete with apt.

You also get filesystem-level compression with ZFS, so you don't need to compress your pool.  This should make recovering files outside of BackupPC a little more convenient.
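Turning it on is a one-liner (dataset name made up here):

zfs set compression=on pool/backuppc

and then you would set CompressLevel to 0 in config.pl so new files land in the uncompressed pool and ZFS does the compressing instead.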

How is a tape backup taking 1-2 weeks?  3TB in 1 week works out to only about 5.2MB/s.  If you are that I/O constrained, nothing is going to work right for you.  How full is your pool?

You could also consider not keeping a copy of the pool remotely, but rather pulling a tar backup off the BackupPC system on some schedule and sending that to the remote machine for storage.
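Something like this out of cron, per host (the host name and paths are made up, and double-check the flags against the BackupPC_tarCreate usage message; -n -1 means the most recent backup):

BackupPC_tarCreate -h somehost -n -1 -s / . | gzip | ssh remotehost "cat > /archive/somehost-$(date +%F).tar.gz"

You lose the pooling and the backup history that way, but you get a plain tar that can be restored anywhere, and it is basically what BackupPC's archive host feature automates for you.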

The problem with using NBD or anything like that together with 'dd' is that there is no resume support, and with 3TB you are likely to get errors every now and then.  Even over a dedicated gigabit link you are stuck at 6+ hours going by theoretical numbers (3TB at ~1Gbit/s), and are probably looking at 50% more than that in practice.

As far as some other scheme for syncing up the pools goes, the hard links will get you.

You could use find to traverse the entire pool, take some info down on each file (name, size, type, etc.), then use some fancy Perl to sort that out into manageable groups, and then use rsync on the individual files.
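Mechanically that could look something like this (paths made up, untested; note that rsync -H only preserves hard links between files that end up in the same chunk, which is exactly where this scheme gets ugly):

find /var/lib/backuppc/pc -type f -printf '%s\t%p\n' | sort -n > /tmp/pc-files
cut -f2 /tmp/pc-files | split -l 100000 - /tmp/chunk.
for c in /tmp/chunk.*; do
    rsync -aH --files-from="$c" / remotehost:/
done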


On Mon, Dec 8, 2008 at 7:37 AM, Jeffrey J. Kosowsky <backuppc AT kosowsky DOT org> wrote:
Stuart Luscombe wrote at about 10:02:04 +0000 on Monday, December 8, 2008:
 > Hi there,
 >
 > I've been struggling with this for a little while now so I thought it about
 > time I got some help!
 >
 > We currently have a server running BackupPC v3.1.0 which has a pool of
 > around 3TB and we've got to a stage where a tape backup of the pool is
 > taking 1-2 weeks, which isn't effective at all.  The decision was made to
 > buy a server that is an exact duplicate of our current one and have it
 > hosted in another building, as a 2 week old backup isn't ideal in the event
 > of a disaster.
 >
 > I've got the OS (CentOS) installed on the new server and have installed
 > BackupPC v3.1.0, but I'm having problems working out how to sync the pool
 > with the main backup server. I managed to rsync the cpool folder without any
 > real bother, but the pool folder is the problem: if I try an rsync it
 > eventually dies with an 'out of memory' error (the server has 8GB), and a cp
 > -a didn't seem to work either, as the server filled up, presumably because it's
 > not copying the hard links correctly.
 >
 > So my query here really is am I going the right way about this? If not,
 > what's the best method to take so that, say, once a day the duplicate server
 > gets updated.
 >
 > Many Thanks

It just hit me that, given the known architecture of the pool and cpool
directories, it should be possible to come up with a scheme that
works better than either rsync (which can choke on too many hard
links) or 'dd' (which has no notion of incrementals and requires you
to resize the filesystem, etc.).

My thought is as follows:
1. First, recurse through the pc directory to create a list of
   files/paths and the corresponding pool links.
   Note that finding the pool links can be done in one of several
   ways (a rough sketch of Method 1, keyed on inode numbers, follows this list):
   - Method 1: Create a sorted list of pool files (which should be
     significantly shorter than the list of all files due to the
     nature of pooling, and therefore require less memory than rsync)
     and then look up the links against it.
   - Method 2: Calculate the md5sum file path of the file to determine
     where it is in the pool. Where necessary, disambiguate among
     chain duplicates.
   - Method 3: Not possible yet, but it would be possible if the md5sum
     file paths were appended to compressed backups. This would add very
     little to the storage but would let you very easily determine the
     right link. If so, you could just read the link path from the file.

   Files with only 1 link (i.e. no hard links) would be tagged for
   straight copying.

2. Then rsync *just* the pool -- this should be no problem since by
  definition there are no hard links within the pool itself

3. Finally, run through the list generated in #1 to create the new pc
  directory by creating the necessary links (and for files with no
  hard links, just copy/rsync them)
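A rough sketch of step 1 / Method 1 using inode numbers as the key (paths are made up, it's untested, and it assumes no whitespace in the paths, otherwise you'd need to be cleverer with the delimiters):

find /var/lib/backuppc/cpool -type f -printf '%i %p\n' | sort -k1,1 > /tmp/pool.byinode
find /var/lib/backuppc/pc -type f -links +1 -printf '%i %p\n' | sort -k1,1 > /tmp/pc.byinode
join /tmp/pool.byinode /tmp/pc.byinode > /tmp/pc-to-pool.map    # inode, pool file, pc file
find /var/lib/backuppc/pc -type f -links 1 > /tmp/straight-copies

Step 2 is then an ordinary rsync of the pool/cpool trees (no -H needed), and step 3 walks /tmp/pc-to-pool.map creating the links on the far side while the files in /tmp/straight-copies get copied directly.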

The above could also be easily adapted to allow for "incremental" syncing.
Specifically, in #1, you would use rsync to just generate a list of
*changed* files in the pc directory. In #2, you would continue to use
rsync to just sync *changed* pool entries. In #3 you would only act on
the shortened incremental sync list generated in #1.
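For the "list of changed files" part, an rsync dry run gets you that without transferring anything (paths made up again):

rsync -a --dry-run --out-format='%n' /var/lib/backuppc/pc/ remotehost:/var/lib/backuppc/pc/ > /tmp/changed-pc-files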

The more I think about it, the more I LIKE the idea of appending the
md5sum file paths to compressed pool files (Method #3), since this
would make the above very fast. (Note: if I were implementing this, I
would also include the chain number for cases where there are multiple
files with the same md5sum path, and of course BackupPC_nightly
would then have to adjust it any time it changed around the chain
numbering.)

Even without the above, Method #1 would still be much less memory
intensive than rsync, and Method #2, while potentially a little slow,
would require very little memory and wouldn't be nearly that bad if
you are only syncing incrementally.

------------------------------------------------------------------
Just as an FYI, if anyone wants to implement Method #2, here is the
routine I use to generate the md5sum file path from a (compressed)
file (note that it is based on the analogous uncompressed version,
File2MD5, in Lib.pm).

use BackupPC::Lib;
use BackupPC::Attrib;
use BackupPC::FileZIO;

use constant _128KB               => 131072;
use constant _1MB                 => 1048576;

# Compute the MD5 digest of a compressed file. This is the compressed
# file version of the Lib.pm function File2MD5.
# For efficiency we don't use the whole file for big files
#   - for files <= 256K we use the file size and the whole file.
#   - for files <= 1M we use the file size, the first 128K and
#     the last 128K.
#   - for files > 1M, we use the file size, the first 128K and
#     the 8th 128K (ie: the 128K up to 1MB).
# See the documentation for a discussion of the tradeoffs in
# how much data we use and how many collisions we get.
#
# Returns the MD5 digest (a hex string).
#
# If $filesize < 0 then always recalculate the size of the file by fully decompressing it.
# If $filesize = 0 then first try to read the size from the corresponding attrib file
#    (if it exists); if that doesn't work, recalculate it.
# If $filesize > 0 then use that as the size of the file.

sub zFile2MD5
{
    my ($bpc, $md5, $name, $filesize, $compresslvl) = @_;

    my $fh;
    my $rsize;
    my $totsize;

    $compresslvl = $Conf{CompressLevel} unless defined $compresslvl;
    unless (defined ($fh = BackupPC::FileZIO->open($name, 0, $compresslvl))) {
        printerr "Can't open $name\n";
        return -1;
    }

    my $datafirst = my $datalast = '';
    my @data = ();

    # First try to read up to the first 128K (131072 bytes)
    if ( ($totsize = $fh->read(\$datafirst, _128KB)) < 0 ) {
        printerr "Can't read & decompress $name\n";
        return -1;
    }
    elsif ($totsize == _128KB) { # Read up to the 1st MB
        my $i = 0;
        # Read in up to 1MB (_1MB), 128K at a time, alternating between 2 data buffers
        while ( (($rsize = $fh->read(\$data[(++$i)%2], _128KB)) == _128KB)
                && ($totsize += $rsize) < _1MB ) {}
        $totsize += $rsize if $rsize < _128KB; # Add back in the final partial read
        $datalast = substr($data[($i-1)%2], $rsize, _128KB - $rsize)
                  . substr($data[$i%2], 0, $rsize);
    }
    $filesize = $totsize if $totsize < _1MB; # Already know the size because we read it all
    if ($filesize == 0) { # Try to find the size from the attrib file
        $filesize = get_attrib_value($name, "size");
        warn "Can't read size of $name from attrib file so calculating manually\n"
            unless defined $filesize;
    }
    unless ($filesize > 0) { # Continue reading to calculate the size
        while (($rsize = $fh->read(\$data[0], _128KB)) > 0) {
            $totsize += $rsize;
        }
        $filesize = $totsize;
    }
    $fh->close();

    $md5->reset();
    $md5->add($filesize);
    $md5->add($datafirst);
    ($datalast eq '') || $md5->add($datalast);
    return $md5->hexdigest;
}

# Returns the value of attrib $key for $fullfilename (full path).
# If the attrib file is not present, or there is no entry for
# the specified key for the given file, then return 'undef'.
sub get_attrib_value
{
    my ($fullfilename, $key) = @_;
    $fullfilename =~ m{(.+)/f(.+)};  # $1=dir; $2=mangled file name

    return undef if read_attrib(my $attr, $1) < 0;
    return $attr->{files}{$2}{$key}; # Note: returns undef if the key is not present
}

# Reads in the attrib file for directory $_[1] (and optional alternative attrib file name $_[2])
# and stores it in the hashref $_[0] passed to the function.
# Returns -1 and a blank $_[0] hash ref if the attrib file doesn't exist (not necessarily an error).
# Dies if the attrib file exists but can't be read in.
# Note: attrib() is a small helper (not shown in this post) that returns the attrib file path.
sub read_attrib
{ # Note: $_[0] = hash reference to attrib object
    $_[0] = BackupPC::Attrib->new({ compress => $Conf{CompressLevel} });
    return -1 unless -f attrib($_[1], $_[2]);  # Not necessarily an error because the dir may be empty
    die "Error: Cannot read attrib file: " . attrib($_[1], $_[2]) . "\n" unless $_[0]->read($_[1], $_[2]);
    return 1;
}

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
