Re: [BackupPC-users] Problems with hardlink-based backups...

Peter Walter wrote at about 13:27:38 -0400 on Tuesday, September 1, 2009:
 > Les Mikesell wrote:
 > > Peter Walter wrote:
 > >   
 > >> Jim Leonard wrote:
 > >>     
 > >>> Peter Walter wrote:
 > >>>> I have access to "cloud storage" I would like to take 
 > >>>> advantage of, but can't because of the hardlink issue. My (kludgy) 
 > >>>> solution at present is to use a backuppc server to back up the backuppc 
 > >>>> server, but even incrementals take days to run.
 > >>>>
 > >>> What is the problem with your cloud storage such that you can't use it 
 > >>> to make a backup of BackupPC?  What cloud storage do you have access to, 
 > >>> and what operating system and filesystem are you using to run BackupPC?
 > >>>
 > >> I have not (yet) come across a cloud storage provider that supports 
 > >> hardlinks. The specific provider I was talking about is rsync.net.
 > >> For all the backuppc servers I (currently) administer, the OS is CentOS 
 > >> 5.x, and the filesystem is ext3.
 > >>     
 > >
 > > Is there a limit to the file size?  Why not put an image copy of your
 > > archive filesystem in a file?  Or for an interesting variation, make a
 > > vmware vmx virtual disk split into 1 or 2 gig file segments locally,
 > > image copy to that, then rsync the segments off somewhere else.  If you
 > > are lucky, there won't be changes in all the segments on every run and
 > > by splitting it you greatly reduce the workspace needed by rsync as it
 > > constructs a new copy of each file before deleting the old one.  But
 > > can you live with the time it would take to copy your data back from
 > > cloud storage if you ever need it?
 > >
 > >   
 > My objective is to administer a secondary backup server (for
 > second-level disaster recovery) which is dedicated to backing up primary
 > backup servers, where the primary backup servers are in separate
 > domains, and the backup targets are only accessible by the primary
 > backup server for the domain. I selected backuppc for the primary backup
 > server because I like the pooling feature very much - in my environment,
 > the primary backup servers back up a mixed load of Windows / Linux / OSX
 > servers and workstations, and I have found that the pooling feature cuts
 > down a lot of the resources (bandwidth, space) required. In addition,
 > from my own observations, and from reading the comments on this site,
 > backuppc is *very* reliable and fairly easy to use.
 >
 > I suspect that I will find that there exists a lot of redundancy within
 > the files created by the primary backup servers, and therefore I wished
 > to take further advantage of the pooling mechanism by using backuppc to
 > back up backuppc servers. Yes, there are a variety of other techniques I
 > could use, such as image copies, to back up a backuppc server, and I may
 > end up using them. What I don't understand is why a backup system as
 > good as backuppc cannot reasonably be used to back up itself - it seems
 > to me that since backuppc "knows" its own architecture, a way could be
 > found to do it efficiently.
 >
 > Since my objective is second-level disaster recovery, allowing a day or
 > two to restore a backuppc machine would work for me - the original
 > hardware and the targets that were backed up would probably have been
 > destroyed in the disaster anyway, operations may need to be moved to
 > another site, etc. I think the locations where I have installed
 > backuppc would be willing to wait five days for full functionality to
 > be restored - meaning the primary backup machine recreated, and the
 > targets of the primary backup machine restored.
 >
 > Since, as I understand it, the hardlink usage in backuppc is the primary
 > reason why rsync cannot efficiently back up a backuppc machine, I would
 > be satisfied if the hardlinks were dumped separately, and a way to
 > reconstitute them provided, with the pool(s) being backed up as normal.
 > 
 > If a solution like that is not feasible, then I will have to consider 
 > image copies of one sort or another of the primary backup servers. 
 > However, there probably will be a limit to how many image copies could 
 > be done in a day. With backuppc, as I understand it, the configurable 
 > interval between incrementals and the pooling mechanism would allow me 
 > to back up the primary backup servers more or less continuously.
 > 

I agree with your need, but good luck: there is a vociferous group
of contributors here who think the solution is always ZFS or a general
block-level copy. They can't seem to understand that not only is there
a broad base of users who would like a file-level backup approach, but
that the known structure of the pool and pc trees actually makes that
possible in O(n log n) time even without making any changes to the
current BackupPC program.
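
To make the "dump the hardlinks separately and reconstitute them later"
idea concrete, here is a minimal sketch in Python. It is not part of
BackupPC and knows nothing about the pool or pc tree layout. The function
names dump_hardlinks and restore_hardlinks are made up for illustration,
and it assumes a POSIX filesystem where files sharing a (device, inode)
pair are hardlinks of each other. It simply groups paths by inode, so the
cost is one directory walk plus a sort.

```python
import os
import stat
from collections import defaultdict


def dump_hardlinks(root):
    """Map one representative path to the other paths sharing its inode.

    Hypothetical helper: paths are relative to root, and only regular
    files with a link count above 1 are considered.
    """
    by_inode = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                rel = os.path.relpath(full, root)
                by_inode[(st.st_dev, st.st_ino)].append(rel)

    link_map = {}
    for paths in by_inode.values():
        paths.sort()  # deterministic choice of the representative path
        if len(paths) > 1:
            link_map[paths[0]] = paths[1:]
    return link_map


def restore_hardlinks(root, link_map):
    """Recreate the links recorded by dump_hardlinks under root.

    Assumes the representative file of each group was already restored.
    """
    for primary, others in link_map.items():
        src = os.path.join(root, primary)
        for rel in others:
            dst = os.path.join(root, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            if os.path.lexists(dst):
                os.remove(dst)  # replace a plain copy with a real link
            os.link(src, dst)
```

The idea would be to ship one copy of each linked file plus the map
(serialized as JSON, say) to storage that has no hardlink support, then
run restore_hardlinks after copying the data back. On a real pool with
millions of files, holding the whole map in memory may not be practical,
so treat this as an illustration of the approach rather than a tool.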

I am happy to help with suggesting and debugging algorithms and with
testing. I am probably not the right one to code such a program since
I have no experience with coding for speed and my results would
(hopefully) work but would likely be more of a brute-force hack.

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/