Subject: Re: [BackupPC-users] Correct rsync parameters for doing incremental transfers of large image-files
From: Andreas Piening <andreas.piening AT gmail DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Sat, 12 May 2012 12:57:07 +0200
Hi Les,

I already thought about that, and I agree that handling large image files is problematic in general. I need to make images of the Windows-based virtual machines so that I can get them running again when a disaster happens. If I move away from BackupPC for transferring these images, I don't see any benefit (maybe only because I don't know of an imaging solution that solves my problem better).
Since I already use BackupPC to back up the data partitions (all Linux-based), I don't want my backups to become more complex than necessary.
I can live with the amount of hard-disk space the compressed images will consume, and the I/O while merging the files is acceptable for me, too.
I can tell the imaging software (partimage) to cut the image into 2 GB volumes, but I doubt that this enables effective pooling, since the system volume I image has temporary files, profiles, databases and so on stored on it. If every volume file changes (even if only a few megabytes are altered), I expect the rsync algorithm to be less effective than when comparing large files, where it is more likely to find a long unchanged run that is not interrupted by the artificial file boundaries introduced by the 2 GB volume splitting.
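For reference, the splitting I mean would look something like this (option syntax from memory, volume names and paths are just placeholders):

    # -b = batch mode, -z0 = no compression,
    # -V 2000 = cut the image into ~2 GB volumes
    partimage -b -z0 -V 2000 save /dev/vg0/win7-snap /backup/win7.img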

I hope I made my situation clear.
If anyone has experience with handling large image files that I may benefit from, please let me know!

Thank you very much,

Andreas Piening

On 12.05.2012 at 06:04, Les Mikesell wrote:

On Fri, May 11, 2012 at 4:01 PM, Andreas Piening
<andreas.piening AT gmail DOT com> wrote:
Hello BackupPC-users,

I am stuck trying to identify suitable rsync parameters for handling large image-file backups with BackupPC.

The scenario: I use partimage to take LVM-snapshot-based full images of the block devices of my virtual (Windows) machines running under KVM. I want to save these images from the virtualization server to my backup machine running BackupPC. The images are between 40 and 60 GB uncompressed each. The backup window has to stay outside working hours and is not large enough to transfer the images over the line every night. I read about rsync's ability to transfer only the changed parts of a file using a clever checksum algorithm, to minimize network traffic. That's what I want.
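Roughly, the per-VM imaging step looks like the following (volume names and paths are only placeholders here, not my actual configuration):

    # take a consistent snapshot of the running guest's block device
    lvcreate --size 5G --snapshot --name win7-snap /dev/vg0/win7
    # write an image of the snapshot in batch mode
    partimage -b save /dev/vg0/win7-snap /backup/win7.img
    # drop the snapshot once the image has been written
    lvremove -f /dev/vg0/win7-snap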

I tested it by creating an initial backup of one image, then creating a new image with only a few megabytes of changed data and triggering another backup run. But I noticed that the whole file was re-transferred. I waited until the end to be sure of that, and decided it was not the smartest idea to check this with a compressed 18 GB image file, but that was my real working data image and I expected it to behave as intended. Searching for reasons for the complete re-transmission, I ended up in a discussion thread about rsync backups of large compressed files. The explanation made sense to me: because of recursion and back-references, the compression algorithm can produce a completely different archive file even if only a few megabytes of data at the beginning of the file have been altered.
So I decided to store my image uncompressed, which makes it about 46 GB now. I found that I needed to add the "-C" option (it goes into the ssh command in RsyncClientCmd), since compression of the transfer is not enabled by default. Anyway: in the second backup run the whole file was re-created again instead of only the changed parts being transferred.

My BackupPC option "RsyncClientCmd" is set to "$sshPath -C -q -x -l root $host $rsyncPath $argList+", which is BackupPC's default apart from the added "-C".
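In config.pl terms that is, if I read the defaults correctly (the stock value has no "-C"; note that the "-C" is an ssh option for wire compression, it is not passed to rsync itself):

    $Conf{RsyncClientCmd} = '$sshPath -C -q -x -l root $host $rsyncPath $argList+';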

Honestly, I don't understand the exact reason for this. Some possibilities that may be to blame:

-> partimage does not create a linear backup image file, even if it is uncompressed
-> there is just another rsync parameter I missed that enables differential transfers of file changes (I sketch a test for this below)
-> rsync examines the file but decides not to use differential updates for this one because of its size, or simply because its creation timestamp is not the same as the prior one's
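To rule out the last two possibilities, my plan is to test plain rsync over ssh outside of BackupPC, roughly like this (host and paths are placeholders, and it assumes the previous copy of the image already exists at the destination):

    # --no-whole-file forces the delta-transfer algorithm,
    # --stats reports how many bytes actually went over the wire
    rsync -av --no-whole-file --stats /backup/win7.img root@backupserver:/tmp/rsync-test/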

Please give me a hint if you've successfully made differential backups of large image files.

I'm not sure there is a good way to handle very large files in
BackupPC.  Even if rsync identifies and transfers only the changes,
the server is going to copy and merge the unchanged parts from the
previous file, which may take just as long anyway, and it will not be
able to pool the copies.  Maybe you could split the target into many
small files before the backup.  Then any chunk that is unchanged
between runs would be skipped quickly and the contents could be
pooled.
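Something along these lines, untested and with the chunk size pulled out of the air, run on the client before the backup:

    # carve the uncompressed image into fixed-size pieces; chunks whose
    # contents did not change keep identical names and bytes, so rsync
    # can skip them and BackupPC can pool them
    split --bytes=512M --numeric-suffixes --suffix-length=3 image.img image.img.part.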

-- 
 Les Mikesell
   lesmikesell AT gmail DOT com
