Re: [BackupPC-users] Backup of VM images

2011-06-07 07:00:00
Subject: Re: [BackupPC-users] Backup of VM images
From: Holger Parplies <wbppc AT parplies DOT de>
To: Boniforti Flavio <flavio AT piramide DOT ch>
Date: Tue, 7 Jun 2011 11:54:26 +0200

Hi,

Boniforti Flavio wrote on 2011-06-07 11:00:24 +0200 [Re: [BackupPC-users] Backup of VM images]:
> [...]
> So I'm right when thinking that rsync *does* transfer only the bits of a
> file (no matter how big) which have changed, and *not* the whole file?

Usually that's correct, provided rsync *can* determine which parts have
changed and these parts *can* be transferred efficiently. For example,
changing every second byte in a file obviously *won't* cut the transfer
bandwidth by 50%; rsync matches whole blocks, and no block would remain
intact. So it really depends on *how* your files change.
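
To illustrate the point about *how* files change, here is a rough sketch of
block matching. It is not rsync's actual algorithm (rsync uses a rolling
checksum plus a strong checksum and picks its own block size); the function
name literal_bytes and the 16-byte block size are made up for the example.

```python
import hashlib


def literal_bytes(old: bytes, new: bytes, block_size: int = 16) -> int:
    """Count the bytes of `new` that would have to be sent literally.

    Simplified model: the receiver holds `old`, split into aligned blocks.
    The sender slides over `new`; wherever a full window equals one of the
    old blocks it sends only a reference, otherwise one literal byte.
    """
    known = {
        hashlib.md5(old[i:i + block_size]).digest()
        for i in range(0, len(old) - block_size + 1, block_size)
    }
    sent = 0
    pos = 0
    while pos < len(new):
        window = new[pos:pos + block_size]
        if len(window) == block_size and hashlib.md5(window).digest() in known:
            pos += block_size  # block already on the receiver
        else:
            sent += 1          # no match at this offset, send one byte
            pos += 1
    return sent


old = bytes(range(256)) * 4  # 1024 bytes of sample data

# Case 1: a single byte changed somewhere in the middle.
one_change = bytearray(old)
one_change[500] ^= 0xFF

# Case 2: every second byte changed.
alternating = bytes(b ^ 0xFF if i % 2 else b for i, b in enumerate(old))

print(literal_bytes(old, bytes(one_change)))  # one block's worth of bytes
print(literal_bytes(old, alternating))        # the whole file
```

In this model the single changed byte costs one block (16 bytes), while
changing every second byte leaves no block intact, so all 1024 bytes are
sent although only half of them differ.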

> [...]
> Well, size is a critical parameter, because I can suppose that VM images
> are quite *big* files.
> But if the data transfer could be reduced by using rsync (over ssh of
> course), there's no problem because the initial transfer would be done
> by "importing" the VM images from a USB HDD. Therefore, only subsequent
> "backups" (rsyncs) would transfer data.
> 
> What do you think?

First of all, you keep saying "VM images", but you don't mention from which VM
product. Nothing says VM images are simple file-based images of what the
virtual disk looks like. They're some opaque structure optimized for whatever
the individual VM product wants to handle efficiently (which is probably *not*
rsyncability). Black boxes, so to speak. There are probably people on this
list who can tell you from experience how VMware virtual disks behave (or
VirtualBox or whatever), and it may well be that they all behave in similar
ways (such as changing roughly the same amount of the virtual disk file for
the same amount of changes within the virtual machine), but there's really no
guarantee for that. You should try it out and see what happens in your case.
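
One way to "try it out" before relying on anything: keep two snapshots of the
same image taken some days apart and count how many aligned blocks differ.
This is only a rough sketch. The function name changed_block_ratio and the
128 KiB block size are my own choices, and comparing aligned blocks assumes
the image is modified in place rather than having data inserted or shifted.
It is an estimate, not a prediction of what rsync will actually transfer.

```python
def changed_block_ratio(old_path, new_path, block_size=128 * 1024):
    """Return the fraction of aligned blocks in new_path that differ
    from the block at the same offset in old_path."""
    changed = 0
    total = 0
    with open(old_path, "rb") as old, open(new_path, "rb") as new:
        while True:
            new_block = new.read(block_size)
            if not new_block:
                break
            old_block = old.read(block_size)  # b"" once old is exhausted
            total += 1
            if new_block != old_block:
                changed += 1
    return changed / total if total else 0.0
```

Called as changed_block_ratio("image-monday", "image-tuesday") (hypothetical
file names), a result close to 1.0 would suggest that the format, or
encryption, leaves rsync little to work with.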

Secondly, you say that the images are already somewhere, and your
responsibility is simply to back them up. Hopefully, your client didn't have
the smart idea to also encrypt the images and simply forget to tell you.
Encryption would pretty much guarantee 0% rsync savings.

Thirdly, as long as things work as they are supposed to, you are probably
fine. But what if something malfunctions and, say, your client mistakenly
drops an empty (0 byte) file for an image one day (some partition may have
been full and an automated script didn't notice)? The backup of the 0-byte
file will be quite efficient, but I don't want to think about the next
backup. That may only be a problem if the 0-byte file actually lands in a
backup that is used as a reference backup, but it's an example meant to
illustrate that you *could* end up transferring the whole data set, and you
probably won't notice until it congests your links. Nothing will ever
malfunction? Ok, a virtual host is probably perfectly capable of actually
*changing* the complete virtual disk contents if directed to (system update,
encrypting the virtual host's file systems, file system defragmentation
utility, malicious clobbering of data by an intruder ...). rsync bandwidth
savings are a fine thing. Relying on them when you have no control over the
data you are transferring may not be wise, though.
And BackupPC itself may not be the best place to handle such problems. For
instance, if you first made a local copy of the images and then backed up
that *copy*, you could script just about any checks you want to, use bandwidth
limiting, abort transfers of single images that take too long, use a
specialized tool that handles your VM images more efficiently than rsync,
split your images after transferring ... it really depends on what guarantees
you are making, what constraints you want (or need) to apply, how much effort
you want to invest (and probably other things I've forgotten).
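
As an example of the kind of check you could script on the local copy, here
is a minimal sketch that refuses an image that is missing, empty, or much
smaller than last time. The name image_looks_sane and the 50% threshold are
assumptions for illustration; where you keep the previous size is up to you.

```python
import os


def image_looks_sane(path, previous_size, min_ratio=0.5):
    """Return True if the file exists, is non-empty and has not shrunk
    below min_ratio times the size recorded for the previous run."""
    try:
        size = os.path.getsize(path)
    except OSError:
        return False  # missing or unreadable
    if size == 0:
        return False  # the 0-byte case described above
    if previous_size > 0 and size < previous_size * min_ratio:
        return False  # suspiciously small compared to last time
    return True
```

You would run this before replacing the copy that BackupPC backs up, and
keep the previous good copy in place if the check fails, so that a broken
image never ends up in a reference backup.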

Hope that helps.

Regards,
Holger
