Bacula-users

Re: [Bacula-users] Era of virtual machines (block level differentials and incrementals)?

2009-06-02 04:56:25
From: Kevin Keane <subscription AT kkeane DOT com>
Date: Tue, 02 Jun 2009 01:52:40 -0700
Actually, when it comes to VMWare, you DON'T want to do a backup of the 
image files. And you also DON'T want to do a backup of the files in the 
virtual machine. And you also DON'T want to do a block-level backup.

I recently attended a presentation on virtualization, and the presenter 
made a very strong point here that I can confirm from my own experience. 
Backup with virtual machines is a serious problem. VMWare actually has 
an excellent solution for it, but I don't know if Bacula supports it.

The problem is oversubscription.

Here's the issue: A physical server is usually idle much of the time. 
Backups usually take up about 2% of the CPU cycles (averaged over the 
day), meaning that the physical server is going to be available to users 
even while the backup is running. Since the server uses very few of its 
CPU cycles, backup is only a minuscule problem in terms of load.

A virtual server does away with much of this idle time. That's the whole 
point of virtualization, after all - to better utilize the hardware. 
Basically, you assume that only one or two of the VMs at a time will 
need 100% CPU utilization. This assumption allows you to cram 10 VMs 
onto a machine that would normally only be able to handle the workload 
of one or two of the VMs. If all 10 of the VMs do need 100% of the CPU 
at the same time, you are in trouble, though - but that's highly 
unlikely to happen, so through the law of averages you are now using 40% 
or 60% of the CPU cycles, instead of maybe 10% or so with a traditional 
physical machine. That's where the savings from virtualization come
from. To get that, though, you had to oversubscribe: promise to deliver
400% of what the CPUs actually can deliver, and hope that nobody ever 
takes you up on that promise. Basically, the same idea that airlines use 
when they oversell seats, or gyms when they sell too many memberships, 
knowing full well that the majority of people won't ever use the gym.
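
To put a rough number on "highly unlikely", here is a minimal Python
sketch. It assumes each VM is busy independently of the others with the
same probability, and the 10% figure is an invented example value, not
a measurement; real workloads are often correlated, so treat the result
as an illustration only.

```python
from math import comb


def overload_probability(num_vms: int, busy_prob: float, capacity: int) -> float:
    """Chance that more than `capacity` VMs are busy at the same moment.

    Assumes every VM is busy independently with probability `busy_prob`
    (a simplifying assumption, i.e. a binomial model).
    """
    return sum(
        comb(num_vms, k) * busy_prob**k * (1 - busy_prob) ** (num_vms - k)
        for k in range(capacity + 1, num_vms + 1)
    )


if __name__ == "__main__":
    # 10 VMs on a host that can carry 2 fully busy VMs, as in the text.
    # The 10% busy probability is an assumed example value.
    p = overload_probability(num_vms=10, busy_prob=0.10, capacity=2)
    print(f"chance of more than 2 busy VMs at once: {p:.1%}")
```

Raising busy_prob shows how quickly the bet stops paying off once every
VM has regular heavy work to do.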

So you WANT oversubscription in a virtual machine.

Now for backup, you have a problem. You no longer have all these
beautiful idle CPU cycles - and at the same time, you have to run 21
times as many backups (assuming you have 10 VMs and do both an image
backup and a regular file backup for each VM). Unfortunately, in this
case oversubscription bites you big time. At about 2% each, your
oversubscribed host with 10 VMs suddenly needs 42% of a whole day's CPU
cycles just for backup! Oops. I can confirm this from my own
experience. I have a very beefy VMWare Server host that at one point
ran seven or eight virtual servers. A full
backup of these VMs ran all day, and the machine was slow as a dog 
during that time. Now that I have consolidated servers and only have 
three VMs left (plus another two off site), working with the machine 
during the backup window is more reasonable.
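
The arithmetic behind the 42% can be written down as a small Python
sketch. The function name and parameters are made up for illustration,
and counting the host itself as one extra backup is only one reading of
how 10 VMs turn into a factor of 21.

```python
def backup_load_percent(
    num_vms: int,
    per_backup_percent: float = 2.0,  # ~2% of a day's CPU per backup, from the text
    backups_per_vm: int = 2,          # image backup + file backup inside the VM
    include_host: bool = True,        # assumption: the host adds one more backup
) -> float:
    """Share of a day's CPU cycles the host spends on backups, in percent."""
    jobs = num_vms * backups_per_vm + (1 if include_host else 0)
    return jobs * per_backup_percent


if __name__ == "__main__":
    print(backup_load_percent(10))  # 42.0
    print(backup_load_percent(3))   # 14.0
```

With the same assumptions, three VMs come out at 14%, which fits the
machine being more usable after the consolidation.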

This is a problem with ANY backup program. It's somewhat worse with
Bacula in Windows guests, because you usually also need NTBackup for the
system state.

VMWare has an ingenious solution for this problem, although it is only
available in the paid ESX version: mountable snapshots. To do your
backup of the VM, you simply take a snapshot of your VM, and then you 
mount this snapshot on a separate physical Windows 2003 server across 
the network (I don't think XP, Vista or 2008 work). Instead of using the 
VMWare host's CPU cycles, you are then using the Windows machine's CPU 
cycles to do the backup, and the VMWare host can continue running as if 
nothing had happened.

I don't know if Bacula would be able to deal with such a backup
mechanism, though - especially if the snapshot held, say, a Linux file
system mounted on Windows 2003. And of course there would be the issue
with the system state backup, too. It might work with some creative Run
Before and Run After scripts, but I have doubts.

Hydro Meteor wrote:
> Hello all --
>
> As the world continues to ramp up into the use of virtual machine 
> systems more and more, it's becoming quite an interesting world to live 
> in with regard to storage systems and backups of these virtual machine 
> files. The main virtual machine systems such as those by VMWare (e.g., 
> VMWare Fusion, which runs on Mac OS X and is similar, if I'm not 
> mistaken, to VMWare Workstation) offer useful options such as snapshots 
> and rollbacks.
>
> One of the consequences of having a lot of virtual machine snapshots 
> around on a file system is that it's easy for these virtual machine 
> *image* files on the host OS's filesystem to become quite large 
> relatively speaking (it would be easy to have multiple virtual 
> machines for example whose file sizes on the host OS's filesystem are 
> well into the multiple Gigabytes). I have noticed that if one merely 
> boots up a virtual machine, its (relatively large) *image* file will 
> change (even if the actual changes within the virtual machine were 
> scant).
>
> Given this context and Bacula, from a file system standpoint, backing 
> up differentials or incrementals of these large image files on a 
> regular basis could easily start to become problematic, perhaps not so 
> much with respect to Bacula Volumes (whether tape, optical disc, hard 
> drive, etc. because one might argue that storage is cheap and Kryder's 
> Law [1] marches on), but much more so is the issue of network 
> bandwidth (where distributed backups are leveraged, which is one of 
> Bacula's greatest strengths) -- moving gigabyte-scale files can be a 
> problem. Even Amazon, which sells their S3 storage service, has 
> recently offered a beta of their new AWS Import/Export service ("ship 
> us that disk!"):
>
> http://aws.amazon.com/importexport/
>
> http://aws.typepad.com/aws/2009/05/send-us-that-data.html
>
>     *AWS Import/Export: Ship Us That Disk!*
>
>     Since station wagons and tapes are both on the verge of
>     obsolescence, others have updated this nugget of wisdom to
>     reference DVDs and Boeing 747s.
>     Hard drives are getting bigger more rapidly than internet
>     connections are getting faster. It is now relatively easy to
>     create a collection of data so large that it cannot be uploaded to
>     offsite storage (e.g. Amazon S3) in a reasonable amount of time.
>     Media files, corporate backups, data collected from scientific
>     experiments, and potential AWS Public Data Sets are now at this
>     point. Our customers in the scientific space routinely create
>     terabyte data sets from individual experiments.
>
>
> This brings me to a question which is, what about a future version of 
> Bacula that would be able to perform block level backups of 
> differentials and incrementals? That way, if say a 4 GB file 
> (representing a virtual machine for example) had only a small number 
> of disk level blocks that changed, only those blocks would need to be 
> backed up relative to an initial Full backup? I imagine one argument 
> might be to just install Bacula on every virtual machine ever created, 
> but that's not practical. Seeing that Amazon is trying to solve the 
> problem of backups and bandwidth, it strikes me as if Bacula could 
> help to scratch this itch as well?
>
> Cheers,
>
> -hydro
>
> [1] http://en.wikipedia.org/wiki/Mark_Kryder
>
>   
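
The block-level idea raised in the quoted question can be sketched in a
few lines of Python. This is not how Bacula works and not a proposal
for its implementation; the 4096-byte block size, the function names
and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4096  # assumed block size in bytes


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Digest of every fixed-size block, recorded at Full backup time."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(
    old_digests: List[str], data: bytes, block_size: int = BLOCK_SIZE
) -> Dict[int, bytes]:
    """Return {block_index: block_bytes} for blocks that are new or differ."""
    changed = {}
    for index, offset in enumerate(range(0, len(data), block_size)):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(old_digests) or old_digests[index] != digest:
            changed[index] = block
    return changed


if __name__ == "__main__":
    full = bytes(4 * BLOCK_SIZE)        # stand-in for a VM image file
    digests = block_digests(full)       # kept from the Full backup
    modified = bytearray(full)
    modified[BLOCK_SIZE + 10] = 0xFF    # one byte changes inside block 1
    delta = changed_blocks(digests, bytes(modified))
    print(sorted(delta))                # [1]
```

Only block 1 would have to cross the network here. Note that the sketch
still reads and hashes the whole image on every run and ignores files
that shrink, so it saves bandwidth and volume space, not the CPU and
disk load on the host.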


-- 
Kevin Keane
Owner
The NetTech
Find the Uncommon: Expert Solutions for a Network You Never Have to Think About

Office: 866-642-7116
http://www.4nettech.com



_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users