Subject: Re: [Bacula-users] Backing up > 100 servers
From: Kevin Keane <subscription AT kkeane DOT com>
To: "bacula-users AT lists.sourceforge DOT net" <bacula-users AT lists.sourceforge DOT net>
Date: Sat, 27 Feb 2010 10:32:21 -0800
> -----Original Message-----
> From: Stan Meier [mailto:stan.meier AT billigmail DOT org]
> Sent: Saturday, February 27, 2010 10:08 AM
> To: bacula-users AT lists.sourceforge DOT net
> Subject: Re: [Bacula-users] Backing up > 100 servers
> 
> While you are right and creating a configuration based on scripts is
> quite easy (and has added benefits, for example that you can define
> one file pool per server group), we still have to deal with 120 backup
> jobs.
>
> But, since you didn't jump on that part of my question, I presume
> there is no solution to that?

You will always have (at least) 120 jobs, but with a script, you can simply 
autogenerate them in something like a for loop.
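For example, a minimal sketch (the client names, the "DefaultJob" JobDefs,
the pool naming and the include path are all placeholders you would adapt);
the generated file is then pulled into bacula-dir.conf with an @ include line:

#!/bin/sh
# Regenerate one Job resource per client. bacula-dir.conf then includes
# the result with a line like:  @/etc/bacula/conf.d/jobs.conf
OUT=/etc/bacula/conf.d/jobs.conf
: > "$OUT"
for client in web01 web02 db01 db02; do        # or read the list from a file
  printf 'Job {\n'                            >> "$OUT"
  printf '  Name = "backup-%s"\n' "$client"   >> "$OUT"
  printf '  JobDefs = "DefaultJob"\n'         >> "$OUT"  # shared FileSet etc.
  printf '  Client = "%s-fd"\n' "$client"     >> "$OUT"
  printf '  Pool = "File-%s"\n' "$client"     >> "$OUT"  # or per server group
  printf '}\n\n'                              >> "$OUT"
done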

> I see several things here which we will have to look at. Please
> correct me if I'm wrong or if I forgot anything:
> 
> 1. Concurrency: We will need to investigate all the different places
> in Bacula where job concurrency, concurrent pool/storage usage and
> connection limits are defined and adjust them to "fit together" as
> well as optimize them to the I/O operations limit of our raid storage.

In order for concurrency to work well, you'd need multiple different storage 
devices. You could define different directories on your disk as different 
storage devices.
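Roughly like this (a sketch only; the device names, media types and paths are
placeholders, and you keep one Storage resource in bacula-dir.conf per Device
in bacula-sd.conf):

# bacula-sd.conf -- two file "drives" backed by different directories
Device {
  Name = FileStorage1
  Media Type = File1
  Archive Device = /backup/spool1
  LabelMedia = yes
  Random Access = yes
  AutomaticMount = yes
  RemovableMedia = no
  AlwaysOpen = no
}
# FileStorage2 is identical except:
#   Archive Device = /backup/spool2
#   Media Type = File2

# bacula-dir.conf -- point different groups of jobs at different devices
Storage {
  Name = Disk1
  Address = backupserver.example.com     # the SD's address
  SDPort = 9103
  Password = "sd-password"
  Device = FileStorage1
  Media Type = File1
  Maximum Concurrent Jobs = 10
}

Giving each device its own Media Type keeps the director from trying to mount
one group's volumes on another group's device.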

When backing up that many machines, you will be looking at quite a few
optimizations; there are several bottlenecks you would need to address.

One of the pitfalls with concurrency is that if you allow concurrent jobs on
the same storage device, multiple backups will end up interleaved on the same
storage volume. That may make it harder to recover the disk space later. Also,
restores will take longer because Bacula has to sort through the interleaved
blocks.
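One knob that can help (again just a sketch; the pool name, retention and
label format are placeholders): cap the number of jobs per volume in the disk
pool, so each job ends up on its own file volume and its disk space can be
pruned and recycled independently. I haven't verified how this behaves under
heavy concurrency on a single device, so test it before relying on it:

Pool {
  Name = File-webgroup
  Pool Type = Backup
  Maximum Volume Jobs = 1       # one job per file volume, no interleaving
  Volume Retention = 7 days     # whatever your cycle needs
  Recycle = yes
  AutoPrune = yes
  LabelFormat = "web-"          # auto-label volumes; Bacula appends a number
}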

> 2. Scheduling: Ideally, the copy job would start as soon as all the
> backup jobs have finished. But since the Schedule resource does not
> allow references to Job names, we are pretty much screwed in that
> department and will probably have to resort to a fixed schedule.

Not necessarily. You have at least three options:

- Don't use a copy job, but instead some shell scripting that happens in a Run 
After Job script. Not sure how well this works with tape drives, but I use it 
for an rsync-based setup.

- Schedule the copy jobs for one minute later than the backup jobs, and with a
lower priority (i.e. a higher Priority number). Bacula will then complete all
backup jobs and immediately work on the copy jobs. Pitfall: if the backup jobs
overrun, the next night's backup jobs may get scheduled before the copy job
ever executes.

- Schedule the copy jobs for one minute later than the corresponding backup
job, and with the same priority. Bacula will then execute a backup job, its
copy job, the next backup job, its copy job, and so on (see the sketch after
this list).
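To illustrate the priority variant (times, names and the priority value are
only placeholders; Bacula runs lower Priority numbers first, and 10 is the
default):

Schedule {
  Name = "NightlyBackup"
  Run = Level=Full sun-sat at 21:00
}
Schedule {
  Name = "NightlyCopy"
  Run = Level=Full sun-sat at 21:01     # one minute later
}

# In every generated backup job:
#   Schedule = "NightlyBackup"
#   Priority = 10                       # the default
#
# In the copy job:
#   Schedule = "NightlyCopy"
#   Priority = 13                       # waits until all Priority-10 jobs are done
#
# Give the copy jobs Priority = 10 as well if you want the alternating
# backup/copy/backup/copy behaviour from the last option instead.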

> 3. Job selection: You already pointed that one out. Would it be as
> easy as just selecting all uncopied jobs from a given pool? Since all
> volumes in the pool are recycle before backup starts (or while it is
> running), naturally, the only uncopied jobs would be those that were
> written to disk recently.
> 
> So, for now, our backup plan would be something like:
> 
> 1. Start actual backup
> 1.1 Start backup jobs in parallel
> 1.2 (Possibly erase all used volumes in a given pool)
> 1.3 Get data from clients, write to File pools
> 1.4 wait until all backup operations have finished
> 2. Copy all recent jobs to tape
> 3. Backup catalog
> 
> I don't see how we can synchronize more than 120 backup jobs yet, to
> be honest. We could run backup at 9pm and the copy job at 10am to
> allow for a large margin of error, but that just doesn't feel like a
> proper solution.

That also doesn't look like a large margin of error. Depending on how big your
backup jobs are and how well tuned your system is, 120 full backups may well
take much more than 24 hours - and that's simply due to network bandwidth
limitations and the like, nothing Bacula could address.
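As for point 3 of your plan, the selection part is pretty much built in;
something along these lines should do (pool, client and fileset names are
placeholders, and the destination storage is taken from the Storage directive
of the pool named in Next Pool):

Pool {
  Name = File-webgroup           # the disk pool the backups write to
  Pool Type = Backup
  Storage = Disk1
  Next Pool = TapePool           # where the copy job sends the data
}

Job {
  Name = "copy-webgroup-to-tape"
  Type = Copy
  Level = Full
  Selection Type = PoolUncopiedJobs   # everything in the pool not yet copied
  Pool = File-webgroup
  Client = backupserver-fd            # required by the syntax, not really used
  FileSet = "Full Set"                # likewise
  Messages = Standard
  Schedule = "NightlyCopy"
  Priority = 13
}

PoolUncopiedJobs only ever selects jobs that have not been copied before, so
after each nightly run it picks up exactly the jobs that were just written.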

