Subject: Re: [Bacula-users] Backing up > 100 servers
From: Arno Lehmann <al AT its-lehmann DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Mon, 01 Mar 2010 21:19:37 +0100

Hello,

27.02.2010 19:08, Stan Meier wrote:
> * Arno Lehmann <al AT its-lehmann DOT de>:
>> 27.02.2010 14:46, Stan Meier wrote:
>>> 1. Keeping configuration sane: With more than 120 servers, we need to
>>> find a way to keep the configuration files readable. Our servers all
>>> follow some naming scheme, for example, we have "appserver01" through
>>> "appserver08" or "webcache01" through "webcache04". We think we should
>>> split client configurations for each server group, so the file
>>> "clientdefs/appserver.conf" would define all appserver0X clients.
>>> Furthermore, most of those servers will need a default job performed
>>> (/etc, /root, /opt and so on). While it's easy to reuse a "JobDefs"
>>> stanza to actually define all those jobs, isn't there any way to
>>> "group" those servers? Do we really have to define more than 120 jobs,
>>> one for each server?
>> The way to go, in my opinion, is to create the actual configuration 
>> dynamically - you can include script output into the configuration 
>> *anywhere*. So use a script that generates the configuration from a 
>> template into which you insert the client name.
> 
> While you are right and creating a configuration based on scripts is
> quite easy (and has added benefits, for example that you can define
> one file pool per server group), we still have to deal with 120 backup
> jobs.
> 
> But, since you didn't jump on that part of my question, I presume
> there is no solution to that?

In my opinion, there's no solution because there's no problem :-)

120 jobs - plus, in the worst case, 120 copy jobs - are no problem 
for Bacula. You'll need to tweak concurrency, priorities, and the 
(new) directives managing concurrency of several instances of a 
single job a bit, but basically, scheduling all your jobs at the same 
time with the same priority and letting them run should work well. 
For the copy jobs, I'd recommend (as others already did) using a 
different priority. I'd also recommend using an SQL query as the 
selection scheme.
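
To illustrate: Bacula can include the output of a program anywhere in 
a configuration file with @|"command". A minimal sketch of the 
generated setup (file names, the JobDefs contents, and the password 
handling are invented for the example):

In bacula-dir.conf:

JobDefs {
  Name = "DefaultJob"          # shared settings for all generated jobs
  Type = Backup
  Level = Incremental
  FileSet = "DefaultSet"
  Schedule = "NightlyBackup"
  Pool = "FilePool"
  Priority = 10                # all backups run at the same priority
  Messages = Standard
}

@|"/etc/bacula/make-jobs.sh"   # pull in the generated resources

And /etc/bacula/make-jobs.sh:

#!/bin/sh
# Emit one Client and one Job resource per host in clients.txt.
while read host; do
  cat <<EOF
Client {
  Name = "${host}-fd"
  Address = ${host}
  Password = "secret"          # use per-client passwords in real life
  Catalog = "MyCatalog"
}
Job {
  Name = "backup-${host}"
  JobDefs = "DefaultJob"
  Client = "${host}-fd"
}
EOF
done < /etc/bacula/clients.txt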

>>> 2. Backup availability: One plan would be to use a large part of the
>>> 24TB available as a FilePool (or several). Each job would then write
>>> it's data to that pool. A Copy job could copy the data to tape later
>>> on - with the advantage that restores of recent data would be quite
>>> fast since they would still be sitting on disk. Before running the
>>> backup the next day, we would simply recycle those file volumes. Is
>>> that a reasonable strategy?
>> Yes. Properly set up, that's a very reasonable approach. You'll need 
>> to understand retention times and how to select jobs for migration in 
>> detail.
> 
> I see several things here which we will have to look at. Please
> correct me if I'm wrong or if I forgot anything:
> 
> 1. Concurrency: We will need to investigate all the different places
> in Bacula where job concurrency, concurrent pool/storage usage and
> connection limits are defined and adjust them to "fit together" as
> well as optimize them to the I/O operations limit of our raid storage.

Yes. This is probably easier than it looks right now, because you will 
find that having one job per client, and as many jobs per storage 
daemon as possible, will serve you best. So you only have to find out 
what your SD can manage reasonably.
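
Concretely, these are the various Maximum Concurrent Jobs directives, 
which have to agree with each other. A sketch (the numbers are 
arbitrary; note that a Client resource defaults to one concurrent 
job, which already gives you the one-job-per-client behaviour):

# bacula-dir.conf
Director {
  ...                          # other required directives omitted
  Maximum Concurrent Jobs = 40
}

Storage {                      # the Director's view of the SD
  Name = "FileStorage"
  ...
  Maximum Concurrent Jobs = 40
}

# bacula-sd.conf
Storage {
  ...
  Maximum Concurrent Jobs = 40
}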

> 2. Scheduling: Ideally, the copy job would start as soon as all the
> backup jobs have finished. But since the Schedule resource does not
> allow references to Job names, we are pretty much screwed in that
> department and will probably have to resort to a fixed schedule.

Schedule it, for example, an hour after the backup jobs and use a 
different priority.
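
For example (times and names are invented; the priorities themselves 
live in the respective Job resources, and lower numbers run first):

Schedule {
  Name = "NightlyBackup"       # used by the backup jobs (Priority = 10)
  Run = Level=Incremental sun-sat at 21:00
}

Schedule {
  Name = "NightlyCopy"         # used by the copy job (Priority = 20)
  Run = sun-sat at 22:00
}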

> 3. Job selection: You already pointed that one out. Would it be as
> easy as just selecting all uncopied jobs from a given pool?

As mentioned above, I prefer to use a hand-tailored SQL query.

For example, in my office, I'm doing monthly copies of the latest full 
backups to tape. I select all successfully finished full backups that 
were started less than four weeks ago. The regular full backups are 
scheduled to happen during the first week of each month, and the copy 
job is scheduled to run the second week of the month.

Works like a charm, but I would refine things a bit if I had to manage 
more than my 10-20 jobs.
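
Translated into a copy job, it could look roughly like this (MySQL 
date syntax; the Job and Pool columns are the standard catalog 
schema, everything else is invented, and the query is wrapped here 
for readability - keep it on one line in the actual file):

Job {
  Name = "CopyToTape"
  Type = Copy
  Pool = "FilePool"            # source pool; its Next Pool points to tape
  Client = "appserver01-fd"    # formally required, not used by copy jobs
  FileSet = "DefaultSet"       # likewise
  Schedule = "NightlyCopy"
  Priority = 20
  Messages = Standard
  Selection Type = SQLQuery
  Selection Pattern = "SELECT DISTINCT Job.JobId FROM Job, Pool
    WHERE Pool.Name = 'FilePool' AND Pool.PoolId = Job.PoolId
      AND Job.Type = 'B' AND Job.Level = 'F' AND Job.JobStatus = 'T'
      AND Job.StartTime > NOW() - INTERVAL 28 DAY
    ORDER BY Job.StartTime"
}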

> Since all
> volumes in the pool are recycled before backup starts (or while it is
> running), naturally, the only uncopied jobs would be those that were
> written to disk recently.

Sounds reasonable. Depending on your available disk space, you might 
find it more convenient to keep as many jobs on disk as possible, in 
which case you could no longer rely on the above assumption.
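
If you do go with the recycle-before-the-next-run scheme, the disk 
pool could look like this (values are only an example; Next Pool is 
what ties the disk pool to the tape pool for the copy jobs):

Pool {
  Name = "FilePool"
  Pool Type = Backup
  Storage = "FileStorage"      # the disk storage defined in the Director
  Label Format = "File-"       # auto-label disk volumes
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 20 hours  # short, so volumes recycle before the next run
  Next Pool = "TapePool"       # destination of the copy jobs
}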

> So, for now, our backup plan would be something like:
> 
> 1. Start actual backup
> 1.1 Start backup jobs in parallel
> 1.2 (Possibly erase all used volumes in a given pool)
> 1.3 Get data from clients, write to File pools
> 1.4 wait until all backup operations have finished
> 2. Copy all recent jobs to tape
> 3. Backup catalog
> 
> I don't see how we can synchronize more than 120 backup jobs yet, to
> be honest. We could run backup at 9pm and the copy job at 10am to
> allow for a large margin of error, but that just doesn't feel like a
> proper solution.

Priorities... or, probably even simpler, just start copy jobs 
throughout the daytime (reserving the nights for backups). If there 
are no jobs to copy, nothing happens. As soon as uncopied jobs show 
up, they will be copied to tape.

However, if you want to start a copy as soon as possible after a job 
is finished, I'd recommend doing this with a RunScript in each job.
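
Roughly like this - the RunScript fires on the Director once the 
backup job finishes and queues the copy job through bconsole (the 
script path and job name are invented):

RunScript {
  RunsWhen = After
  RunsOnClient = no            # run on the Director host, not the client
  Command = "/etc/bacula/trigger-copy.sh"
}

with /etc/bacula/trigger-copy.sh:

#!/bin/sh
# Queue the copy job; it picks up whatever is uncopied at that point.
echo "run job=CopyToTape yes" | bconsole
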
Arno

> Stan
> 

-- 
Arno Lehmann
IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück
www.its-lehmann.de

