Bacula-users

Re: [Bacula-users] Bacula and Xen

2008-05-14 15:45:05
Subject: Re: [Bacula-users] Bacula and Xen
From: Arno Lehmann <al AT its-lehmann DOT de>
To: bacula-users <bacula-users AT lists.sourceforge DOT net>
Date: Wed, 14 May 2008 21:44:25 +0200

Hi,

25.04.2008 20:34, Josh Fisher wrote:
> Hi,
> 
> Thought I would let the list know that I have succeeded in getting all 
> Bacula daemons running in a Xen domU, including SD using USB removable 
> drives as backup storage. The goal was to run Bacula SD and Dir in a Xen 
> domU on a 2-node high-availability cluster. The trick was udev rules on 
> the dom0's of each node to use xm to dynamically attach and detach a USB 
> drive to the domU bacula-sd is running in. Automount is used on the domU 
> to automount the USB drive partition at a static mountpoint based on 
> filesystem label.
> 
> Bottom line is that Bacula now runs failsafe, (sort of). If the node the 
> bacula domU is running on fails, Heartbeat will automatically bring the 
> domU (and Bacula with it) up on the good node. Bacula will only be down 
> a brief time while Heartbeat detects the failure and brings the domU up 
> on the other node. I haven't yet tested failover while a backup job is 
> running. (Just haven't gotten that far yet.)

You will probably find that any jobs running at the time of the failover fail.

> I can elaborate if anyone is interested.

Sounds like something for the wiki pages...

> I also have some questions. 
> Does Bacula have a means to re-schedule a job that fails? If Bacula 
> could reschedule the failed job to run again in a few minutes, then this 
> will be a way to make Bacula highly available.

That's easily possible: set "Reschedule On Error" in the Job definition. 
I'm not sure this will help you, though, as the newly started DIR 
will not know the job failed.
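
For illustration, a Job resource with rescheduling enabled might look 
roughly like the fragment below. The job name, the interval and the retry 
count are made-up values, and the other directives a Job needs are left 
out, so treat it as a sketch rather than a tested configuration.

```
Job {
  Name = "ExampleBackup"            # hypothetical job name
  # Client, FileSet, Storage, Pool etc. omitted from this sketch
  Reschedule On Error = yes         # re-queue the job if it terminates in error
  Reschedule Interval = 5 minutes   # assumed wait before the retry
  Reschedule Times = 3              # assumed maximum number of retries
}
```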

> Also, what does the 
> client do when the connection to the SD goes down and then comes back up 
> a short time later?

It typically fails, unless it's really the same connection, which I 
doubt. I don't know how TCP failover works in detail, but at least the 
newly started Bacula daemons will not know about the current state of 
the jobs from the former Bacula instance.

Anyway, a number of failed backup jobs is usually not a serious issue.

If you want to handle this automatically, I'd suggest the following:

First, ensure the catalog the newly started Bacula instance sees is 
up to date, i.e. has the interrupted jobs in it, in status "Running", of 
course.

Then, after the failover instance is up and running, wait a few 
minutes (so everything settles, the catalog database notices the old 
connection is gone, and whatever else might happen), and then query the 
catalog for all jobs that are in state "Running" and were started during 
the last few hours (probably the longest time you expect any job to 
run; I'm a bit unsure how to filter out really old catalog data without 
losing important jobs).

Then take the names and levels of these jobs and feed them to bconsole 
as run commands.

Should be a shell or perl script of less than 100 lines, I believe...
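
A minimal sketch of the steps above, in Python rather than shell or perl. 
It makes several assumptions that are not confirmed anywhere in this 
thread: a MySQL catalog (the date arithmetic in the query is MySQL 
syntax), the Job table columns Name, Level, JobStatus and StartTime with 
'R' meaning running, and a SQL client that prints tab-separated rows. The 
function names are invented for illustration. The script only prints run 
commands; it does not talk to the catalog or to bconsole itself.

```python
import sys

# Catalog level codes mapped to the level names used in a bconsole "run" command.
LEVELS = {"F": "Full", "I": "Incremental", "D": "Differential"}


def build_query(max_hours):
    """Return SQL (MySQL syntax, an assumption) for jobs still marked
    running ('R') that were started within the last max_hours hours."""
    hours = int(max_hours)  # refuse non-numeric input before it reaches the SQL
    return (
        "SELECT Name, Level FROM Job "
        "WHERE JobStatus = 'R' "
        f"AND StartTime >= NOW() - INTERVAL {hours} HOUR"
    )


def parse_rows(lines):
    """Parse tab-separated 'Name<TAB>Level' lines into (name, level_code) pairs.
    Lines that do not have exactly two fields are ignored."""
    rows = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 2 and parts[0]:
            rows.append((parts[0], parts[1]))
    return rows


def run_commands(rows):
    """Turn (name, level_code) pairs into bconsole 'run' command lines,
    skipping unknown level codes and duplicate name/level pairs."""
    commands = []
    seen = set()
    for name, code in rows:
        level = LEVELS.get(code)
        if level is None or (name, level) in seen:
            continue
        seen.add((name, level))
        commands.append(f'run job="{name}" level={level} yes')
    return commands


if __name__ == "__main__":
    # Read query output on stdin, write bconsole commands on stdout.
    for command in run_commands(parse_rows(sys.stdin)):
        print(command)
```

Assuming the script is saved as rerun_jobs.py (a hypothetical name), it 
could be used by running the query from build_query() with something like 
"mysql -N -B", piping the result into the script, and piping its output 
into bconsole. Check the printed commands by hand before doing that on a 
production director.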

Arno

> 
> --- Josh Fisher

-- 
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de

