Re: [Bacula-users] Bacula and High availability

On 5/7/2014 6:12 AM, Egoitz Aurrekoetxea wrote:
> Good morning,
>
> I have been thinking about how a Bacula infrastructure could be set up with
> HA. If you use Postgres or MySQL, for example, the database servers can
> replicate through their own replication protocol and stay up to date. For
> backing up pools you could always use something like ZFS replication, DRBD or
> whatever... but now imagine the following situation:
>
> - Bacula infrastructure A goes down...
> - Bacula infrastructure B is up and replicated from A... but:
>
> - The database could be ahead of or behind the state of the tapes in the
> pool (talking about File Storage)
> - The same goes for the pools and the pools' tapes with respect to the database.
>
> How does Bacula manage these situations? I mean, is there any possible way of
> ensuring that the replicated content (the combination of both the database
> and the pool's tapes) is reliable for use in case of disaster? How else is
> this advised to be done?

The database must be HA. I believe even Postgres 9 binary streaming 
replication is not atomic: it is asynchronous by default, so a commit can 
succeed locally before the standby has received it. The local write and 
replicated write must be a single atomic operation. Therefore something 
like DRBD's kernel-mode device driver is required.

DRBD has several replication protocols (A, B and C) that can be used in 
single-primary mode. One of those (protocol C, synchronous replication) 
will cause write() system calls to fail on the primary unless the write() 
to the replicated storage also succeeds on the secondary. Writes are a 
bit slower because the system call does not 
return until the replication has been made. But this is not a huge 
problem so long as cluster nodes have fast storage and the cluster uses 
a dedicated inter-node network. I use two bonded 1 Gb NICs on each node 
of a two-node Pacemaker/Corosync cluster and connect the two nodes with 
two crossover cables. Each node then has additional NICs for LAN 
connectivity. This prevents a lot of issues because DB writes are 
replicated atomically.
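
For illustration, here is a minimal sketch of a DRBD 8.4-style resource 
using the synchronous mode, which DRBD calls protocol C. The resource 
name, host names, backing devices and addresses are placeholders invented 
for this example, not values from a real cluster. The addresses are 
assumed to sit on the dedicated inter-node link.

```
# /etc/drbd.d/r0.res -- hypothetical example resource
resource r0 {
  net {
    # Protocol C: a write is reported complete on the primary only
    # after both the local and the remote disk write are confirmed.
    protocol C;
  }
  on node-a {                        # must match `uname -n` on that node
    device    /dev/drbd0;
    disk      /dev/sdb1;             # placeholder backing device
    address   192.168.100.1:7789;    # placeholder replication-link address
    meta-disk internal;
  }
  on node-b {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.100.2:7789;
    meta-disk internal;
  }
}
```

In DRBD 8.3 the "protocol C;" statement goes directly in the resource 
section rather than under "net". The database data directory would then 
live on a filesystem created on /dev/drbd0 and mounted only on the 
current primary.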

Keep in mind that any job running when the node fails will still fail 
anyway, as the Dir-FD TCP connection will go down. There are ways to 
migrate VMs from one node to another without the TCP connection going 
down, but there is no way that I'm aware of to do this if the primary 
node hardware dies. The Ethernet interface comes back up and has the 
correct MAC, etc., but the TCP session is lost. This is fine for some 
protocols, such as HTTP, but not for Bacula which requires a persistent 
TCP socket. So failed jobs will have to be rerun anyway. Fortunately 
Bacula provides the ability to rerun failed jobs.
