Re: [Bacula-users] Bacula and High availability
2014-05-12 09:31:20
On 5/7/2014 6:12 AM, Egoitz Aurrekoetxea wrote:
> Good morning,
>
> I have been thinking about how a Bacula infrastructure could be set up with
> HA. If you use Postgres or MySQL, for example, the database servers can
> replicate through their own replication protocol and stay up to date. For
> the backup pools you could always use something like ZFS replication, DRBD
> or whatever… but now… imagine the following situation:
>
> - Bacula infrastructure A goes down…
> - Bacula infrastructure B is up and replicated from A…. but :
>
> - The database could be ahead of or behind the state of the tapes in the pool…
> (talking about File Storage)
> - The same goes for the pools and the pools' tapes with respect to the database…
>
> How does Bacula manage these situations? I mean… is there any possible way of
> ensuring that the replicated content (the combination of both the database
> and the pools' tapes) is reliable enough to use in case of disaster? How else
> is this advised to be done?
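
The database itself must be HA. I believe even Postgres 9 binary
streaming replication is not atomic, at least in its default
asynchronous mode, where a commit can be acknowledged locally before the
standby has received it. The local write and the replicated write must
be a single atomic operation. Therefore something like DRBD's
kernel-mode block device driver is required.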

DRBD has several replication protocols that can be used in
single-primary mode. The synchronous one causes write() calls on the
primary to fail unless the write to the replicated storage also succeeds
on the secondary. Writes are a bit slower because the call does not
return until the replication has completed, but this is not a big
problem as long as the cluster nodes have fast storage and the cluster
uses a dedicated inter-node network. I use two bonded 1 Gb NICs on each
node of a two-node Pacemaker/Corosync cluster and connect the two nodes
directly with two crossover cables. Each node has additional NICs for
LAN connectivity. This setup prevents a lot of issues because DB writes
are replicated atomically.
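
To illustrate the write semantics described above, here is a toy model in
Python. It is only a sketch of the idea and not how DRBD is implemented:
the Node class, the synchronous_write() function and the ReplicaUnreachable
exception are hypothetical names made up for this example. The point is
that the caller is told the write succeeded only once both nodes hold the
block, so the database never treats a write as durable if the secondary
missed it.

```python
class ReplicaUnreachable(Exception):
    """Raised when a node cannot confirm a write."""


class Node:
    """Stand-in for one cluster node's backing storage (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.blocks = {}

    def store(self, sector, data):
        if not self.online:
            raise ReplicaUnreachable(f"{self.name} did not acknowledge the write")
        self.blocks[sector] = data


def synchronous_write(primary, secondary, sector, data):
    """Return only once both nodes hold the block; raise otherwise."""
    primary.store(sector, data)
    # The caller is still blocked here, which is why synchronous writes
    # are slower: the secondary must acknowledge before we return.
    secondary.store(sector, data)
    return True


if __name__ == "__main__":
    node_a, node_b = Node("node-a"), Node("node-b")
    synchronous_write(node_a, node_b, 0, b"catalog row")
    print("nodes identical:", node_a.blocks == node_b.blocks)

    node_b.online = False  # simulate losing the secondary
    try:
        synchronous_write(node_a, node_b, 1, b"another row")
    except ReplicaUnreachable as err:
        print("write reported as failed:", err)
```

In this model the failed write surfaces as an exception to the caller,
which plays the role of the failing write() call in the description above.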

Keep in mind that any job running when the node fails will still fail,
because the Dir-FD TCP connection goes down. There are ways to migrate
VMs from one node to another without dropping TCP connections, but I am
not aware of any way to do this when the primary node's hardware dies.
The Ethernet interface comes back up on the other node with the correct
MAC, etc., but the TCP session is lost. This is fine for some protocols,
such as HTTP, but not for Bacula, which requires a persistent TCP
socket. So failed jobs have to be rerun in any case. Fortunately, Bacula
provides the ability to rerun failed jobs.
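
As a rough sketch of the cleanup after a failover, the following Python
picks the failed jobs out of a list of job records and builds one console
command per job. The record layout (dicts with "jobid" and "status") is
hypothetical sample data, not the output of any Bacula tool. The status
letters follow the catalog's JobStatus codes ('f' for a fatal error, 'E'
for terminated with errors). The "rerun jobid=N" command template is an
assumption; check "help" in your bconsole version and pass a different
template if your console uses another command.

```python
# Catalog JobStatus codes treated as "needs a rerun" in this sketch:
# 'f' = fatal error, 'E' = terminated with errors.
FAILED_STATUSES = {"f", "E"}


def jobs_to_rerun(jobs, failed_statuses=FAILED_STATUSES):
    """Return the JobIds of failed jobs, in ascending order."""
    return sorted(job["jobid"] for job in jobs if job["status"] in failed_statuses)


def console_commands(job_ids, template="rerun jobid={jobid}"):
    """Build one console command string per JobId.

    The default template is an assumption about the console syntax.
    """
    return [template.format(jobid=job_id) for job_id in job_ids]


if __name__ == "__main__":
    # Hypothetical records, e.g. what you might extract from the catalog.
    sample = [
        {"jobid": 101, "status": "T"},  # terminated normally
        {"jobid": 102, "status": "f"},  # fatal error when the node died
        {"jobid": 103, "status": "E"},  # terminated with errors
    ]
    for command in console_commands(jobs_to_rerun(sample)):
        print(command)
```

The commands are only printed here; feeding them to the console is left
out on purpose so that nothing is rerun without a human looking at the
list first.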