ADSM-L

Re: versioning / expiring / multiple backups under same nodename

2002-01-29 12:58:28
Subject: Re: versioning / expiring / multiple backups under same nodename
From: Zlatko Krastev/ACIT <acit AT ATTGLOBAL DOT NET>
Date: Tue, 29 Jan 2002 19:57:15 +0200
Quick and dirty answer: the risk is *between 0 and 100%* and cannot be
evaluated *exactly*! If there is no failover during the month, the risk is
certainly 0. If a failover or failback happens during the backup window, the
risk is close to 100%. And can either you or your customer forget about
backups for a while and just explain which resource is where at a specific
time after several failovers and failbacks? The configuration you described
is somewhat messy, and the main question is about the cluster configuration
rather than about TSM or backup.
Let's assume we have a good cluster configuration (other people here called
it "supported"). Let's take the bull by the horns and go directly to five
systems with machines M1-M5, local resources LR1-LR5 and cluster resources
CR1-CR4 (or even CR5, it does not change anything):
- create a dsm.sys for each of M1-M4 containing server/node stanzas for
  nodes LR1+CR1, ... , LR4+CR4
- create a dsm.sys for M5 containing stanzas for LR5+CR1+CR2+CR3+CR4
- create a dsmLR<x>.opt for each of M1-M5 on a local (non-shared) resource
  and use it for the OS and cluster binaries
- create a dsmCR<n>.opt for each of CR1-CR4 on a shared filesystem that is
  accessible after failover/failback
- configure OS startup to start the LR<x> TSM client scheduler using
  dsmLR<x>.opt
- configure boot on the primary nodes to start the cluster TSM client
  scheduler using dsmCR<n>.opt
- ensure the failover/failback scripts stop the cluster resource TSM client
  scheduler on the failing server and then start it on the new active node
  (in your case this would always be M<n> to M5 or M5 to M<n>). Trigger a
  new incremental backup immediately after failover/failback to cover any
  backup that the failover interrupted.
- ensure correct domain statements in each stanza
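
To make the stanza layout from the list above concrete, here is a minimal
sketch that works out which node stanzas belong in each machine's dsm.sys
and renders a skeleton file. The machine and node names are the ones used
above; the server address, the stanza names (tsm_lr1 and so on) and the
helper functions are assumptions made up for illustration. The domain
statements are left out because the filesystems are not named here.

```python
# Hypothetical sketch: which node stanzas go into each machine's dsm.sys.
# M1-M5, LR1-LR5 and CR1-CR4 come from the post; the server address and
# the stanza naming scheme are placeholders, not a real configuration.

STANDBY = "M5"
PRIMARIES = ["M1", "M2", "M3", "M4"]
TSM_SERVER = "tsm.example.com"  # assumed placeholder address


def stanzas_for(machine):
    """Return the node names that need a stanza in this machine's dsm.sys."""
    n = machine[1:]
    if machine == STANDBY:
        # The standby can take over any cluster resource, so it needs them all.
        return [f"LR{n}"] + [f"CR{p[1:]}" for p in PRIMARIES]
    # A primary needs its own local node and its own cluster resource.
    return [f"LR{n}", f"CR{n}"]


def render_dsm_sys(machine):
    """Render a skeleton dsm.sys with one server stanza per node."""
    blocks = []
    for node in stanzas_for(machine):
        blocks.append(
            f"SErvername  tsm_{node.lower()}\n"
            f"   COMMMethod        TCPip\n"
            f"   TCPServeraddress  {TSM_SERVER}\n"
            f"   NODename          {node}\n"
            # A DOMain line listing this node's filesystems would go here.
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(render_dsm_sys("M5"))
```

Running it for M5 prints five stanzas (LR5 plus CR1-CR4), while each of
M1-M4 gets exactly two.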
Using something similar, the risk can be estimated: it is close to the risk
of backing up an ordinary server. The increased server availability due to
clustering does not lower backup risk, it lowers downtime risk. And this
can be used not only for an all-fail-over-to-one cluster but even for an
any-fail-anywhere cluster.
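
The failover/failback handling described earlier can be sketched as an
ordered plan of actions. This is only a dry-run illustration: the function
name and the action wording are hypothetical, and the real stop/start
commands depend on the cluster software and are not given here. Nothing in
it assumes that M5 is the target, which is why the same approach carries
over to an any-fail-anywhere cluster.

```python
# Hypothetical sketch of a failover/failback hook as a dry-run plan.
# It only builds the ordered list of (machine, action) steps; it does not
# run any TSM or cluster command.


def failover_plan(resource, old_machine, new_machine):
    """Return the ordered actions for moving one cluster resource."""
    if old_machine == new_machine:
        raise ValueError("failover needs two different machines")
    optfile = f"dsm{resource}.opt"  # kept on the shared filesystem
    return [
        (old_machine, f"stop scheduler for {resource}"),
        (new_machine, f"start scheduler for {resource} using {optfile}"),
        # Catch up on any backup the failover interrupted.
        (new_machine, f"run incremental backup for {resource} using {optfile}"),
    ]


if __name__ == "__main__":
    for host, action in failover_plan("CR2", "M2", "M5"):
        print(f"{host}: {action}")
```

Failback is the same call with the machines swapped, for example
failover_plan("CR2", "M5", "M2").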
So do it by the book and you will travel the paved road; do it as you wish
and enjoy the jungle. If nobody has used this setup before, nobody could
have evaluated it.

Zlatko Krastev
IT Consultant





"Warren, Matthew James" <matthewjames.warren AT EDS DOT COM> on 21.01.2002
14:22:43
Please respond to "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
To:     ADSM-L AT VM.MARIST DOT EDU
cc:

Subject:        Re: versioning / expiring / multiple backups under same
nodename

We know the correct way to back up the cluster. The customer did not
implement it this way, but we are recommending that they do so.

Before they switch from their current set-up to how we are recommending
they back up the cluster (with 2 TSM environments on each machine, one for
local disk, the other for shared disk), they would like to know exactly what
risk they are running with their current setup.


Matt.