All,
And yet another installment from me today. We are running MS Cluster
Server on our node that was the subject of my previous postings. Here's
the deal...
We have two servers, Krypton and Vulcan, both running NT Server 4.0.
They function as a cluster server with our Clarion array. Our virtual
server's name is DUNE.
When all is well, Krypton will "own" the R: drive and Vulcan will "own"
S:, T:, and X:. Drives R:, S:, T:, and X: are what make up our cluster
server, DUNE.
When either Krypton or Vulcan goes down, the drives owned by that
machine fail over to the other machine, resulting in minimal downtime
for our highly valued users. Therein lies our problem. Since we
effectively have no idea who will own which drive on any given day, we
are running into scheduling problems with ADSM. I have included some
example files for your viewing enjoyment.
Krypton.txt  = a snip of the dsmsched.log on Krypton (the one pertaining
               to the cluster)
Vulcan.txt   = a snip of the dsmsched.log on Vulcan (ditto)
Clusterk.opt = the options file on Krypton pertaining to the cluster
Clusterv.opt = the options file on Vulcan pertaining to the cluster
actlog.txt   = activity log from our ADSM server during the time in
               question, pertaining to the cluster
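To make the ownership arrangement described above concrete, here is a
minimal Python sketch of it. This is only an illustration: the Cluster
class and its method names are hypothetical, and it assumes drives stay
on the surviving machine after a failover until someone moves them back.

```python
# Toy model of drive ownership in the DUNE cluster.
# Illustration only: this is not MS Cluster Server or ADSM code, and the
# class and method names are made up for this sketch.

class Cluster:
    NODES = ("Krypton", "Vulcan")

    def __init__(self):
        # Ownership "when all is well", as described in this post.
        self.owner = {"R:": "Krypton", "S:": "Vulcan", "T:": "Vulcan", "X:": "Vulcan"}

    def fail_over(self, failed_node):
        """Move every drive owned by failed_node to the other machine."""
        survivor = next(n for n in self.NODES if n != failed_node)
        for drive, node in self.owner.items():
            if node == failed_node:
                self.owner[drive] = survivor

    def drives_owned_by(self, node):
        """Return the sorted list of drives the given machine currently owns."""
        return sorted(d for d, n in self.owner.items() if n == node)


cluster = Cluster()
normal_krypton = cluster.drives_owned_by("Krypton")   # only R: when all is well
cluster.fail_over("Vulcan")                           # Vulcan goes down
after_krypton = cluster.drives_owned_by("Krypton")    # Krypton now holds everything
after_vulcan = cluster.drives_owned_by("Vulcan")      # Vulcan holds nothing
```

The point of the sketch is that the answer to "which drives should this
machine back up tonight?" depends on the current state of the cluster,
not on anything fixed at install time.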
I have two scheduler services installed on each of Krypton and Vulcan:
one for the local machine and one for DUNE. Each scheduler service uses
its own options file and schedlog. I can run a manual incremental on
either of these machines with no problem (not including the performance
issues).
It is a given that one of these machines, Krypton or Vulcan, will hit
our ADSM server first. In the example I have included, all drives were
on Krypton at the time of the backup. Vulcan hit the server first,
reported that the drives were invalid because they were not "owned" by
Vulcan, and failed the job. Six minutes later, Krypton came along and
tried to do its backup. It was told rather rudely that "Either the
window has elapsed or the schedule has been deleted". This session was
well within the time allowed, and these are two of the first machines
to hit the server each night; I have them start at 17:55 as opposed to
18:00 to ensure this. I also have maxscheduledsessions set well above
what will ever hit the server at once. It seems that once one of the
machines has hit the server and completed, ADSM thinks that node name
has already backed up for the night when the next machine comes in with
the same node name, and won't let it do any more. From what I have
read, this is installed according to IBM's specs for working with
clusters. Yet this is not a solid solution, as I am not guaranteed a
slot for each machine to back up whichever drives it owns.
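The sequence above can be reproduced with a small Python sketch. It
models only my guess at what is happening, namely that the server
allows one scheduled event per node name per window. That rule, the
ToyScheduleServer class, and the use of "DUNE" as the shared node name
are all assumptions made for illustration, not documented ADSM
behaviour.

```python
# Toy model of the suspected behaviour. The "one scheduled event per node
# name per window" rule is an assumption, not documented ADSM logic.

CLUSTER_DRIVES = ["R:", "S:", "T:", "X:"]


class ToyScheduleServer:
    def __init__(self):
        # Node names whose scheduled event has already finished tonight.
        self.nodes_done = set()

    def scheduled_backup(self, node_name, owned_drives):
        """Return 'completed', 'failed', or 'refused' for one scheduled session."""
        if node_name in self.nodes_done:
            # The event is already closed for this node name.
            return "refused"
        # The event counts as finished whether it succeeds or fails.
        self.nodes_done.add(node_name)
        missing = [d for d in CLUSTER_DRIVES if d not in owned_drives]
        return "failed" if missing else "completed"


server = ToyScheduleServer()
# All drives are on Krypton, but Vulcan reaches the server first.
vulcan_result = server.scheduled_backup("DUNE", owned_drives=[])
# Six minutes later Krypton arrives under the same node name.
krypton_result = server.scheduled_backup("DUNE", owned_drives=CLUSTER_DRIVES)
print(vulcan_result, krypton_result)
```

If the guess is right, whichever machine arrives first uses up the
night's event for the shared node name, regardless of whether it
actually owns any of the drives.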
Does anyone successfully back up clusters? If you do, how can I? ANY
ideas or hints would be helpful. Thanks for listening.
Sincerely,
Sean M. Stecker
stecker.sean AT orbital-lsg DOT com
(See attached files: actlog.txt, clusterk.opt, Krypton.txt,
clusterv.opt, Vulcan.txt)