Hi All,
I have just done our first offsite SP disaster recovery test and thought I'd
share my experiences with you. I have mainly 'lurked' on this list and figure
it is 'payback' time.
Our site is a three-frame SP with 21 nodes and around 1.2 TB of SSA disk. The CW
is a 42t, and all nodes run AIX 4.1.4 and PSSP 2.2. At this time no DR service
provider in Australia has enough hardware to recover all our nodes, so we
recovered just three nodes (one wide and two thin) with around 100 GB of SSA.
All our nodes have two internal disks, one holding rootvg and the other altvg.
(Yes, I know I should mirror rootvg to the second disk, but these nodes were
installed before that was supported on the SP.)
First off, our current backup strategy uses SYSBACK to back up the Control
Workstation weekly to two 8mm tapes. I also use SYSBACK to back up the ADSM
server (on a wide node), and I take a weekly mksysb of every node to the CW.
My nodes are all different, so it's too hard to use the same image for all of
them. I also do daily backups of the primary storage pools to an offsite copy
pool.
Now to the good stuff.
My first worry was that the CW at the hotsite was not the same as ours: we have
a 42t, the hotsite a C10. My fears were unfounded. The SYSBACK restore of the
CW (rootvg and altvg, where /spdata lives) went beautifully. I think it helped
a lot that both are Micro Channel machines and that the hdisk layout was the
same.
Once the control workstation was restored, the SDR reconfigured for the new SP
layout, the /etc/hosts file modified and other SP stuff that I won't go into
done, it was time to NIM install the wide node running ADSM. I also installed
the other two thin nodes at this stage.
The image restored to the wide node OK, and I used SYSBACK to recover the altvg
that contained ADSM, its databases and the disk storage pools. This went OK.
The SP parallel switch was reconfigured and started OK.
I restored the device configuration file and the volume history from a backup
diskette that is created daily after the ADSM DB backup. (I only do full
backups; the DB is only 1.2 GB at the moment.) I manually edited the device
configuration file to match the devices at the DR site.
The next step was to restore the ADSM DB (I didn't trust the physical image of
it from the SYSBACK restore). This also went smoothly.
The ADSM server was started and the primary storage pools were marked
destroyed. I also deleted the old drives (from the home site), added the drives
for the DR site, and had to re-apply the correct license codes for the ADSM
server.
Then some trouble happened. I picked a file at random to restore to the CW
(just a plain old text file) to test the ADSM recovery and also to see whether
it would use the copy pool tapes. The ADSM server refused to restore the file
and issued the following message:
   ANR0540W Retrieve or Restore failed for <filename>. Data integrity error detected.
I started to sweat a little at this stage.
So I selected another file to try - same result.
I started to sweat a little more.
I started an auditdb, but four hours later I cancelled it as I was running out
of time.
So I thought, nothing to lose, let's restore the SAP/Oracle executables on one
of the thin nodes. I recreated the filesystems from a script I keep that
defines the current filesystem layout, started the ADSM restore of these
filesystems and, lo and behold, it worked fine.
I started sweating less.
The next step was to restore the Oracle DB for SAP (via backint). This step
worked fine.
I stopped sweating.
One interesting problem I ran into was that symbolic links didn't survive the
restore of the SAP/Oracle filesystems: they pointed nowhere. I had to manually
delete these links and recreate them by hand. If you are familiar with SAP R/3
and Oracle, you'll know this was real fun (not).
I discovered APAR IX70295, which describes this situation. The solution is to
put USELARGebuffers No into dsm.sys. I wish I'd known that when I did the test,
but I guess that is what tests are for. (I haven't tried the USELARGebuffers No
setting yet.)
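A small check after each restore would flag links like these before an
application falls over on them. The following is an illustrative Python
sketch, not something from ADSM or the APAR; `find_dangling_symlinks` is a
hypothetical name. It only reports links whose targets do not exist; it does
not repair them.

```python
import os
import tempfile


def find_dangling_symlinks(root):
    """Return sorted paths under root that are symlinks to missing targets.

    Hypothetical helper for illustration; it only reports, it never modifies.
    """
    dangling = []
    # followlinks defaults to False, so symlinked directories are not descended.
    for dirpath, dirnames, filenames in os.walk(root):
        # A broken link shows up in filenames; a link to an existing directory
        # shows up in dirnames. Checking both covers every symlink.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # os.path.exists follows the link, so False means the target is gone.
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append(path)
    return sorted(dangling)


if __name__ == "__main__":
    # Tiny demo in a throwaway directory: one good link, one broken link.
    with tempfile.TemporaryDirectory() as tmp:
        real = os.path.join(tmp, "real.txt")
        open(real, "w").close()
        os.symlink(real, os.path.join(tmp, "good_link"))
        os.symlink(os.path.join(tmp, "missing"), os.path.join(tmp, "bad_link"))
        for path in find_dangling_symlinks(tmp):
            print("dangling:", path)
```

Run against a restored SAP/Oracle mount point, the output gives the list of
links to recreate, which beats finding them one at a time.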
ADSM also didn't restore some directories that were empty. These had to be
created manually (I discovered this when SAP wouldn't start). Has anyone seen
this before?
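One way to guard against missing empty directories, assuming you can capture
the directory list at the home site ahead of time, is to save that list with
the rest of the DR paperwork and recreate any missing entries after the
restore. An illustrative Python sketch follows; `list_directories` and
`recreate_missing_directories` are hypothetical names. It recreates only the
directories themselves, not their ownership or permissions, which SAP may also
depend on.

```python
import os
import tempfile


def list_directories(root):
    """Return every real (non-symlink) directory under root, relative to root, sorted."""
    rels = []
    for dirpath, dirnames, _ in os.walk(root):
        for name in dirnames:
            full = os.path.join(dirpath, name)
            # Skip symlinked directories: recreating them as real dirs would be wrong.
            if not os.path.islink(full):
                rels.append(os.path.relpath(full, root))
    # Sorted order puts each parent before its children.
    return sorted(rels)


def recreate_missing_directories(root, rel_dirs):
    """Create any directory from rel_dirs that is missing under root.

    Returns the relative paths that were created. Ownership and permissions
    are NOT restored; only the directory itself is created.
    """
    created = []
    for rel in rel_dirs:
        path = os.path.join(root, rel)
        if not os.path.isdir(path):
            os.makedirs(path, exist_ok=True)
            created.append(rel)
    return created


if __name__ == "__main__":
    # Demo: "home" has an empty directory that the "restored" copy lacks.
    with tempfile.TemporaryDirectory() as home, tempfile.TemporaryDirectory() as dr:
        os.makedirs(os.path.join(home, "data", "empty"))
        os.makedirs(os.path.join(dr, "data"))
        saved = list_directories(home)
        print("created:", recreate_missing_directories(dr, saved))
```

In practice the saved list would be written to a file at the home site and read
back at the DR site; the in-memory list here just keeps the sketch short.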
Anyway, I was able to get SAP started and a SAP GUI going. A quick check of the
system showed all was good. (Yahoo!)
I have contacted IBM about the data integrity error. Interestingly enough, I
could restore that file on our home system when I got back. Also, while at the
DR site I was able to restore the previous version of that file OK. Maybe the
client was backing up this file at the same time as the DB backup? I don't
know for sure. If so, then this is bad. Just my luck to pick one file out of
5 million or so that won't restore. I should buy a lottery ticket!
Anyway, the above is just a brief description of what I did. I hope people
find it interesting.
Best Regards,
Peter Zutenis
Principal Systems Programmer
Philip Morris Information Services Ltd
Moorabbin, Australia.