ADSM-L

Re: Our first drp recovery of an SP system.

1997-08-21 08:33:46
Subject: Re: Our first drp recovery of an SP system.
From: "Kauffman, Tom" <KauffmanT AT NIBCO DOT COM>
Date: Thu, 21 Aug 1997 07:33:46 -0500
Peter -

Thanks for the post! I'm glad to see that I'm on the right track with my
backup strategies - I won't get a chance to test them out myself for
nearly a year.

Tom Kauffman
Sr. Technical Analyst
NIBCO, Inc.

>----------
>From:  Peter Zutenis[SMTP:pzutenis AT IBM DOT NET]
>Sent:  Wednesday, August 20, 1997 8:57 PM
>To:    ADSM-L AT VM.MARIST DOT EDU
>Subject:       Our first drp recovery of an SP system.
>
>Hi All,
>
>I have just done our first SP recovery offsite and thought I'd share my
>experiences with you. I have mainly 'lurked' in this list and figure it is
>'payback' time.
>
>Our site is a three frame SP with 21 nodes and around 1.2 TB of SSA disk. The
>CW is a 42t, all nodes running AIX 4.1.4 and PSSP 2.2. In Australia at this
>time no DRP service provider has enough hardware to allow a recovery of all
>our nodes, so we just recovered three nodes (one wide and 2 thins) with around
>100GB SSA.
>All our nodes have two internal disks, one being rootvg and the other being
>altvg. (Yes, I know I should mirror rootvg to the second disk, but these nodes
>were installed before that was supported on SP.)
>
>First off, our current backup strategy uses SYSBACK to back up the Control
>Workstation weekly to 2 8mm tapes. I also use SYSBACK to back up the ADSM
>server (on a wide node). I also do a weekly mksysb of all my nodes to the CW.
>My nodes are all different and it's too hard to use the same image for all
>nodes. I do daily primary storage pool backups to an offsite copy pool.
>
>Now to the good stuff.
>
>My first worry was that the CW at the hotsite was not the same as ours. We
>have a 42t, the hotsite a C10. My fears were unfounded. The SYSBACK restore
>of the CW (rootvg and altvg (where /spdata lives)) restored beautifully. I
>think it helped a lot that both are microchannel machines and that the hdisk
>layout was the same.
>
>Once the control workstation was restored, the SDR reconfigured for the
>new SP layout, the /etc/hosts file modified and other SP stuff done that I
>won't go into, it was time to NIM install the wide node running ADSM. I also
>installed the other two thin nodes at this stage.
>
>The image restored to the wide node OK and I used SYSBACK to recover the
>altvg that contained ADSM, the databases and the disk storage pools. This
>went OK.
>
>SP Parallel switch was re-configured and started Ok.
>
>I restored the Device Config file and the volhistory from a backup diskette
>that is created daily after the ADSM DB backup. (I only do full backups - the
>db is only 1.2GB in size at the moment). I manually edited the device config
>file to match the devices at the DR site.
>
>The next step was to restore the ADSM DB (I didn't trust the physical image
>of the one from the SYSBACK restore). This also went smoothly.
>
>ADSM Server was started and the primary storage pools were marked destroyed.
>I also deleted the old drives (from home site) and added the drives for the
>DR Site. I also had to re-apply the correct license codes for the ADSM
>Server.
>
>Then some trouble happened. I selected at random a file to restore to the CW
>(just a plain old text file) to test the ADSM recovery and also to see if it
>would use the copy pool tapes. The ADSM server refused to restore the file
>and issued the following message: ANR0540W Retrieve or Restore failed for
><filename>. Data integrity error detected.
>
>I started to sweat a little at this stage.
>
>So I selected another file to try - same result.
>
>I started to sweat a little more.
>
>I started to run an auditdb - four hours later I cancelled it as I was
>running out of time.
>
>So I thought - nothing to lose - let's restore the SAP/Oracle executables on
>one of the thin nodes. I recreated the filesystems from a script that I keep
>with the current filesystem layout defined, started the ADSM restore of these
>file systems and lo and behold it worked fine.
>
>I started sweating less.
>
>The next step was to restore the Oracle DB for SAP (via backint). This step
>worked fine.
>
>I stopped sweating.
>
>One interesting problem I ran into was that symbolic links on the restore of
>the SAP/Oracle filesystems didn't work. The symbolic links pointed to
>nowhere. I had to manually delete these links and re-do the links by hand. If
>you are familiar with SAP R/3 and Oracle, then this was real fun (not).
>
>I discovered APAR IX70295 that describes this situation. The solution is to
>put USELARGebuffers No into the dsm.sys. Wish I knew it when I did the test
>but I guess that is what tests are for. (I haven't tried the USELARGebuffers
>setting yet.)
>
>ADSM also didn't restore some directories that were empty. These had to be
>manually created. (Discovered this when SAP wouldn't start.) Has anyone heard
>of this before?
>
>Anyway, I was able to get SAP started and a SAP GUI going. A quick check of
>the system showed all was good. (Yahoo!)
>
>I have contacted IBM re the data integrity error. Interestingly enough I
>could restore that file on our home system when I got back. Also, whilst at
>the DR site I was able to restore the previous version of that file OK. Maybe
>this file was being backed up by the client at the same time as the DB
>backup? I don't know for sure. If so then this is bad. Just my luck to pick
>the one file out of 5 million or so that won't restore. I should buy a
>lottery ticket!
>
>Anyway, the above was just a brief description of what I did. I hope people
>find it interesting.
>
>
>Best Regards,
>
>Peter Zutenis
>Principal Systems Programmer
>Philip Morris Information Services Ltd
>Moorabbin, Australia.
>
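The two post-restore surprises Peter describes (symbolic links pointing
nowhere and empty directories that were not restored) can be checked for
mechanically once a restore finishes. The following is only a minimal
sketch in Python, not anything from ADSM itself: the function name
find_restore_gaps and the idea of keeping a list of expected directories
(recorded at the home site before the disaster) are assumptions made for
illustration.

```python
import os


def find_restore_gaps(root, expected_dirs=()):
    """Report dangling symlinks under root and expected directories that are missing.

    expected_dirs is a hypothetical manifest of directory paths (relative to
    root) recorded at the home site; it is not produced by ADSM.
    """
    dangling = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() looks at the link itself; exists() follows it, so a
            # link whose target is absent is a link that does not "exist".
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append(path)
    missing = [d for d in expected_dirs
               if not os.path.isdir(os.path.join(root, d))]
    return sorted(dangling), missing
```

Run against the restored filesystem root, this returns the links that
would need to be re-made by hand and the directories that would need to
be re-created before starting the application.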