ADSM-L

Re: Disaster Recovery

1996-05-06 09:25:42
Subject: Re: Disaster Recovery
From: "Andrew M. Raibeck" <araibeck AT VNET.IBM DOT COM>
Date: Mon, 6 May 1996 06:25:42 PDT
Vin San Angelo asks:

>We are in the process of implementing ADSM and are concerned with the disaster
>recovery features of ADSM especially, making tapes to be sent offsite.  Does
>anyone have experience with this?  If so, I would like to hear about your
>experiences and procedures for doing so.

The following is an item I posted a while back in my previous life as an ADSM
customer. The item is as I wrote it, except for a line item addition marked
with '***'.

If you haven't already done so, I *strongly* recommend reading Chapter 14 in
the Administrator's Guide on Disaster Recovery.

Andy Raibeck
ADSM Level 2 Support
408-256-0130

 ==============================================================================
Most of my disaster recovery stuff is driven by the ADSM scheduling facility.
Here's what a day in the life of ADSM disaster backup looks like at CML:

06:30 - A schedule runs to issue the BACKUP STG command for my disk backup
        storage pool:
        BACKUP STGPOOL BACKUPPOOL DISASTER_RECOVERY MAXPROCESS=6

06:30 - A schedule runs to issue the BACKUP STG command for my disk archive
        storage pool:
        BACKUP STGPOOL ARCHIVEPOOL DISASTER_RECOVERY MAXPROCESS=2

06:30 - A schedule runs to issue the BACKUP STG command for my tape backup
        storage pool:
        BACKUP STGPOOL TAPEPOOL DISASTER_RECOVERY MAXPROCESS=2

09:30 - A schedule runs to issue the UPDATE VOLUME command for my disaster
        recovery copy storage pool. This command changes all the volumes
        created by the 06:30 processes so that their access is 'offsite':
        UPDATE VOLUME * ACCESS=OFFSITE LOCATION='IRON MOUNTAIN' +
           WHERESTGPOOL=DISASTER_RECOVERY WHEREACCESS=READWRITE,READONLY +
           WHERESTATUS=FILLING,FULL

10:00 - A schedule runs to issue the BACKUP DB command:
        BACKUP DB DEVCLASS=VAULT TYPE=INCREMENTAL (Monday - Friday)
        BACKUP DB DEVCLASS=VAULT TYPE=FULL (Saturday)

11:00 - A CA-7 job is triggered to get a pull list for our tape operators
        for all newly-created volumes that I send offsite. These include
        volumes created by the database backups and the storage pool backups.
        The qualifier defined in my VAULT device class is 'DSMVAULT'. My
        other device classes use 'DSM'.

12:00 - A schedule runs to delete old volume history information:
        DELETE VOLHISTORY TODATE=TODAY-35
        (Note: 35 days is a lot, and I will probably eventually lower
        this to something like 10 days.)

15:00 - A schedule runs to issue the BACKUP STG command for my disk backup
        storage pool:
        BACKUP STGPOOL BACKUPPOOL DISASTER_RECOVERY MAXPROCESS=2
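
Two of the items above depend on the calendar: the 10:00 BACKUP DB type and
the date that TODAY-35 resolves to for DELETE VOLHISTORY. Here is a minimal
Python sketch of that logic, for illustration only. The function names are
made up, and returning nothing for Sunday is an assumption (the timetable
lists no database backup for that day).

```python
from datetime import date, timedelta


def db_backup_command(day):
    """Return the 10:00 BACKUP DB command for a date, or None on Sunday."""
    weekday = day.weekday()  # Monday is 0, Sunday is 6
    if weekday <= 4:
        return "BACKUP DB DEVCLASS=VAULT TYPE=INCREMENTAL"
    if weekday == 5:
        return "BACKUP DB DEVCLASS=VAULT TYPE=FULL"
    # Assumption: the timetable lists no database backup for Sunday.
    return None


def volhistory_cutoff(day, keep_days=35):
    """Resolve TODAY-<keep_days> to a calendar date."""
    return day - timedelta(days=keep_days)


if __name__ == "__main__":
    today = date(1996, 5, 6)  # a Monday
    print(db_backup_command(today))
    print(volhistory_cutoff(today))
```

Lowering the retention to 10 days, as mentioned in the note, would just be
volhistory_cutoff(today, keep_days=10).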

*** (new line item): Offsite volumes with a status of EMPTY have been empty
    for the number of days specified by the REUSEDELAY parameter. These
    volumes can be identified with the following command:

       QUERY VOLUME STGPOOL=DISASTER_RECOVERY ACCESS=OFFSITE STATUS=EMPTY

    These volumes should be returned to the onsite location. Once they are
    returned, their access should be updated to either READONLY or
    READWRITE (it doesn't matter which). Once the access has been updated,
    they will be returned to scratch.
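
    Once the empty volumes are physically back onsite, the access update
    can be generated per volume. The following Python sketch only builds
    the command text; it does not talk to the server. The volume names in
    the usage example are hypothetical, and the helper name is made up for
    illustration.

```python
def return_to_onsite_commands(volumes, access="READWRITE"):
    """Build one UPDATE VOLUME command per returned volume."""
    if access not in ("READWRITE", "READONLY"):
        raise ValueError("access must be READWRITE or READONLY")
    return [f"UPDATE VOLUME {name} ACCESS={access}" for name in volumes]


if __name__ == "__main__":
    # Hypothetical volume names, e.g. copied from the QUERY VOLUME output.
    for command in return_to_onsite_commands(["DR0001", "DR0002"]):
        print(command)
```

    The resulting lines could be pasted into an administrative session or
    collected into a macro file.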

One thing I need to add is a job to dump those ADSM data sets that I would
need in a disaster recovery situation to tape: volume history, device class
files, disklog, linklib, message libs, help libs, etc. This would run
sometime between when the database backup ends and the 11:00 pull list job.

When I originally set this up, the tape pool was the hardest to get fully
copied to the DISASTER_RECOVERY pool. That's because I had well over 1,500
3480 tapes to read. So I took a phased approach to this.

First I set up the schedules to back up the two disk pools: ARCHIVEPOOL and
BACKUPPOOL. This incurred scratch mounts only, so it wasn't a problem. The
bulk of my ADSM activity is in backup, so I ran these two schedules for about
3 weeks before attempting to back up the TAPEPOOL storage pool. By doing so,
I was able to copy all *new* backup versions from my disk pool to the
DISASTER_RECOVERY pool, and over that 3 weeks, turn over a large number of
versions that existed in the tape pool. The net effect was that I avoided
backing up a lot of data from my tape pool that would have expired in a few
weeks anyway, and thus avoided a ton of input mounts. I'd waited this long
for the disaster recovery features; how much difference would a few extra
weeks make? The upshot is that when I finally started backing up my tape
storage pool, I ended up not needing to mount around 40% of the tapes in
my tape pool (because I already had backups for those versions from when
they were in the disk backup pool).
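
To put rough numbers on that saving, here is the arithmetic as a small
Python sketch. It uses 1,500 tapes as a lower bound (the pool held "well
over" that) and treats "around 40%" as exactly 40 percent, so the results
are estimates, not measured counts.

```python
def mounts_avoided(tapes, percent_skipped):
    """Split a tape count into (mounts avoided, mounts still needed)."""
    skipped = tapes * percent_skipped // 100
    return skipped, tapes - skipped


if __name__ == "__main__":
    # Assumptions: 1,500 is a lower bound, 40 stands in for "around 40%".
    skipped, mounted = mounts_avoided(1500, 40)
    print(f"avoided about {skipped} input mounts, still needed {mounted}")
```

With manual mounts, that works out to at least several hundred tape
mounts the operators never had to do.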

Due to drive allocation problems, and the time it took for manual tape
mounts (we are not "roboticized"), I had to start and stop the tape pool
backup quite a few times. But over the course of 3 or 4 weeks, I finally
managed to get *all* of my primary storage pools backed up to the copy
storage pool (DISASTER_RECOVERY). The entire process took around 2 months.
Once that was complete, I added the schedule to back up the tape pool on a
daily basis.

I do have workstation backups that run during the day. I then migrate most
of my disk backup pool to tape, later in the day (4PM). However, since these
more recent backups haven't been backed up yet to the DISASTER_RECOVERY
pool (they were created after the 6:30AM storage pool backup processes),
I'd incur input mounts when the TAPEPOOL backup process started at 06:30 the
next day. So I added another schedule to back up my disk backup pool at 3PM
in order to minimize the number of input mounts required the next morning.

It would be nice if I could base some of these schedules on events, rather
than time of day. For example, set ADSM up so that when all three storage
pool backup events complete (the 6:30AM ones), the UPDATE VOLUME command
will then execute, followed by the BACKUP DB command. But for now, time of
day works out. The backup for the disk backup pool takes the longest, around
an hour and a half to two hours or so. The UPDATE VOLUME runs in a minute
or less. So I've got plenty of surplus time built into the schedules.
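
The event-driven ordering wished for above can be sketched outside ADSM.
The Python below is an illustration, not an ADSM feature: run_command is a
hypothetical callable, supplied by the caller, that submits one
administrative command, blocks until it finishes, and returns True on
success. Stopping the chain when a storage pool backup fails is also an
assumption.

```python
from concurrent.futures import ThreadPoolExecutor

# Commands copied from the 06:30, 09:30 and 10:00 items in the timetable.
STGPOOL_BACKUPS = [
    "BACKUP STGPOOL BACKUPPOOL DISASTER_RECOVERY MAXPROCESS=6",
    "BACKUP STGPOOL ARCHIVEPOOL DISASTER_RECOVERY MAXPROCESS=2",
    "BACKUP STGPOOL TAPEPOOL DISASTER_RECOVERY MAXPROCESS=2",
]
UPDATE_VOLUME = (
    "UPDATE VOLUME * ACCESS=OFFSITE LOCATION='IRON MOUNTAIN' "
    "WHERESTGPOOL=DISASTER_RECOVERY WHEREACCESS=READWRITE,READONLY "
    "WHERESTATUS=FILLING,FULL"
)
BACKUP_DB = "BACKUP DB DEVCLASS=VAULT TYPE=INCREMENTAL"


def run_morning_chain(run_command):
    """Run the three pool backups together, then the two follow-on steps.

    run_command is hypothetical: it must block until the command has
    completed and return True on success, False otherwise.
    """
    with ThreadPoolExecutor(max_workers=len(STGPOOL_BACKUPS)) as pool:
        results = list(pool.map(run_command, STGPOOL_BACKUPS))
    if not all(results):
        return False  # assumption: do not mark volumes offsite after a failure
    if not run_command(UPDATE_VOLUME):
        return False
    return bool(run_command(BACKUP_DB))


if __name__ == "__main__":
    def fake_run(command):
        # Stand-in runner that only prints; replace with a real submitter.
        print("would run:", command)
        return True

    print(run_morning_chain(fake_run))
```

The point of the design is that UPDATE VOLUME starts as soon as the last
of the three backups completes, instead of waiting for 09:30.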