Re: Offsite Copies for Disaster Recovery

Martha Cooper Golden asked for information on Disaster Recovery processes, so
I figured I'd post this again. Hope it helps...

 ==============================================================================
 The following is an item I posted a while back in my previous life as an ADSM
 customer. The item is as I wrote it, except for one added line item, marked
 with '***'.

 If you haven't already done so, I *strongly* recommend reading Chapter 14 in
 the Administrator's Guide on Disaster Recovery.

 Andy Raibeck
 ADSM Level 2 Support
 408-256-0130

 ===============================================================================
 Most of my disaster recovery stuff is driven by the ADSM scheduling facility.
 Here's what a day in the life of ADSM disaster backup looks like at CML:

 06:30 - A schedule runs to issue the BACKUP STG command for my disk backup
         storage pool:
         BACKUP STGPOOL BACKUPPOOL DISASTER_RECOVERY MAXPROCESS=6

 06:30 - A schedule runs to issue the BACKUP STG command for my disk archive
         storage pool:
         BACKUP STGPOOL ARCHIVEPOOL DISASTER_RECOVERY MAXPROCESS=2

 06:30 - A schedule runs to issue the BACKUP STG command for my tape backup
         storage pool:
         BACKUP STGPOOL TAPEPOOL DISASTER_RECOVERY MAXPROCESS=2

 09:30 - A schedule runs to issue the UPDATE VOLUME command for my disaster
         recovery copy storage pool. This command changes all the volumes
         created by the 06:30 processes so that their access is 'offsite':
         UPDATE VOLUME * ACCESS=OFFSITE LOCATION='IRON MOUNTAIN' +
            WHERESTGPOOL=DISASTER_RECOVERY WHEREACCESS=READWRITE,READONLY +
            WHERESTATUS=FILLING,FULL

 10:00 - A schedule runs to issue the BACKUP DB command:
         BACKUP DB DEVCLASS=VAULT TYPE=INCREMENTAL (Monday - Friday)
         BACKUP DB DEVCLASS=VAULT TYPE=FULL (Saturday)

 11:00 - A CA-7 job is triggered to get a pull list for our tape operators
         for all newly-created volumes that I send offsite. These include
         volumes created by the database backups and the storage pool backups.
         The qualifier defined in my VAULT device class is 'DSMVAULT'. My
         other device classes use 'DSM'.

 12:00 - A schedule runs to delete old volume history information:
         DELETE VOLHISTORY TODATE=TODAY-35
         (Note: 35 days is a lot, and I will probably eventually lower
         this to something like 10 days.)

 15:00 - A schedule runs to issue the BACKUP STG command for my disk backup
         storage pool:
         BACKUP STGPOOL BACKUPPOOL DISASTER_RECOVERY MAXPROCESS=2

 *** (new line item): Offsite volumes with a status of EMPTY have been empty
     for the number of days specified by the REUSEDELAY parameter. These
     volumes can be identified with the following command:

        QUERY VOLUME STGPOOL=DISASTER_RECOVERY ACCESS=OFFSITE STATUS=EMPTY

     These volumes should be returned to the onsite location. Once they are
     returned, their access should be updated to either READONLY or READWRITE
     (it doesn't matter which). Once the access has been updated, they will be
     returned to scratch.
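
     The access update for returned volumes is easy to script. The following
     is only a minimal sketch: the function name and the volume names are
     made up for illustration, and it assumes you already have the list of
     volumes that came back from the vault. It builds one UPDATE VOLUME
     command per volume, using the same WHERE* filters shown in the 09:30
     step so that only empty, offsite volumes in the copy pool are touched.

```python
def return_commands(volumes, stgpool="DISASTER_RECOVERY", access="READWRITE"):
    """Build one UPDATE VOLUME command per returned volume (hypothetical helper)."""
    if access not in ("READWRITE", "READONLY"):
        raise ValueError("access must be READWRITE or READONLY")
    return [
        f"UPDATE VOLUME {vol} ACCESS={access} "
        f"WHERESTGPOOL={stgpool} WHEREACCESS=OFFSITE WHERESTATUS=EMPTY"
        for vol in volumes
    ]


# Example with invented volume names:
for cmd in return_commands(["DR0001", "DR0002"]):
    print(cmd)
```

     The generated commands could then be fed to an administrative client
     session or pasted into a macro; how they are submitted is left open here.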

 One thing I need to add is a job to dump to tape those ADSM data sets that I
 would need in a disaster recovery situation: volume history, device class
 files, disklog, linklib, message libs, help libs, etc. This would run
 sometime between when the database backup ends and the 11:00 pull list job.

 When I originally set this up, the tape pool was the hardest to get fully
 copied to the DISASTER_RECOVERY pool. That's because I had well over 1,500
 3480 tapes to read. So I took a phased approach to this.

 First I set up the schedules to back up the two disk pools: ARCHIVEPOOL and
 BACKUPPOOL. This incurred scratch mounts only, so it wasn't a problem. The
 bulk of my ADSM activity is in backup, so I ran these two schedules for about
 3 weeks before attempting to back up the TAPEPOOL storage pool. By doing so,
 I was able to copy all *new* backup versions from my disk pool to the
 DISASTER_RECOVERY pool, and over that 3 weeks, turn over a large number of
 versions that existed in the tape pool. The net effect was that I avoided
 backing up a lot of data from my tape pool that would have expired in a few
 weeks anyway, and thus avoided a ton of input mounts. I'd waited this long
 for the disaster recovery features; how much difference would a few extra
 weeks make? The upshot is that when I finally started backing up my tape
 storage pool, I ended up not needing to mount around 40% of the tapes in
 my tape pool (because I already had backups for those versions from when
 they were in the disk backup pool).
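
 To put rough numbers on that saving: with a pool of well over 1,500 tapes
 and around 40% of them never needing a mount, the wait avoided on the order
 of 600 input mounts. The arithmetic is sketched below. The helper name is
 invented for this illustration, and 1,500 is used as a floor because the
 exact tape count is not stated.

```python
def mounts_avoided(total_tapes, fraction_skipped):
    """Estimate how many input tape mounts were avoided."""
    return round(total_tapes * fraction_skipped)


tapes = 1500     # "well over 1,500", so treat this as a lower bound
skipped = 0.40   # roughly 40% of the tapes never had to be mounted

avoided = mounts_avoided(tapes, skipped)
remaining = tapes - avoided
print(f"avoided about {avoided} mounts, still read about {remaining} tapes")
```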

 Due to drive allocation problems, and the time it took for manual tape
 mounts (we are not "roboticized"), I had to start and stop the tape pool
 backup quite a few times. But over the course of 3 or 4 weeks, I finally
 managed to get *all* of my primary storage pools backed up to the copy
 storage pool (DISASTER_RECOVERY). The entire process took around 2 months.
 Once that was complete, I added the schedule to back up the tape pool on a
 daily basis.

 I do have workstation backups that run during the day. I then migrate most
 of my disk backup pool to tape, later in the day (4PM). However, since these
 more recent backups haven't been backed up yet to the DISASTER_RECOVERY
 pool (they were created after the 6:30AM storage pool backup processes),
 I'd incur input mounts when the TAPEPOOL backup process started at 06:30 the
 next day. So I added another schedule to back up my disk backup pool at 3PM
 in order to minimize the number of input mounts required the next morning.
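
 The effect of the extra 3PM run can be modelled in a few lines. This is a
 simplified sketch, not a description of ADSM internals: it assumes
 migration happens only at 4PM, that a storage pool backup copies everything
 that arrived before it started, and that anything arriving after migration
 stays on disk until the next 06:30 run. The function name is made up.

```python
from datetime import time

MIGRATION = time(16, 0)   # disk backup pool migrates to tape at 4PM


def needs_input_mount(arrival, copy_runs, migration=MIGRATION):
    """True if a backup arriving at `arrival` is migrated to tape before
    any storage pool backup run has copied it from disk."""
    if arrival >= migration:
        # Assumed to stay on disk until the next morning's 06:30 run.
        return False
    # Safe if some run starts after the data arrived but before migration.
    return not any(arrival <= run < migration for run in copy_runs)


morning_only = [time(6, 30)]
with_afternoon = [time(6, 30), time(15, 0)]

print(needs_input_mount(time(8, 0), morning_only))     # True
print(needs_input_mount(time(8, 0), with_afternoon))   # False
print(needs_input_mount(time(15, 30), with_afternoon)) # True
```

 Under these assumptions the exposed window shrinks from 06:30-16:00 to
 15:00-16:00, which is why the next morning's TAPEPOOL backup needs far
 fewer input mounts.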

 It would be nice if I could base some of these schedules on events, rather
 than time of day. For example, set ADSM up so that when all three storage
 pool backup events complete (the 6:30AM ones), the UPDATE VOLUME command
 will then execute, followed by the BACKUP DB command. But for now, time of
 day works out. The backup for the disk backup pool takes the longest, around
 an hour and a half to two hours or so. The UPDATE VOLUME runs in a minute
 or less. So I've got plenty of surplus time built in to the schedules.
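
 As a quick check on that slack, the sketch below computes the gap between
 the worst-case finish of one step and the scheduled start of the next,
 using the times and durations quoted above. The function name is made up,
 and two hours is taken as the worst case for the disk backup pool copy.

```python
from datetime import datetime, timedelta


def slack(start, worst_case, next_start):
    """Minutes between the worst-case finish of a step and the next start.

    `start` and `next_start` are (hour, minute) tuples on the same day.
    """
    day = datetime(2000, 1, 1)  # arbitrary date; only time of day matters
    finish = day.replace(hour=start[0], minute=start[1]) + worst_case
    following = day.replace(hour=next_start[0], minute=next_start[1])
    return int((following - finish).total_seconds() // 60)


# Storage pool backups start at 06:30; UPDATE VOLUME is scheduled for 09:30.
print(slack((6, 30), timedelta(hours=2), (9, 30)))    # 60
# UPDATE VOLUME takes about a minute; BACKUP DB is scheduled for 10:00.
print(slack((9, 30), timedelta(minutes=1), (10, 0)))  # 29
```

 A negative result would mean a step could still be running when the next
 one starts, which is the case an event-driven scheduler would handle
 automatically.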