Subject: Re: TSM system migration planning
From: Ben Bullock <bbullock AT MICRON DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 4 Oct 2006 11:40:02 -0600
        Larry,
        We just got through a similar move from AIX to AIX hosts,
although with different disks and tape drives. We had to do this process
on 6 TSM servers.

        The actual downtime to the TSM clients was only the time it took
to run an incremental DB backup on the production server and restore it
to the new server. Typically less than 1 hour. 

        This is a straight copy of my procedure, so obviously host names
and software locations will be different, but it will show you the flow
of the process. Take it and adjust it for your own installation and
hardware.

Ben


**************************
TSM Server Migration


These steps can be used to migrate the TSM services to a new server.
This process is very much like what would need to be done in a Disaster
recovery situation.

This process was used in the summer of 2006 to migrate the TSM servers
from old RS6000 servers and SSA disks to new P550 servers and EMC
Clariion CX3-80 SANs.


Steps to be done on the new servers beforehand.

   1. Build the new hosts with AIX 5.3 according to the documentation.
      NOTE that you can do the NIM installation of the OS over a 10GB
NIC; it works.
      At the time of this documentation, BOTSMTEST1 is the NIM server
used to install AIX 5.3.
      This can be found in this document: Server - AIX5.3 NIM
installation

   2. Look in the /etc/inittab file and take out the 'dsmserv' line that
automatically starts the TSM server if it is there.
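      For example (a quick sketch; 'autosrvr' is just the common label
      for that entry, use whatever label your inittab actually shows):
          * lsitab -a | grep -i dsm
          * rmitab autosrvr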

   3. Configure the /etc/ibmatl.conf and the 3494 tape library itself to
talk to each other.
      The 'mtlib -l /dev/lmcp0 -q L' command will show you if you can
connect to the library.
      You may need to remove the device and reconfigure it back in to
get it to connect.
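      For example, /etc/ibmatl.conf needs a line similar to this one
      (the symbolic library name, Library Manager IP address, and host
      identifier shown here are placeholders, use your own values):
          * BOITAPELIBX  192.0.2.50  tsmhostX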

   4. Move over a standard copy of the dsmserv.opt to the new host.
          * mount botsmtestX:/export/post /mnt
          * cp /mnt/dsmserv.opt /usr/tivoli/tsm/server/bin/dsmserv.opt

   5. Zone the appropriate HBAs to the tape drives and the remaining
HBAs to the Clariion.

   6. 'cfgmgr' the tape drives and configure them as they are on the
current server: Tape drive configuration steps
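      A quick sanity check after the zoning and cabling, using standard
      AIX commands:
          * cfgmgr
          * lsdev -Cc tape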

   7. Create a new volume group with the POWERPATH devices for TSM to
use.
      Make sure you choose to make a Powerpath VG.
      Make the data devices RAW
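      One way to do this from the command line (the VG name, PP size,
      and hdiskpower numbers are examples only, adjust for your LUNs):
          * lsdev -Cc disk | grep power
          * mkvg -y tsmvg -s 256 hdiskpower0 hdiskpower1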

   8. Create 2 LVs: 1 database LV and 1 log volume that is 12G. Make
them 'raw' devices as it is simpler. There is no need to mirror the
devices as they are protected on the Clariion backend.
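      For example (LV names and PP counts are placeholders; with a
      256MB PP size, 48 PPs gives the 12G log volume):
          * mklv -y tsmdbvol01 tsmvg 400
          * mklv -y tsmlogvol01 tsmvg 48
      The raw devices then show up as /dev/rtsmdbvol01 and
      /dev/rtsmlogvol01, which are what get formatted in the next step.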

   9. Initialize the new LVs. This step will create a "virgin" TSM
server and wipe out the default TSM configuration.
      You will need to remove the existing
"/usr/tivoli/tsm/server/bin/dsmserv.dsk" file before you format these
new devices to TSM.

      format of the command:
          * dsmserv format #-of-log-devices /dev/r??? ...
#-of-db-volumes /dev/r???? ...

      Example command:
          * dsmserv format 1 /dev/rtsmlogvol01 1 /dev/rtsmdbvol01 

      See the "TSM Administrator's Reference", near the back of the book
for more information.

  10. Create as many storage pool volumes as you will need by creating
LVs on the LUNs in the volume group.
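      For example (names and sizes are placeholders again; these become
      the /dev/rstgvol* devices defined to the diskpools in step 18 of
      the migration-day procedure):
          * mklv -y stgvol01 tsmvg 800
          * mklv -y stgvol02 tsmvg 800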

  11. Bring up the TSM server in interactive mode to make sure it is
able to come up.
      dsmserv

  12. Re-enter the TSM license information. You only need to run this
one command.
          * register lic file=/usr/tivoli/tsm/server/bin/tsmee.lic

  13. Define the devclasses needed to restore the DB backup:
      NOTE: You need only run the first command to define the "file"
device type if you will be using an NFS mount to migrate the database.
      NOTE: If you will be using a tape drive, you will not be able to
define the path to the tape drive until you take it offline on the
production server, so you may not want to do that until the day of the
migration.

          * DEFINE DEVCLASS FILE DEVTYPE=FILE FORMAT=DRIVE
MAXCAPACITY=66060288K MOUNTLIMIT=1 DIRECTORY=/mnt
          * DEFINE DEVCLASS 3592DEV DEVTYPE=3592 FORMAT=DRIVE
MOUNTLIMIT=DRIVES MOUNTWAIT=60 MOUNTRETENTION=60 PREFIX=ADSM
LIBRARY=BOITAPELIBX
          * DEFINE LIBRARY BOITAPELIBX LIBTYPE=349X PRIVATECATEGORY=88
SCRATCHCATEGORY=89 SHARED=NO
          * DEFINE PATH SERVER1 BOITAPELIBX SRCTYPE=SERVER
DESTTYPE=LIBRARY DEVICE=/dev/lmcp0
          * DEFINE DRIVE BOITAPELIBX 3592DRV10
          * DEFINE PATH SERVER1 3592DRV10 SRCTYPE=SERVER DESTTYPE=DRIVE
LIBRARY=BOITAPELIBX DEVICE=/dev/rmt9.7aeb.0

          * Bring down server.
                o tsm> halt


Steps to be done on the day of the migration.

NOTE - you can move the database over through a tape mount or an NFS
mount; for this procedure we will restore the full backup from tape
and the incremental from the NFS mount.
If you choose to make a full backup again, the NFS area must be mounted
on /mnt for this process. These are the commands you would use:

    * BACKUP DB dev=FILE ty=full scratch=yes
      - to backup to the NFS mount.
    * BACKUP DB dev=3592dev ty=full scratch=yes
      - to backup to a tape drive.

   1. Find the full backup of the production server database that was
made to tape today.
      - Look at the e-mail sent for the day or do a "q volhist ty=dbb"
command within TSM.

   2. Make note of the volume name.

   3. Restore the full database backup to the new server but do NOT
commit the changes on the restore.
      NOTE: If you are going to use a tape drive, you need to take it
offline on the production server to use it.
      You need to use the same drive name that you configured in step
#13 in the Pre-upgrade process.

      In the command below you will obviously change the 'vol=' option
to the tape that has the TSM DB backup on it.
          * dsmserv restore db devclass=3592dev vol=A00032
            OR for an NFS mount:
          * dsmserv restore db devclass=file vol=/mnt/52119599.dbs

   4. Your mileage may vary, but in our test restores it ran at about
100GB/hour.

   5. While the restore is working on the new server, migrate all the
data on disk to the tapes with commands similar to these:
      NOTE: try to keep all the tape drives working to speed up getting
the data to tape.
          * The 'q stg *disk*' command will show you which storage pools
to drain to tape.
          * update stg I_DISKPOOL hi=0 low=0 migproc=2
          * update stg DB_DISKPOOL hi=0 low=0 migproc=3
          * etc. etc. for all the diskpools.

   6. When the migrations are nearly complete, it's time to get ready
for the actual server downtime.
          * Update the event to predict the actual time of the downtime.
          * Disable sessions to the TSM server 'disable sessions'.
          * Cancel any running client sessions.
          * Cancel any 'expire inventory' processes.
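      The commands for those last three items (session and process
      numbers come from the preceding query output):
          * disable sessions
          * q session
          * cancel session <session-number>
          * q process
          * cancel process <process-number>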

   7. When the migrations are complete, you should be able to remove all
the disks from the diskpools.
          * q vol dev=disk
          * delete vol /dev/rstorage01
          * delete vol /dev/rstorage02 ... 

      NOTE: If you get any errors deleting the disks, you should try to
drain the data out of the storage pool again with the commands listed in
step 5.

   8. When all the disk volumes have been deleted, make an incremental
backup of the database to the NFS mount:
          * BACKUP DB dev=FILE ty=incr scratch=yes 

   9. Dismount any tape drives that are still mounted.
          * q mount
          * dismount vol VOLNAME

  10. Halt the production TSM services.
          * halt

  11. Tar up the /opt/tsm area on the production server, move it to the
new host and untar it.
          * tar -cvf opt.tar /opt/tsm/*
          * tar -xvf opt.tar 
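      Any file transfer method (ftp, rcp, scp) works for getting the
      tarball across; for example (the new host name is a placeholder):
          * scp opt.tar tsmhostX:/tmp/opt.tar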

  12. Tar up other critical TSM files and put them on the new server:
          * From the /usr/tivoli/tsm/client/ba/bin directory, you want
to copy the dsm.sys, dsm.opt and inclexcl* files over to the new host.
          * From the /usr/tivoli/tsm/server/bin directory, you want to
copy over the vol.hist1 and dev.config1 files.
          * From /, copy over the tsm.hints file and make changes to it
as needed.
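      For example, run from the old host ('tsmhostX' stands in for the
      new host; rcp or ftp work just as well as scp):
          * cd /usr/tivoli/tsm/client/ba/bin
          * scp dsm.sys dsm.opt inclexcl* tsmhostX:/usr/tivoli/tsm/client/ba/bin/
          * cd /usr/tivoli/tsm/server/bin
          * scp vol.hist1 dev.config1 tsmhostX:/usr/tivoli/tsm/server/bin/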

  13. Restore the incremental database backup to the new server and DO
commit the changes on the restore.
          * dsmserv restore db devclass=file vol=/mnt/52119599.dbb
commit=yes

  14. Go into DNS and change it so that the old TSM server addresses are
now aliases on the new TSM server.
            i.e. "botsmX" should have aliases of 'tsmhostX'.

  15. Once the restore is complete, you will need to upgrade the DB to
the current TSM server version with this command:
          * dsmserv upgradedb 

  16. At this point the TSM server may be complaining in the log
messages that it has noticed a change in the date on the server. If so,
you will need to run this command to assure the TSM server that it did
not participate in any time travel:
          * ACCEPT DATE

  17. Make note of any errors you see in the activity log. You hopefully
will see none. If you see any, try to resolve them.

  18. Define the new SAN storagepool LVs to the new TSM server to
replace the ones you deleted before.
          * q stg *disk*
          * define vol DB_DISKPOOL /dev/rstgvol01
          * define vol I_DISKPOOL /dev/rstgvol02
          * define vol A_DISKPOOL /dev/rstgvol03 ... 

  19. Update the thresholds on the disk storagepools so they will be
back to normal levels.
          * q stg *disk*
          * update stg DB_DISKPOOL hi=90 low=70
          * update stg I_DISKPOOL hi=90 low=70
          * update stg A_DISKPOOL hi=90 low=70
          * ... 

  20. If you are relatively sure that the TSM server is ready to go,
halt the server, remove the 'DISABLESCHEDS YES' line from the
dsmserv.opt file, and restart the TSM server.
          * vi /usr/tivoli/tsm/server/bin/dsmserv.opt 

  21. Copy over the entries in root's crontab and set them to run on the
new server.
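      One way to do it:
          * On the old host: crontab -l > /tmp/root.crontab
          * Copy /tmp/root.crontab to the new host, edit as needed, then
            load it with: crontab /tmp/root.crontab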

  22. Eject the tape you used to restore the full DB backup from the
library:
          * mtlib -l /dev/lmcp0 -C -V VOLNAME -tFF10 

  23. Contact the Patrol group and have them put the server in the TSM
group so that the TSM KM will start to monitor the system.
      NOTE: Make sure to tell them the host will inherit the settings of
the old production server.

  24. Monitor TSM server over the next few hours to make sure tapes are
being mounted and no errors are being reported. 

***************************

Good luck,
Ben


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Larry Peifer
Sent: Wednesday, October 04, 2006 11:01 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: TSM system migration planning

TSM system migration

We're in the beginning planning stages for a major transition in our TSM
configuration.  We've purchased all new hardware (host, tape
libraries, and disk sub-system), so I'm looking into the best practices
for making the transition with as little disruption to ongoing
production operations as possible.  Any and all experiences from fellow
TSM'ers are appreciated.

Backup window runs from 6pm to 6am.
Daily TSM administration jobs (expiration / migration / dbbackups /
prepare) run from 6am-4pm.
Data to migrate totals about 25TBytes.

======EXISTING CONFIGURATION

All tapes are in libraries at all times - we have no tapes offline.

TSM Host:  One AIX p650 with 32G memory hosting a number of heavily used
Oracle databases,  TSM server and disk pools, and all tape libraries
connected via SAN fabric.

TSM Server version 5.3.1
2G fibre channel to Disks via SAN switch (zoned) is used for storage
pools and large raw  logical volumes for the Oracle databases.
1G fibre channel to 4 Tape Libraries via SAN switch (zoned)

Onsite tapes libraries
IBM 3583 with 4 SCSI LTO-1 drives and 40 tapes used for only AIX and
Oracle node data.
IBM 3583 with 6 fibre channel LTO-2 drives and 60 LTO-2 tapes used only
for Windows and Lotus node data.

Remote tapes libraries
IBM 3583 with 4 SCSI LTO-1 drives and 40 tapes used for only AIX and
Oracle node data.
IBM 3583 with 6 fibre channel LTO-2 drives and 60 LTO-2 tapes used for
only Windows and  Lotus node data.

Once per day each Onsite tape stgpool is copied to the remote tape
stgpool via 'backup  stgpool' process.  Each library is a single
stgpool.

Clients:
95 Windows NT and 2000 servers with TSM 5.0, 5.02, 5.03 data access via
100M ethernet and  some Gb-ethernet.
10 TDP for Lotus Notes via Gb-ethernet
12 IBM AIX 5.3 MR4 with TSM 5.3.x via 1Gb-fibre channel
1  IBM AIX 4.3.3.x server with TSM 4.3.x via 100Mb ethernet
15 user managed Oracle Database backups via 1Gb-fibre channel
  (not using TDP for Oracle nor RMAN)


======NEW CONFIGURATION

TSM Host: AIX P520 with 8G memory used only for the TSM backup server
and disk pools, with all tape libraries connected via a new dedicated
SAN fabric.

SAN disk available for storage pools is 4.3TB

TSM Server version: 5.3.most recent
4G Fibre Channel to storage pool Disks via SAN switch (zoned)
4G Fibre Channel to 2 Tape Libraries via SAN switch (zoned)

Onsite tape library
IBM 3584 with 10 Fibre Channel LTO-2 drives and 140 tapes

Offsite tape library
IBM 3584 with 10 Fibre Channel LTO-2 drives and 140 tapes

Clients all stay the same with two major exceptions:
12 IBM AIX 5.3 MR4 nodes with TSM 5.3.x via Gb-Ethernet rather than
1Gb-Fibre Channel
15 user managed Oracle Database backups via Gb-Ethernet rather than
1Gb-Fibre Channel
  (not using TDP for Oracle nor RMAN)

Turn on Collocation? Since we want to maintain tape separation for
AIX/Oracle and Windows/Lotus data, it seems like at least 2 collocation
groups are needed for the primary sequential stgpool.
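
(For reference, collocation groups in 5.3 would be set up roughly like
this; the group, node, and pool names below are only placeholders:
    define collocgroup AIX_ORACLE
    define collocmember AIX_ORACLE aixnode1,aixnode2
    define collocgroup WIN_LOTUS
    define collocmember WIN_LOTUS winnode1,winnode2
    update stgpool PRIMARY_TAPE_POOL collocate=group )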

===============QUESTIONS

What is the easiest, fastest, least disruptive method to move data from
the 2 onsite tape  libraries to the 1 new onsite library?

One goal is to NOT disrupt the nightly backup window.  Deferring the
administrative window  would be ok.

The migration process could be done during one 8-hour work window, or it
could be done over any number of days as long as daily backup and
recovery is still available.

It would be possible to connect one new 3584 with a few of its tape
drives active, without salvaging NICs and GBICs from the existing
configuration, and thereby have both the old system and part of the new
system active in parallel.

Moving all onsite LTO-1 and LTO-2 tapes to the new 3584 library is
another option, and then aging out the LTO-1's via 'move data' perhaps.
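
(A rough sketch of that aging-out step, run once per LTO-1 volume, with
the volume name as a placeholder: 'move data LT0001'.  With no stgpool=
parameter the data is rewritten onto other volumes in the same storage
pool, after which the emptied LTO-1 can be checked out of the library.)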

Any methods and trade-offs for this would be appreciated.

And then there is the issue of getting our data from the primary
sequential storage pools to the copy storage pools in the offsite
library.  I figure to just use 'backup storage pool' to make that
happen.  The 2 libraries are connected via 2 4Gb-Fibre Channel SAN
switches.  The question is how collocation on the primary will affect
this, and whether using many processes will help or hurt future recovery
processing from the copy pools?

Thanks for your time,
Larry Peifer
AIX / Oracle / TSM System Administrator
San Clemente, California
