Disaster Recovery of a Large Environment

tsmadm

ADSM.ORG Member
Joined
Jan 2, 2003
We are developing a DR strategy, and a concern has been raised about the number of tapes needed for DR. We are currently planning to recover 262 servers in 3 days. The average number of tape mounts needed per restore is 37. We are using 9940B drives (200 GB, 30 MB/sec native). The concerns are listed below:



1) The number of tape mounts needed

2) Tape contention: only one node can be restored from one drive at any time
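To put rough numbers on concern 1, here is a back-of-the-envelope sketch of the mount budget. The 262 servers, 37 mounts per restore and 3-day window come from the figures above. The 2 minutes per mount/dismount cycle is purely an assumed value for illustration, so substitute a figure measured on your own library. The function name `mount_budget` is hypothetical.

```python
import math

def mount_budget(servers, mounts_per_restore, window_hours, minutes_per_mount):
    """Return (total_mounts, mounts_per_hour, drive_hours, min_drives)."""
    total_mounts = servers * mounts_per_restore
    mounts_per_hour = total_mounts / window_hours
    # Drive time spent only on mount/position/dismount, not on moving data.
    drive_hours = total_mounts * minutes_per_mount / 60
    # Lower bound on drives needed just to absorb the mount overhead.
    min_drives = math.ceil(drive_hours / window_hours)
    return total_mounts, mounts_per_hour, drive_hours, min_drives

# 262 servers, 37 mounts each, 72-hour window; 2.0 min/mount is an ASSUMPTION.
total, per_hour, drive_hours, drives = mount_budget(262, 37, 72, 2.0)
print(total, round(per_hour, 1), round(drive_hours, 1), drives)
```

With those inputs the total is 9694 mounts. The drive count is only a floor, because it ignores data transfer time and the contention described in concern 2.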



We have considered backup sets, but these only provide a point-in-time recovery as of the last backup set. We have also considered collocation, but the number of tape mounts it would require for storage pool backup and migration makes it infeasible.



I would appreciate any ideas from someone who has tested a recovery at this scale and can share their experience. At this point, any ideas would be helpful.



This seems to be a real weakness with TSM.



Thanks,
 
We have a similar issue with our DR testing: about 80 servers to recover in 24 hours using TSM. Our configuration consists of a 3494 library with eight 3590-E1A tape drives, managing 600 clients.



Several factors will determine your success at DR testing. In no particular order, they are as follows:



1) The current state of your tapes will be a big factor. Specifically, how widely each client's data is spread across tapes will be the challenge. This SQL statement will give you a feel for how deep the issue may be.



select NODE_NAME as NODE, count(distinct VOLUME_NAME) as "Number of Tapes Used"
from VOLUMEUSAGE
where NODE_NAME='<NODENAME>' and STGPOOL_NAME='<STGPOOL_NAME>'
group by NODE_NAME
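If you would rather look at every node at once, you can export the NODE_NAME and VOLUME_NAME columns from VOLUMEUSAGE and rank the nodes offline. The following is only a sketch. The function name `tapes_per_node` and the sample rows are made up for illustration, and it assumes you have already parsed the export into (node, volume) pairs. The same volume can show up more than once for a node (for example once per filespace), which is why the count is over distinct volumes, just like the query above.

```python
from collections import defaultdict

def tapes_per_node(rows):
    """Count distinct volumes per node and rank nodes, most tapes first."""
    volumes = defaultdict(set)
    for node, volume in rows:
        # A set drops repeated (node, volume) pairs, like count(distinct ...).
        volumes[node.strip().upper()].add(volume.strip())
    counts = {node: len(vols) for node, vols in volumes.items()}
    # Sort by tape count descending, then by node name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Made-up sample data standing in for an export of VOLUMEUSAGE.
sample_rows = [
    ("NODE_A", "VOL001"), ("NODE_A", "VOL002"), ("NODE_A", "VOL002"),
    ("NODE_B", "VOL001"),
    ("NODE_C", "VOL003"), ("NODE_C", "VOL004"), ("NODE_C", "VOL005"),
]
ranking = tapes_per_node(sample_rows)
for node, tape_count in ranking:
    print(node, tape_count)
```

The nodes at the top of the list are the ones whose restores will need the most mounts.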



2) Based upon the results of the statement above, you may decide to do selective collocation on some of your critical clients. What I mean is that you may have to have some clients use separate storage pools which have collocation turned on. As I said, I have a total of 600 clients, and it is not practical for me to turn collocation on for all of them, so I do it selectively (based upon need and policy domain). TSM makes recovery a case of pay me now or pay me later.



3) Be sure that your DR contract has enough tape drives included in it, preferably more than you actually have at your home office. This will help because, as long as there is a free tape drive, the mount/dismount times will be minimized. In other words, one tape can be mounting while another is dismounting, which cuts down on the waiting.



4) Depending upon the version of TSM you're using, the 'move nodedata...' command might be of some use when you know in advance that a server will need to be restored. Doing the consolidation on the server side is much faster than waiting for tape mounts on the client side.
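As a sketch of point 4, the snippet below just builds the command text for a list of nodes so it can be pasted into a macro file. It assumes the MOVE NODEDATA form that takes FROMSTGPOOL and TOSTGPOOL parameters, so check 'help move nodedata' on your own server level before relying on it. The node names and the pool names TAPEPOOL and DRPREP_POOL are hypothetical, as is the function name.

```python
def move_nodedata_commands(nodes, from_pool, to_pool):
    """Build one 'move nodedata' administrative command per node."""
    # Only text is generated here; nothing is sent to a TSM server.
    return [
        f"move nodedata {node} fromstgpool={from_pool} tostgpool={to_pool}"
        for node in nodes
    ]

# Hypothetical node and storage pool names, for illustration only.
commands = move_nodedata_commands(["NODE_A", "NODE_C"], "TAPEPOOL", "DRPREP_POOL")
for command in commands:
    print(command)
```

Reviewing the generated lines by hand before running them keeps the consolidation limited to the nodes you actually expect to restore.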



I hope this helps a little.
 