Replication - how should it be done?

droach

PREDATAR Control23

Been reading up on replication and I am a little fuzzy on what is required and what is the best practice. For my first pair of source/target servers I am running SP v8.1.5 and using Directory storage pools. Using OC I defined the replication pair and I added the command 'REPLicate Node * Wait=Yes' to my daily schedule. All seems fine and replication and failover are working.
However, when I look at the details of my storage pool on my source server there is nothing set for 'Protection Storage Pool'. Without this setting I cannot use the command 'PROTect STGPool'.

So my first question is, should I be using 'PROTect STGPool' to replicate my pool, or is the way OC set it up OK?

For my second question:
I have a second pair of servers, but these are running v8.1.6.100. On this pair I created a Cloud-type storage pool on the source and target. This time, when I use OC to set up the replication pair, I get the warning below. So my question here is: why is OC warning me to use "storage pool protection" when it didn't set that up for my other server pairs?

[Attached screenshot of the OC warning: 1548948349331.png]

Any guidance is appreciated.
 

First, it's important to understand the difference between protect and replicate:
- protect stgpool only protects the pool: it sends all the extents not previously protected to the other server. With this you can repair your storage pool if you need to, but you cannot restore nodes from the target server.
- replicate node does two things: it sends extents that are not previously protected to the target, and it also sends the metadata to the target. So after replication is completed, a node could restore from the target.

So in short:
- protect only data
- replicate data + metadata

The preferred method is to run protect first, followed by replicate node. That's because protect is more efficient at moving the data to the target, so when you then run the replication it only has to handle the metadata. As a result, protect + replicate normally runs quicker than letting replicate node handle it all.

I don't have an answer for your 2nd question, I'd have to research, but no time to do so today. I may come back to it later.
 

Thanks for the explanation Marclant. I updated my source storage pool with the name of the pool on my destination server and kicked off a 'Protect Stgpool'. I also added the command to my Daily Task schedule, just prior to the 'replicate node * ' command.
 

@droach would it be possible for you to share the commands you are using to protect both the stgpools and the nodes in your daily script/schedule, please? I'm in the same boat as far as your first question goes. Your dialogue with @marclant has helped me, but it would be nice to see your commands if that is possible.

Kind Regards
Craig
 

I recently met with an IBM 'Spectrum Storage Technical Specialist' and he confirmed what marclant said. Run 'Protect STGP' first, followed by 'replicate node'. The commands I use in my daily schedule are:

PROTect STGPool YOURPOOLNAMEHERE Wait=Yes
REPLicate Node * Wait=Yes
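
If you would rather drive these two commands from outside the server (from a scheduled job, say), the ordering could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: dsmadmc is assumed to be on the PATH, the admin ID and password are placeholders, AZUREFILE is simply the pool name that appears in the daily script posted later in this thread, and run_in_order is a hypothetical helper. Stopping when the first command returns a non-zero exit status is this sketch's own choice, not something the thread prescribes.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# The two commands, in the order recommended in this thread.
DAILY_COMMANDS = (
    "PROTect STGPool AZUREFILE Wait=Yes",
    "REPLicate Node * Wait=Yes",
)


def run_dsmadmc(command: str, admin_id: str = "admin", password: str = "secret") -> int:
    """Run one administrative command through dsmadmc and return its exit status.

    The ID and password defaults are placeholders, not real credentials.
    """
    result = subprocess.run(
        ["dsmadmc", f"-id={admin_id}", f"-password={password}", "-dataonly=yes", command]
    )
    return result.returncode


def run_in_order(
    commands: Sequence[str],
    runner: Callable[[str], int] = run_dsmadmc,
) -> List[Tuple[str, int]]:
    """Run the commands one at a time and stop at the first non-zero exit status.

    Returns a list of (command, exit_status) pairs for the commands that were run.
    """
    results = []
    for command in commands:
        status = runner(command)
        results.append((command, status))
        if status != 0:
            break  # sketch's assumption: do not start replication if protect failed
    return results
```

Calling run_in_order(DAILY_COMMANDS) would run the protect first and the replicate second. Passing a different runner makes it possible to try the ordering logic without touching a real server.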
 

Thanks. How did you define the stgpool to allow the first command to work?
 

Interesting discussion. I will add a few questions.
I set up a replication environment and only did node replication; the source pool is a VTL 7650 and the destination pool is a container pool:
  1. Is it possible to estimate speed increment for replication operations moving from a "replication" config to a "protect + replicate" config?
  2. I used nodegroup to split production nodes from test/quality/dev ones, all of them write data on the same pool (ORA, SAP, etc), should I split data pool too?
  3. Can I switch from "just replication" to "protect + replica" config or should I start from scratch?
Thanks.
 

> Is it possible to estimate speed increment for replication operations moving from a "replication" config to a "protect + replicate" config?
You cannot use protect in your case: protect only works if the source and target are both using container pools, so you can only replicate.
> I used nodegroup to split production nodes from test/quality/dev ones, all of them write data on the same pool (ORA, SAP, etc), should I split data pool too?
It depends. From an SLA perspective, if you need some nodes to be available before others on the target, it may make sense to replicate those first. If you don't have SLAs, then it's much simpler to replicate all the nodes. If you don't need/want to replicate certain nodes, update those nodes with "replstate=disabled".
 

@droach how do your protect stgpool and replicate node commands fit into your daily maintenance script? When do you do your TSM DB backup in relation to these commands, before or after? I am having issues getting my protect and replicate commands to run as part of my script. Could you share your maintenance script so I can try to figure out what's wrong with mine? I've posted about this but have had no reply as yet. Link to thread here --> Weird Maintenance Script Issue

Craig
 

Here is my Daily Task script. Not saying it is right for you, but works for me.

DLY_TASKS 1 /* TSM Daily Admin Tasks */
5 query script CANCEL
10 if (RC_OK) exit
15 BAckup DB DEVclass=db_backup Type=Full Scratch=Yes Wait=Yes
20 query script CANCEL
25 if (RC_OK) exit
30 Prepare Source=DBBackup Wait=Yes
40 BAckup DEVCONFig Filenames=L:\TSM\DEVCONFIG\devconf.out
45 BAckup VOLHistory Filenames=L:\TSM\DEVCONFIG\volhist.out
50 query script CANCEL
55 if (RC_OK) exit
60 MOVe DRMedia * WHERESTate=VAULTRetrieve TOSTate=ONSITERetrieve Wait=Yes
65 MOVe DRMedia * WHERESTate=MOuntable TOSTate=VAult Wait=Yes
70 query script CANCEL
75 if (RC_OK) exit
80 PROTect STGPool AZUREFILE MAXSESSions=20 Wait=Yes
85 query script CANCEL
90 if (RC_OK) exit
95 REPLicate Node * MAXSESSions=20 Wait=Yes
100 query script CANCEL
105 if (RC_OK) exit
110 EXPIre Inventory Quiet=Yes Type=ALl REsource=20 SKipdirs=No Wait=Yes
 

Thanks for that @droach, I will compare it to mine and see if that resolves my issues. How do you know if your protect and replicate commands have worked? Can I also ask what the "query script CANCEL" is doing?
Craig
 

For me, you know if it worked because the PROTECT and REPLICATE jobs run forever. Just do a 'q process', or 'q actl' for the time the command was issued. You can also do a 'q node yournodenamehere f=d' and look at all of the 'Replication' settings. Those will tell you IF the node is configured for replication, and when it 'Last Replicated to Server'.

The 'query script CANCEL' is a way to cancel the Daily Tasks script once it has started. If I need to cancel the running script, all I have to do is issue the command "define script CANCEL". Then I can either let the currently running process complete, or I can kill or cancel the process that is currently running. When that process completes or dies, the next command in the Daily Tasks script checks to see if the 'CANCEL' script exists. If it does exist, the script exits. If it doesn't exist, it continues to run the next command in the Daily Tasks script.

Just have to remember to delete the CANCEL script (if you created it) or the Daily Task script will exit at line 10 the following day :)
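
For anyone who finds the control flow easier to read as code, the same "check for a flag before each step" pattern can be modelled in a few lines of Python. This is only a sketch of the idea, not the server script itself: run_until_cancelled, cancel_requested and execute are hypothetical names, with cancel_requested standing in for 'query script CANCEL' returning RC_OK and execute standing in for running one administrative command.

```python
from typing import Callable, Iterable, List


def run_until_cancelled(
    steps: Iterable[str],
    cancel_requested: Callable[[], bool],
    execute: Callable[[str], None],
) -> List[str]:
    """Run steps in order, checking the cancel flag before each one.

    Returns the steps that actually ran. Once the flag is seen, no later
    step is started, which mirrors the 'if (RC_OK) exit' lines in the script.
    """
    completed = []
    for step in steps:
        if cancel_requested():
            break  # the flag exists, so stop before starting the next step
        execute(step)
        completed.append(step)
    return completed
```

Note that the flag is only consulted between steps, so a step that is already running is not interrupted. That matches the behaviour described above, where the current process has to finish or be cancelled separately.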
 

Brilliant thanks @droach :)
One more sort-of-related thing: can you run a normal Windows batch file from a TSM script? Is there a way to do this?

Craig
 

I don't think that there is a way to run a batch file from a TSM script. However, you can easily go the other way and run DSMADMC commands from a batch file or PowerShell script.

So you have to get a little tricky depending on what you are trying to accomplish. If you want to run a batch file when your Daily Task script gets to a certain place in its execution you can have a line in your Daily script that creates (defines) a bogus script. Then, from your batch file or PowerShell script have it polling/looping and querying for the existence of this bogus script. When the bogus script finally exists have your batch or PowerShell script break out of its loop to do whatever you want.

Clunky, yes.
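
The polling side of that idea could look something like the Python sketch below (the same loop could be written in batch or PowerShell). Several things here are assumptions rather than facts from this thread: dsmadmc is assumed to be on the PATH, the admin ID and password are placeholders, script_exists and wait_for_flag are hypothetical helpers, and the sketch assumes that dsmadmc exits with status 0 when 'query script' finds the script and non-zero when it does not, which is worth confirming on your own system first.

```python
import subprocess
import time
from typing import Callable


def script_exists(name: str, admin_id: str = "admin", password: str = "secret") -> bool:
    """Return True if 'query script <name>' succeeds (assumed: exit status 0 means found)."""
    result = subprocess.run(
        ["dsmadmc", f"-id={admin_id}", f"-password={password}", "-dataonly=yes",
         f"query script {name}"],
        stdout=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_flag(
    check: Callable[[], bool],
    interval_seconds: float = 60.0,
    max_checks: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to max_checks times, sleeping between attempts.

    Returns True as soon as check() is true, or False if it never is.
    """
    for attempt in range(max_checks):
        if check():
            return True
        if attempt < max_checks - 1:
            sleep(interval_seconds)  # wait before asking the server again
    return False
```

A caller might use wait_for_flag(lambda: script_exists("BOGUS")), where BOGUS is whatever name the daily script defines, and then run its own work once that returns True. Capping the number of checks keeps the loop from running indefinitely if the daily script never reaches that line.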
 

I noticed while playing around with PROTECT STGPOOL and REPLICATE NODE that if you have your node's data in a directory-container pool called DEDUPEPOOL, which is set up with PROTECTstgpool=TARGET-DEDUPEPOOL, then when you run PROTECT STGPOOL the data is copied from DEDUPEPOOL on one Spectrum Protect server to TARGET-DEDUPEPOOL on the destination Spectrum Protect server, as expected. But when you run REPLICATE NODE, the 'remaining' data (metadata and whatever is missing) is sent to the DEDUPEPOOL on the destination Spectrum Protect server. In fact, if you delete DEDUPEPOOL on the destination server, the REPLICATE NODE command fails. So you have data in 2 container pools on the destination server. Is this expected behaviour?
 

> So you have data in 2 container pools on the destination server. Is this expected behaviour ?
I believe so. If you don't want the data to be in 2 pools, send it to the same pool. So, use PROTECTstgpool=DEDUPEPOOL.
 

> I believe so. If you don't want the data to be in 2 pools, send it to the same pool. So, use PROTECTstgpool=DEDUPEPOOL.
Thanks. After seeing that using these two container pools essentially doubles the amount of storage used on the replication server, I will definitely be using just the one pool.
 