ISP - Node Replication hangs

kslztc · Aug 30, 2017

Hi,

We have a TSM server running ISP 8.1.1.0 on WS2016, containing approx. 500-600 nodes: Exchange, File, SQL, etc.

Each day we replicate all the nodes' filespaces from the primary to a secondary server.
Within the last week, one node's filespace started stalling the TSM server, blocking our daily script from continuing.

The node is an Exchange server containing approx. 4 TB of data, with a 1-month retention.
I've queried replication for the node, and from what I can see, the replication should have finished: all the files are replicated, but the filespace replication is still in the "Incomplete" state.

I've attached 2 pictures of the q replication command output. Hopefully someone can tell me what is going wrong.

Any advice would be greatly appreciated!

inthesun · Sep 14, 2017

Hi,

Since only one node appears to fail or skip objects during Replicate Node, you may want to remove it from the Node Group and run it by itself. If you post the actlog messages from the end of this node's replication, we may be able to see exactly what is happening.

marclant · Sep 15, 2017

Inthesun's suggestion to isolate that node is a good idea as well. Is that node your largest node? If so, how much larger is it than the 2nd largest node? If unsure, you can use this query to see your top 5 largest nodes:

Code:

select node_name,sum(reporting_mb) as MB from occupancy where node_name!='' group by node_name order by sum(reporting_mb) desc fetch first 5 rows only

Sometimes, what appears to be a hang could be a performance problem: the process is running slowly enough that it appears to be hung. To determine whether it's hung or slow, run QUERY PROCESS and QUERY SESSION every 20 minutes for a few hours and see whether the numbers for the process and the session to the target server climb. If the bytes or objects change, it's not hung and you're probably looking at a performance issue.
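If you want to make the hung-or-slow comparison less error-prone, here is a minimal Python sketch. It assumes you note down the byte and object counters from QUERY PROCESS / QUERY SESSION by hand at each check; the names Snapshot and classify are made up for this example and are not part of any TSM tooling.

Code:

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One manual reading of the counters (hypothetical structure)."""
    minutes: int   # minutes elapsed since the first reading
    bytes: int     # byte counter noted from QUERY PROCESS / QUERY SESSION
    objects: int   # object (file) counter noted from the same output


def classify(snapshots, min_samples=3):
    """Return 'progressing', 'hung', or 'not enough samples'.

    'progressing' means the byte or object counter increased between the
    first and last reading, which points to a slow process rather than a
    hung one. 'hung' means neither counter moved over the whole window.
    """
    if len(snapshots) < min_samples:
        return "not enough samples"
    first, last = snapshots[0], snapshots[-1]
    if last.bytes > first.bytes or last.objects > first.objects:
        return "progressing"
    return "hung"


def bytes_per_minute(snapshots):
    """Average throughput over the sampled window (0.0 if no time passed)."""
    elapsed = snapshots[-1].minutes - snapshots[0].minutes
    if elapsed <= 0:
        return 0.0
    return (snapshots[-1].bytes - snapshots[0].bytes) / elapsed
```

Feed it the readings taken 20 minutes apart. A result of "progressing" with a very low bytes_per_minute value suggests a performance problem rather than a true hang.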

Is the data in a container pool or a traditional pool? If the former, you should run PROTECT STGPOOL before REPLICATE NODE. Protect is more efficient at copying the data to the target, and replicate then just needs to replicate the metadata. Ignore this paragraph if you are not using a container pool.
