ISP - Node Replication hangs

kslztc

Hi,

We have a TSM server running ISP 8.1.1.0 on WS2016, with approximately 500-600 nodes (Exchange, File, SQL, etc.).

Each day we replicate all the nodes' filespaces from the primary server to a secondary server.
Within the last week, one node's filespace has started stalling the TSM server, blocking our daily script from continuing.

The node is an Exchange server with approximately 4 TB of data and a 1-month retention.
I've queried replication for the node, and from what I can see the replication should have finished: all the files are replicated, but the filespace replication is still in the "Incomplete" state.

I've attached 2 pictures of the q replication command output. Hopefully someone can tell me what is going wrong :)

Any advice would be greatly appreciated!
 

Attachments

  • replication-issue01.PNG (220.4 KB)
  • replication-issue02.PNG (50 KB)
Hi,

Since only one node appears to fail or skip objects during Replicate Node, you may want to remove it from the Node Group and run it by itself. If you post the actlog messages from the end of this node's replication, we may be able to see exactly what is happening.
 
Inthesun's suggestion to isolate that node is a good idea as well. Is that node your largest node? If so, how much larger is it than the 2nd largest node? If unsure, you can use this query to see your top 5 largest nodes:
Code:
select node_name,sum(reporting_mb) as MB from occupancy where node_name!='' group by node_name order by sum(reporting_mb) desc fetch first 5 rows only
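
If you want a quick number for "how much larger", here is a rough sketch. It assumes you have saved the output of the query above as comma-delimited text, one `NODE_NAME,MB` pair per line. The function name `size_ratio` and the node names in the usage line are made up for illustration; they are not from your server.

```python
import csv
import io


def size_ratio(csv_text):
    """Return (largest_node, ratio of largest to second-largest occupancy).

    Assumes csv_text holds 'NODE_NAME,MB' rows, which is an assumption
    about how the select output was captured.
    """
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        rows.append((row[0].strip(), float(row[1])))
    if len(rows) < 2:
        raise ValueError("need at least two nodes to compare")
    # Sort largest first, in case the input was not already ordered.
    rows.sort(key=lambda r: r[1], reverse=True)
    if rows[1][1] == 0:
        raise ValueError("second-largest node reports 0 MB")
    return rows[0][0], rows[0][1] / rows[1][1]


# Hypothetical sample values, not real data:
sample = "EXCH01,4000000.00\nSQL01,1000000.00\nFILE01,500000.00\n"
name, ratio = size_ratio(sample)
print(f"{name} is {ratio:.1f}x the size of the next largest node")
```

A ratio well above 1 would suggest the node simply has far more data to process than anything else in the daily run.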

Sometimes what appears to be a hang is actually a performance problem: the process is running slowly enough that it looks hung. To determine whether it is hung or slow, run QUERY PROCESS and QUERY SESSION every 20 minutes for a few hours and watch the numbers for the replication process and for the session to the target server. If the bytes or objects change, it is not hung and you are probably looking at a performance issue.
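
The hung-versus-slow check can be reduced to comparing the counters between samples. Below is a minimal sketch, assuming you note down the byte counter from QUERY PROCESS (or QUERY SESSION) by hand at each check. The `Sample` class and `classify` function are hypothetical helpers, not part of any TSM tooling, and the numbers in the usage lines are invented.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    minutes: float    # minutes elapsed since the first check
    bytes_moved: int  # byte counter noted from QUERY PROCESS / QUERY SESSION


def classify(samples):
    """Return ('hung', 0.0) if the counter never moved between the first and
    last sample, otherwise ('progressing', average bytes per minute)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    first, last = samples[0], samples[-1]
    elapsed = last.minutes - first.minutes
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    delta = last.bytes_moved - first.bytes_moved
    if delta == 0:
        return ("hung", 0.0)
    return ("progressing", delta / elapsed)


# Invented readings taken 20 minutes apart:
print(classify([Sample(0, 1000), Sample(20, 3000)]))
print(classify([Sample(0, 500), Sample(20, 500), Sample(40, 500)]))
```

A "progressing" result with a very low rate points toward a performance issue rather than a true hang.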

Is the data in a container pool or a traditional pool? If it is in a container pool, you should run PROTECT STGPOOL before REPLICATE NODE. Protect is more efficient at copying the data to the target, and replicate then only needs to replicate the metadata. Ignore this paragraph if you are not using a container pool.
 