replication failure

Lars-Owe

ADSM.ORG Member
#1
Hi!

A couple of our nodes are experiencing replication failures. QUERY REPLICATION for the affected file systems shows:
...
Backup Files Not Replicated Due To Errors: 1
...

What is the typical action here? The client is a Windows machine, c$ being the culprit file system with 35307 files. Source and target backup servers are running Spectrum Protect 7.1.7.0. Rerunning the replication gives the same result every time, and the activity log is not much help:

2017-04-17 21.14.51 ANR0984I Process 312 for Replicate Node started in the
FOREGROUND at 21:14:51. (SESSION: 75622, PROCESS: 312)
2017-04-17 21.14.51 ANR2110I REPLICATE NODE started as process 312. (SESSION:
75622, PROCESS: 312)
2017-04-17 21.14.51 ANR0408I Session 75642 started for server TSM5 (AIX)
(Tcp/Ip) for replication. (SESSION: 75622, PROCESS: 312)
2017-04-17 21.14.52 ANR0408I Session 75643 started for server TSM5 (AIX)
(Tcp/Ip) for replication. (SESSION: 75622, PROCESS: 312)
2017-04-17 21.14.52 ANR0408I Session 75644 started for server TSM5 (AIX)
(Tcp/Ip) for replication. (SESSION: 75622, PROCESS: 312)
2017-04-17 21.14.52 ANR3192I Replicate Node: Proxy agent nodes replicated: 0
of 0 identified. Associated authorized nodes replicated:
0 of 0 identified. Client option sets replicated: 0 of 0
identified. (SESSION: 75622, PROCESS: 312)
2017-04-17 21.14.52 ANR0327I Replication of node SCAR008A.MEB.KI.SE
completed. Files current: 37,184. Files replicated: 0 of
1. Files updated: 0 of 0. Files deleted: 0 of 0. Amount
replicated: 0 bytes of 0 bytes. Amount transferred: 0
bytes. Elapsed time: 0 Days, 0 Hours, 1 Minutes.
(SESSION: 75622, PROCESS: 312)
2017-04-17 21.14.52 ANR0987I Process 312 for Replicate Node running in the
FOREGROUND processed 37,184 items with a completion
state of FAILURE at 21:14:52. (SESSION: 75622, PROCESS:
312)
2017-04-17 21.14.52 ANR1893E Process 312 for Replicate Node completed with a
completion state of FAILURE. (SESSION: 75622, PROCESS:
312)
2017-04-17 21.16.09 ANR2017I Administrator LARS-OWE issued command: QUERY
ACTLOG search='Process: 312' (SESSION: 75622)
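
The key line in the extract above is the ANR0327I summary, which reports "Files replicated: 0 of 1". As a minimal sketch, assuming the activity log has been saved as text with the same wrapping and field layout shown here, the following Python picks out nodes whose summary shows fewer files replicated than identified. The function name `replication_shortfalls` and the regular expression are illustrative assumptions, not part of any Spectrum Protect tooling.

```python
import re

# Field layout is inferred from the ANR0327I message quoted above (an assumption).
SUMMARY = re.compile(
    r"ANR0327I Replication of node (?P<node>\S+) completed\. "
    r"Files current: (?P<current>[\d,]+)\. "
    r"Files replicated: (?P<done>[\d,]+) of (?P<total>[\d,]+)\."
)


def to_int(text):
    """Convert a count such as '37,184' to an integer."""
    return int(text.replace(",", ""))


def replication_shortfalls(actlog_text):
    """Return (node, replicated, identified) where replicated < identified."""
    # Collapse all whitespace so that wrapped console lines become one stream.
    flat = " ".join(actlog_text.split())
    shortfalls = []
    for m in SUMMARY.finditer(flat):
        done, total = to_int(m["done"]), to_int(m["total"])
        if done < total:
            shortfalls.append((m["node"], done, total))
    return shortfalls


sample = """2017-04-17 21.14.52 ANR0327I Replication of node SCAR008A.MEB.KI.SE
completed. Files current: 37,184. Files replicated: 0 of
1. Files updated: 0 of 0. Files deleted: 0 of 0. Amount
replicated: 0 bytes of 0 bytes."""

print(replication_shortfalls(sample))
```

This only narrows down which node and run to look at; it does not identify the file that failed or the reason.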
 

inthesun

ADSM.ORG Member
#2
Hi,

From your update, it sounds like more than one node is failing node replication. Are they being processed as a group, or are you processing them one at a time? The output above looks like what you would see when a process fails while a group of nodes is being replicated: there may have been a communication failure to the target for one of the other nodes, and the messages above are simply reporting that this node sent no data because the process failed.

If these nodes are in a container pool, are your Protect STGpool commands processing successfully before you do the Node Replication?

If you are unable to find the original failure, then you may want to open a ticket with IBM and have them review the full Actlogs, from the source and target systems, during the full time Node Replication is running.
 

Lars-Owe

ADSM.ORG Member
#3
Protect stgpool is running successfully. We've removed the two nodes having troubles from the node groups being replicated. The above log extract comes from the replication of a single file system (c$) on one of the two affected nodes.
 

marclant

ADSM.ORG Moderator
#4
Also check the activity log of the target server; the failure can originate on the target just as easily as on the source.

The ffdc.log located in the instance directory may also have additional information; again, check both the source and the target.
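
When comparing activity logs from both servers, it can help to drop the informational messages first. The sketch below assumes the logs are saved as text in the wrapped format quoted in this thread (timestamp as `YYYY-MM-DD HH.MM.SS`, which depends on the server's locale settings) and relies on the convention that ANR message IDs end in a severity letter: I, W, E or S. The function name `problem_records` is hypothetical.

```python
import re

# A new record starts with a timestamp followed by an ANR message ID;
# continuation lines of a wrapped message do not. The timestamp format
# is taken from the extracts in this thread and is an assumption.
RECORD_START = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}\.\d{2}\.\d{2} ANR\d{4}([IWES])\b"
)


def problem_records(actlog_text, severities="WES"):
    """Join wrapped lines into records and keep the non-informational ones."""
    records = []  # each item is [severity_letter, full_text]
    for line in actlog_text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = RECORD_START.match(line)
        if m:
            records.append([m.group(1), line])
        elif records:
            # Continuation of the previous wrapped message.
            records[-1][1] += " " + line
    return [text for sev, text in records if sev in severities]


sample = """2017-04-17 21.14.52 ANR0987I Process 312 for Replicate Node running in the
FOREGROUND processed 37,184 items with a completion
state of FAILURE at 21:14:52. (SESSION: 75622, PROCESS:
312)
2017-04-17 21.14.52 ANR1893E Process 312 for Replicate Node completed with a
completion state of FAILURE. (SESSION: 75622, PROCESS:
312)"""

for record in problem_records(sample):
    print(record)
```

Running the same filter over the source and target extracts for the same time window makes it easier to see whether either side logged a warning or error at all.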
 

Lars-Owe

ADSM.ORG Member
#5
There's nothing spectacular going on at the target server:

2017-04-18 20.36.11 ANR0408I Session 580642 started for server TSM4 (AIX)
(Tcp/Ip) for replication. (SESSION: 580642)
2017-04-18 20.36.11 ANR0950I Session 580636 for node VM_ITS_IT-DCN01 is using
inline server data deduplication or inline compression.
(SESSION: 580636)
2017-04-18 20.36.11 ANR0984I Process 1679 for Replicate Node ( As Secondary )
started in the BACKGROUND at 20:36:11. (SESSION: 580642,
PROCESS: 1679)
2017-04-18 20.36.11 ANR2110I Replicate Node ( As Secondary ) started as
process 1679. (SESSION: 580642, PROCESS: 1679)
2017-04-18 20.36.11 ANR2071I Administrator SCAR008A.MEB.KI.SE updated.
(SESSION: 580642, PROCESS: 1679)
2017-04-18 20.36.11 ANR0408I Session 580643 started for server TSM4 (AIX)
(Tcp/Ip) for replication. (SESSION: 580643)
2017-04-18 20.36.11 ANR0408I Session 580644 started for server TSM4 (AIX)
(Tcp/Ip) for replication. (SESSION: 580644)
2017-04-18 20.36.12 ANR0950I Session 580638 for node VM_ITS_IT-DCN01 is using
inline server data deduplication or inline compression.
(SESSION: 580638)
2017-04-18 20.36.13 ANR0409I Session 580642 ended for server TSM4 (AIX).
(SESSION: 580642, PROCESS: 1679)
2017-04-18 20.36.13 ANR0409I Session 580644 ended for server TSM4 (AIX).
(SESSION: 580644)
2017-04-18 20.36.13 ANR0409I Session 580643 ended for server TSM4 (AIX).
(SESSION: 580643)

I tried a protect stg contpool forcereconcile=yes, and it too ran successfully.

The ffdc logs are primarily made up of:
[04-18-2017 06:01:37.473][ FFDC_GENERAL_SERVER_ERROR ]: (sddelete.c:2112) Unable to delete non-dedup chunkId -5537312171585819799

According to an APAR I found, this is harmless and should be ignored. The ffdc log also contained:

[04-18-2017 08:06:55.462][ FFDC_GENERAL_SERVER_ERROR ]: (imdmgr.c:3700) Column 14 in table Archive.Objects is NULL.~
[04-18-2017 08:07:24.744][ FFDC_GENERAL_SERVER_ERROR ]: (imdmgr.c:3700) Column 14 in table Archive.Objects is NULL.~

The node I replicated has no archive data, only backup.
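
Since the ffdc log is dominated by one repeated entry, grouping entries by source location makes the less frequent ones stand out. This is a rough sketch that assumes every entry follows the layout of the lines quoted above, `[timestamp][ CATEGORY ]: (file.c:line) message`; the pattern and the name `summarize_ffdc` are illustrative assumptions, not an IBM-provided tool.

```python
import re
from collections import Counter

# Layout inferred from the ffdc entries quoted above (an assumption).
FFDC_ENTRY = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[\s*(?P<cat>\w+)\s*\]:\s*\((?P<src>[^)]+)\)\s*(?P<msg>.*)$"
)


def summarize_ffdc(lines):
    """Count ffdc entries per source location, for example 'imdmgr.c:3700'."""
    counts = Counter()
    for line in lines:
        m = FFDC_ENTRY.match(line.strip())
        if m:
            counts[m["src"]] += 1
    return counts


sample = [
    "[04-18-2017 06:01:37.473][ FFDC_GENERAL_SERVER_ERROR ]: (sddelete.c:2112) Unable to delete non-dedup chunkId -5537312171585819799",
    "[04-18-2017 08:06:55.462][ FFDC_GENERAL_SERVER_ERROR ]: (imdmgr.c:3700) Column 14 in table Archive.Objects is NULL.~",
    "[04-18-2017 08:07:24.744][ FFDC_GENERAL_SERVER_ERROR ]: (imdmgr.c:3700) Column 14 in table Archive.Objects is NULL.~",
]

for src, n in summarize_ffdc(sample).most_common():
    print(src, n)
```

The counts say nothing about which entries matter; they only help separate the known-harmless noise from entries worth raising with support.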
 

inthesun

ADSM.ORG Member
#6
The next thing you can do on the source server, to see whether it reports a problem, is to run AUDIT CONTAINER STGPOOL=<pool_name> ACTION=SCANALL. Here is the link to the full command documentation:
https://www.ibm.com/support/knowledgecenter/en/SSEQVQ_8.1.0/srv.reference/r_cmd_container_audit.html

If that finds nothing, and since the logs you have provided do not report the cause of the failure (such as a damaged or orphaned extent), then you should get IBM support to look deeper. They should collect traces of the node replication process, which fails within a minute.

I hope this is helpful.
 
