Protect and Replicate won't run

illllm · Mar 24, 2018

A few days ago, we had emergency maintenance on our storage array for a different issue and had to halt TSM. We stopped all running replications and issued the halt command. After 1 hour and 30 minutes, the dsmserv service was still running, so we had to kill it. Now we see that the protect stage is not moving data. We tried the forcereconcile option, but it does not work. Any suggestions or experiences with this would be a great help. IBM support is of no help, as they are available 8 to 5:30 on weekdays only, and every time I upload logs they take forever to respond. So it's one response a day while our data to replicate piles up at 100 TB a day. Has anyone had issues with replication?

Trident · Mar 25, 2018

Hi,
Not a lot to work with. What version is running? Is there any output from the actlog that can give us a clue? Any entries in the dsmserv.err file?

There are a lot of bug fixes in newer releases. But if you are working with IBM, you should not muddy the waters by upgrading your system to a newer release.

illllm · Mar 25, 2018

ANR0985I Process 1600 for Replicate Node ( As Secondary ) running in the BACKGROUND completed with completion state FAILURE.

This is the only message.

TSM 8.1.1

DazRaz · Mar 25, 2018

What about the activity log on the source server? What do you get in the log when you run a protect command? Do you still have server to server communication? (try running a command remotely)

Replication doesn't seem to provide good info to the log, and I've had more success working through the Operations Center. Does that give any more information?
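
A quick way to check basic server-to-server reachability from outside TSM is a plain TCP connect test run on the source host against the target. This is only a sketch under assumptions: the helper name `can_connect` is made up for illustration, and the port you pass in has to match whatever the target server actually listens on (1500 is the usual default for the TCPPORT option, but your configuration may differ). A successful connect only shows that the port is reachable, not that replication sessions will work.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `can_connect("target-tsm.example.com", 1500)` would test a hypothetical target host on the assumed default port.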

illllm · Mar 25, 2018

That is from the source. The destination logs have nothing in them. All other replications run fine. I suspect there are a few corrupt containers and TSM does not know how to handle them.

DazRaz · Mar 25, 2018

ANR0985I Process 1600 for Replicate Node ( As Secondary ) running in the BACKGROUND completed with completion state FAILURE.

The "As Secondary" means that this entry in the log was on the target server.

This link might be of help to repair the corrupt containers - http://www.tsmtutorials.com/2017/01/repair-damaged-data-on-target-server.html

illllm · Mar 27, 2018

ANR8213E Socket 20 aborted due to send error; error 32

This is the error in the source logs.

marclant · Mar 27, 2018

illllm said:
ANR8213E Socket 20 aborted due to send error; error 32

ANR8213E (Linux): Socket [socket identifier] aborted due to send error; error [error code].

Explanation: The session between the server and the specified client system experienced a fatal error sending data.
System action: The session with the remote system is ended.
User response: Ensure that the specified remote system is operational and is properly configured to run TCP/IP.

Error 32 means "broken pipe".

Sounds like a networking problem between the source and target.
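
To double-check the errno lookup on the server itself, Python's standard library can translate the number. A small sketch, assuming a Linux host; the helper name `describe_errno` is only illustrative:

```python
import errno
import os


def describe_errno(code: int) -> tuple[str, str]:
    """Map a numeric errno to its symbolic name and the OS's message text."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# Error 32 is the code reported in the ANR8213E message.
name, text = describe_errno(32)
print(f"errno 32 = {name}: {text}")
```

EPIPE is raised when a process writes to a socket whose other end has already closed the connection, which fits a session being dropped by the peer or by something in between.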

illllm · Mar 28, 2018

Mar 25, 2018, 10:51:21 AM ANR0986I Process 629 for Replicate Node running in the FOREGROUND processed 426,531 items for a total of 2,065,542,681,043 bytes with a completion state of FAILURE at 10:51:21 AM. (SESSION: 173076, PROCESS: 629)
Mar 25, 2018, 10:51:21 AM ANR1893E Process 629 for Replicate Node completed with a completion state of FAILURE. (SESSION: 173076, PROCESS: 629)

marclant · Mar 28, 2018

illllm said:
Mar 25, 2018, 10:51:21 AM ANR0986I Process 629 for Replicate Node running in the FOREGROUND processed 426,531 items for a total of 2,065,542,681,043 bytes with a completion state of FAILURE at 10:51:21 AM. (SESSION: 173076, PROCESS: 629)
Mar 25, 2018, 10:51:21 AM ANR1893E Process 629 for Replicate Node completed with a completion state of FAILURE. (SESSION: 173076, PROCESS: 629)

What you captured is the final status of the process. The cause of the failure is somewhere above that in the activity log. Look for any errors for PROCESS: 629 prior to Mar 25, 2018, 10:51:21 AM.
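
If the activity log has been exported to a text file, that search can be done with a small filter like the one below. It is a sketch under assumptions: it expects one message per line in the format quoted in this thread (an ANR message number, with `PROCESS: n` in the trailing parentheses), and it treats message numbers ending in E, W, or S as the interesting ones. The function name `problem_lines` is hypothetical, and the sample lines are copied from the posts above. It does not filter by timestamp.

```python
import re

# ANR message number followed by a one-letter type, e.g. ANR1893E.
MSG_RE = re.compile(r"\b(ANR\d{4})([A-Z])\b")


def problem_lines(lines, process, types="EWS"):
    """Yield lines for the given process whose message type is in `types`."""
    # Tolerate optional whitespace after the colon in the PROCESS tag.
    tag = re.compile(rf"PROCESS:\s*{process}\b")
    for line in lines:
        m = MSG_RE.search(line)
        if m and m.group(2) in types and tag.search(line):
            yield line.rstrip("\n")


# Sample lines copied from this thread.
sample = [
    "Mar 25, 2018, 10:51:21 AM ANR0986I Process 629 for Replicate Node running in the FOREGROUND processed 426,531 items for a total of 2,065,542,681,043 bytes with a completion state of FAILURE at 10:51:21 AM. (SESSION: 173076, PROCESS: 629)",
    "Mar 25, 2018, 10:51:21 AM ANR1893E Process 629 for Replicate Node completed with a completion state of FAILURE. (SESSION: 173076, PROCESS: 629)",
    "ANR8213E Socket 20 aborted due to send error; error 32",
]

for hit in problem_lines(sample, 629):
    print(hit)
```

Messages logged without a PROCESS tag, such as the ANR8213E line as it was quoted, will not match this filter, so it is worth also reading everything in the minutes before the failure time.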

DazRaz · Mar 28, 2018

Use the Operations Center to find the nodes it has had problems with. (Not sure which version you need for this; it works for me on 8.1.1.)

From the menu, go to Storage, then Replication. Select the line with the source/target pair that is failing and click on Details. The details screen shows the failed jobs; click on one and it will show the nodes that are failing.

illllm · Mar 29, 2018

It's failing only on one node. Protect works fine. Replicate just hangs and does nothing. IBM support is also stumped, as the logs do not show anything.

marclant · Mar 29, 2018

It's semantics, but if replication fails, then it doesn't hang. By definition, a hang never completes; you have to kill it.

If you already have IBM engaged, that's likely your best course of action at this point.

illllm · Mar 29, 2018

That's exactly what happens. Replication starts and then for 24 hours it just sits there doing nothing. No network throughput, no active threads, and the sessions are inactive for the same amount of time.
