[DEFECT] SP 8.1.4 Server to Server Sessions

ILCattivo

ADSM.ORG Senior Member
#1
Happy New Year all.

So I have 2 x SP 8.1.4 servers located at different DC locations. 1 x Windows 2016 & 1 x RHEL 7

Both of these servers protect each other's container storage pools via the Protect STG and Replicate Node commands daily. Maxsessions=30 for the storage pool protection process.

Today I have discovered literally hundreds of 'RecvW' SSL server sessions going back 200+ hours on one of the destination servers, coming from the other (source) server. But not the other way round..?

Admittedly the first Protect STG sync was a good 8+TB in size over the WAN link and took a few days to complete, but since then we have had these orphaned 'RecvW' server sessions hanging around, and they're simply not clearing once the protection of the storage pool completes successfully each day.

Something I am missing here or potentially another bug?

Thanks
 

ILCattivo

ADSM.ORG Senior Member
#2
I can confirm that, having looked at an identical setup of mine with 2 x 8.1.0 SP servers, both on Win 2012, this behaviour is not present!

By the way, this odd orphaned session behaviour as described in the OP is Source [win2k16] > Destination [RHEL7] both SP 8.1.4
 

marclant

ADSM.ORG Moderator
#3
RecvW (receive wait) is normally due to something at the networking layer. A transaction is in progress, but the receiving side suddenly has to wait mid-transfer. This situation "normally" stops when either the transmission continues or when the commtimeout is reached (60 seconds by default).
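
As a rough illustration of the check described above, here is a minimal Python sketch that flags sessions sitting in RecvW for longer than the commtimeout. The input layout (`session_id,state,wait_seconds`, one session per line) is a hypothetical export format assumed for this example, not actual server output, and the function name `find_stale_recvw` is made up for illustration.

```python
import csv
import io

COMMTIMEOUT_SECONDS = 60  # default value mentioned above


def find_stale_recvw(rows_text, commtimeout=COMMTIMEOUT_SECONDS):
    """Return (session_id, wait_seconds) for RecvW sessions waiting longer than commtimeout.

    Assumes each row is 'session_id,state,wait_seconds' (hypothetical layout).
    """
    stale = []
    for row in csv.reader(io.StringIO(rows_text)):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        session_id, state, wait = (field.strip() for field in row)
        if state.lower() == "recvw" and int(wait) > commtimeout:
            stale.append((int(session_id), int(wait)))
    return stale


# Made-up sample data: 720000 s is 200 hours, like the sessions in the first post.
sample = """\
101,RecvW,720000
102,Run,0
103,RecvW,12
104,IdleW,900
"""

for session_id, wait in find_stale_recvw(sample):
    print(f"session {session_id} has been in RecvW for {wait} s")
```

Any session this reports has outlived the timeout that should have ended it, which is the symptom being discussed in this thread.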
 

ILCattivo

ADSM.ORG Senior Member
#4
Yep, can confirm that the COMMTIMEOUT setting is set to the default '60' so as you say, in theory the sessions should have stopped.

Interestingly on my 8.1.0 setup the COMMTIMEOUT is a lot lot bigger.
 

ILCattivo

ADSM.ORG Senior Member
#6
Have you ever solved the problem?
Ah ha.. Do we have someone else with the same issue here?

I have had a PMR open with IBM for this for months now. All sorts of traces have been put in place, and the only thing they can determine is network issues causing the protect stg pool sessions to disconnect abruptly, in the direction of my Windows ISP server > Linux ISP server. Strange, however, that the issue is not happening in the other direction over that same VPN tunnel??

We have upped network traffic timeouts on our current firewalls at both ends to try and counter this but to no avail. The issue still persists?? Win > Lnx.

We are due to upgrade our firewall hardware at both ends in the coming months, while also increasing the bandwidth capacity between the two, so if that doesn't sort it then it's deffo an issue with the underlying ISP code between the two different platforms!!

Watch this space...
 

Mita201

ADSM.ORG Senior Member
#7
I have a similar problem: orphaned replication sessions that stay on the receiving side even after they are closed on the sending side.
I have two Windows 2016 TSM servers, upgraded to 8.1.5 since opening the PMR (started with 8.1.4).
So I'm not sure it has anything to do with different platforms.
I only replicate in one direction, so I can't tell whether the same would happen the other way.
I will post if there are any findings from my PMR.
 

ILCattivo

ADSM.ORG Senior Member
#8
Ok, that's good that you have PMR open with what looks like an identical case.
The server OS that is sending the ISP 8.1.4.2 Protect Stg & replication data to my RHEL 7 ISP 8.1.4.2 server is also 'Windows Server 2016.' <--- Currently a common denominator between our two cases.

Tell you what I'll do.. Might be worth posting my PMR ref no. here so that, if you want, you can also provide it to them as a similar reference for your case.

PMR 30728,999,866

The chap I was dealing with @ Lvl 2 Support was Dave Border. [IBM UK]

I have my suspicions this is not network related if these kinds of issues are becoming more prevalent in ISP 8.1.4/5 between 2 replicating servers where at least one is coming from a Windows 2016 Server.

Most of the time taken to identify what they believe to be the cause will go on the numerous trace files they require. It can take weeks!!
 

Mita201

ADSM.ORG Senior Member
#9
Hi,

My case is 07208,707,707 and I am still with L3 (opened on May 23rd).
My guy told me he passed my servmon.pl outputs to L2 a few days ago. I will post how it goes.
 

ILCattivo

ADSM.ORG Senior Member
#10
Ah yes, been through those too..
I am willing to bet they will come back with the finger pointing at your network like they did with me?

Not sure if you are using it yet where you are, but here in the UK we no longer use the old PMR system. It's now a self-service desk dashboard with 'Cases' instead of PMRs. Much better and quicker to manage.

Yep, keep us updated with progress, cause mine just hit a brick wall when they said it was a Network timeout issue!!
 

Mita201

ADSM.ORG Senior Member
#11
Yes, I am using the new interface too, but I can also see a "Legacy case number" when I have opened a case by mailing IBM.
No news here, just an issue after upgrading to ISP 8.1.5: the ISP DB backup will not work any more until you upgrade the ISP client on the ISP server to 8.1.4.1 (a newer GSKit is needed). So I have had another, forked case.
 

moon-buddy

ADSM.ORG Moderator
#14
Yep, can confirm that the COMMTIMEOUT setting is set to the default '60' so as you say, in theory the sessions should have stopped.

Interestingly on my 8.1.0 setup the COMMTIMEOUT is a lot lot bigger.
Yes, the sessions should have timed out - this really points to a code defect. The fact that your 8.1.0 setup works but 8.1.4 does not leads me to this conclusion.
 

ILCattivo

ADSM.ORG Senior Member
#15
Hi
I wonder if you have got any solutions to your replication problems yet?
As others have said, not yet no!!

Not for my case. We are on some hold, waiting for hw upgrade. I will post news when there are any
Me too, IBM started to point the blame at our Network switching / firewalling and bandwidth...??
The switching is soon to be upgraded to the latest and greatest in the coming month or two, so we shall see hey!

Yes, the sessions should have timed out - this really points to a code defect. Given that your 8.1.0 works but not 8.1.4 leads me to this conclusion.
The 8.1.0 system in my case is Windows > Windows.
I suspect, as you do, it's a code defect in ISP 8.1.4 from Windows > Linux as when Protect STG & Replicate Node run from Linux > Windows.. Guess what... it works fine and the sessions close cleanly..
 
