TCP/IP Communication Problems

chalkd

ADSM.ORG Member
Joined
Nov 22, 2006
PREDATAR Control23

Every second or third day I receive the following error message from one of our TSM Servers:

ANS1017E Session rejected: TCP/IP connection failure
ANS8023E Unable to establish session with server.
ANS8002I Highest return code was -50.

On some days when it happens I get a lot of failures; on other days I do not. I do, however, get a lot of In Progress events, and sometimes there are still some sessions running.

If I log on to a client configured on this TSM Server and start up the BA Client I get a TCP/IP Communication failure even though I can ping the TSM Server in question from the client with success.
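A successful ping only shows that the host answers ICMP; it says nothing about whether the server is accepting TCP connections on its listening port. A minimal sketch of a TCP-level check follows. The function name `can_connect` is made up for this example, and port 1500 in the usage note is assumed to be the TCPPORT in use (it is the usual default, but check dsmserv.opt).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it only tests the TCP handshake,
    not whether the application behind the port is healthy.
    """
    try:
        # create_connection resolves the name and performs the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `can_connect("10.0.0.5", 1500)` with your own server address in place of the placeholder. Note that a hung server process may still complete the handshake, because the OS accepts connections into the listen backlog, so a True result does not prove the server is working.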

I need to restart the TSM Server service (DSMSERVE.EXE) on the server every time this happens in order to get the TSM Server into a working state again.

This is really puzzling me now, so I wonder if anyone has experienced this before or has any ideas about what might be causing it.

My TSM Server version: 5.2.8.0

Thanks
Damien
 
PREDATAR Control23

If you get a TCP/IP failure on your node while the server is still available,
it looks like you have a port conflict.
You initially configured your TSM server to listen on a specific TCPPORT; if another application is using the same port, that could provoke the problem.
 
PREDATAR Control23

The dsmserv process could also be locking up. Since restarting dsmserv fixes it, there can't be something else on the port, and in fact the OS won't let something else come up listening on a port that's already in use (I speak only for competent operating systems. I have no idea what might happen with Windows).
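The point about the OS refusing a second listener can be checked with a small sketch. This assumes default socket options (no SO_REUSEADDR or similar), and the function name `second_bind_fails` is made up for the example.

```python
import socket


def second_bind_fails(host: str = "127.0.0.1") -> bool:
    """Return True if the OS rejects a second listener on an in-use port."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))          # port 0: let the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same address and port as the first
            second.listen(1)
            return False               # the OS allowed a second listener
        except OSError:
            return True                # "address already in use"
    finally:
        second.close()
        first.close()
```

With default options this should return True, which is why a port conflict would normally show up when the server starts, not days later.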
 
PREDATAR Control23

Check what you have for max sessions. If you have it set to 50 and the 51st client tries to make a connection you could get an error like this. Can you make a connection via the admin command-line client (dsmadmc)?

-Aaron
 
PREDATAR Control23

The MAXSessions option is set to 150. Is there any way of running a trace or scan on the TSM Server service to find out what is going on? Also, is there a way of telling which other application, if any, is using the same port as the TSM Server service?
 
PREDATAR Control23

hi,
You can run "netstat -anp tcp" on the server and see which ports are listening (without -a, netstat on Windows lists only active connections).
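To find which process owns a listening port, the netstat output can be filtered. Below is a minimal sketch that assumes the Windows column layout of `netstat -ano -p tcp` (Proto, Local Address, Foreign Address, State, PID); the function name `listeners_on_port` is made up for the example.

```python
def listeners_on_port(netstat_output: str, port: int) -> list:
    """Return the PIDs (as strings) of processes listening on a TCP port.

    Assumes Windows-style `netstat -ano -p tcp` text, one socket per line:
    Proto, Local Address, Foreign Address, State, PID.
    """
    pids = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[0] != "TCP" or fields[3] != "LISTENING":
            continue  # skip headers, blank lines and non-listening sockets
        local_port = fields[1].rsplit(":", 1)[-1]
        if local_port == str(port):
            pids.append(fields[4])
    return pids
```

The text can be captured with `subprocess.run(["netstat", "-ano", "-p", "tcp"], capture_output=True, text=True).stdout`, and the PID can then be looked up in Task Manager or with `tasklist`. Port 1500 is only the usual default; use whatever TCPPORT is set in dsmserv.opt.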

However, I believe that it is not a port problem. First of all, are your clients pointing to the IP address of the server or to a DNS name? My advice is to always use the IP address, so that you do not rely on DNS. Then, once you have ruled out a problem on the client side (TSM configuration), you should investigate network problems by checking the TSM logs (client and server) and cross-checking them against switch and/or network analyzer logs, matching the timestamps.
Keep in mind, though, that it might be a trivial issue, in which case you do not need analyzers.

cheers
max
 
PREDATAR Control23

Same TCP/IP Com Errors

Hi All,

I am having the same issue with a new 5.4.2 server. The server is (was) not backing up any data yet. I was adding a 200 GB volume to the disk pool, and it errored out saying my recovery log had insufficient size.

Same errors as posted earlier in the thread:
ANS1017E Session rejected: TCP/IP connection failure
ANS8023E Unable to establish session with server.
ANS8002I Highest return code was -50.

Now I cannot start a session to the server, and I cannot start the service (Windows installation of TSM). Is there a way to extend the log without having TSM running?

Thanks

James
 
PREDATAR Control23

Define a new recovery log volume with dsmfmt (it can be small), and then start the server, telling it to format the new recovery log volume. It will format the new log volume and then start the TSM server. It should then commit all the transactions that are still in the recovery log and clear it, allowing you to start the TSM server process normally.

-Aaron
 
PREDATAR Control23

MaxSchedSessions

The MAXSessions option is set to 150. Is there any way of running a trace or scan on the TSM Server service to find out what is going on? Also, is there a way of telling which other application, if any, is using the same port as the TSM Server service?

What is MAXSCHEDSESSIONS set to? How many nodes are you backing up at any given time? Answering these should help explain why a lot of nodes are waiting.
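As far as I recall, MAXSCHEDSESSIONS is set as a percentage of MAXSESSIONS, so the number of sessions available to scheduled backups can be well below 150. A rough sketch of the arithmetic follows; the 50 percent figure in the usage note is only an illustration, and the rounding rule (round down) is an assumption.

```python
def scheduled_session_limit(max_sessions: int, max_sched_percent: int) -> int:
    """Estimate how many sessions scheduled operations may use.

    Assumes MAXSCHEDSESSIONS is a percentage of MAXSESSIONS and that the
    result is rounded down; check QUERY STATUS on the server for real values.
    """
    if not 0 <= max_sched_percent <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return max_sessions * max_sched_percent // 100
```

With MAXSESSIONS at 150 and an illustrative 50 percent, `scheduled_session_limit(150, 50)` gives 75, so a schedule window with more concurrent nodes than that would leave the rest waiting.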
 
PREDATAR Control23

I am facing the same problem too; it happens every fifth day:


We have TSM Server 5.3.2.2.

ANS1017E Session rejected: TCP/IP connection failure
ANS8023E Unable to establish session with server.
ANS8002I Highest return code was -50.



 
PREDATAR Control23

I speak only for competent operating systems. I have no idea what might happen with windows.

Windows is a competent operating system and as you'd expect, exhibits the same behaviour.
The TCP/IP connection failure messages are infinitely more likely to mean that the server is not available at all than that there is a networking problem. A full log volume is particularly likely.

Think of this as the opposite of going to a website without an Internet connection - you'll get a message saying "This server is unavailable or not responding" - that's not true; you just can't see it.

Check the server's log files; I am certain you'll find this is not a networking problem.
 
PREDATAR Control23

I had a similar problem on my TSM-system a while back, and here is what I found out:

System:
Shared library
2 backupservers
1 library manager

All of them run Windows 2003.

The problem only occurred on one of my servers.
After some searching, I found this in my server activity log:
ANR8210W TCP/IP driver is terminating due to error in accepting a new session, reason code 10055.

After this error, the server refuses all connections on the default TCP port.
No further errors are logged in the server log, and the client logs the TCP/IP failure message.

It turns out the server OS was running out of buffer space for TCP sockets.
The server has 4 GB of RAM, and I was using the /3GB switch in boot.ini.
This setting leaves only 1 GB of memory for the OS.
When I was running backups of 200+ nodes, the OS ran out of buffer memory and returned the 10055 error to Tivoli. Tivoli then shuts down incoming connections, and there is no way to make it accept incoming connections again. The only solution is to restart the TSM server service.

The solution was to remove the /3GB switch, which gives the OS 2 GB and leaves the rest for applications.
I find it strange, however, that Tivoli doesn't recover after the OS returns the 10055 error. Best practice would be to shut down connections for 30 minutes and then try again. Some of the nodes would probably have finished during that time, freeing up some buffer memory for new connections to be established.
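To illustrate the retry behaviour I have in mind (this is not how TSM works, just a sketch of the idea): Winsock error 10055 is WSAENOBUFS, "no buffer space available", and a listener could pause and retry on it instead of terminating. The names `accept_with_backoff`, `pause_seconds` and `max_retries` are made up for the example.

```python
import errno
import time

WSAENOBUFS = 10055  # Winsock code for "no buffer space available"
RETRYABLE = {errno.ENOBUFS, WSAENOBUFS}


def accept_with_backoff(accept, pause_seconds=1800.0, max_retries=3,
                        sleep=time.sleep):
    """Call accept(); on buffer exhaustion, pause and retry.

    `accept` is any callable that returns a new connection or raises
    OSError. Errors other than buffer exhaustion are re-raised at once,
    as is the last buffer error when the retries are used up.
    """
    for attempt in range(max_retries + 1):
        try:
            return accept()
        except OSError as exc:
            # On Windows the Winsock code may be in winerror; else use errno.
            code = getattr(exc, "winerror", None) or exc.errno
            if code not in RETRYABLE or attempt == max_retries:
                raise
            sleep(pause_seconds)  # give running sessions time to finish
```

In a real listener, `accept` would be the `accept` method of the listening socket. The 30-minute default mirrors the pause suggested above; a shorter pause with more retries may be just as reasonable.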

For reference:
The /3GB switch has no impact on TSM. An application has to be specifically written to use the /3GB option. TSM is not written this way, and will never use more than 2GB. So there is no point in using the /3GB switch on a TSM-server.
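"Specifically written" here means the executable must carry the large-address-aware flag (IMAGE_FILE_LARGE_ADDRESS_AWARE, 0x0020) in its PE header. A minimal sketch for checking that flag in the bytes of an executable follows; the function name is made up, and I have not run it against dsmserv.exe myself.

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # bit in the COFF Characteristics field


def is_large_address_aware(image: bytes) -> bool:
    """Return True if a PE image is flagged as large-address-aware."""
    if len(image) < 0x40 or image[:2] != b"MZ":
        raise ValueError("not a DOS/PE image")
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # Characteristics is the last 2 bytes of the 20-byte COFF header,
    # which starts right after the 4-byte signature: offset 4 + 18 = 22.
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 22)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)
```

Reading the first few kilobytes of the file in binary mode and passing them in should be enough, since the headers sit at the start of the file. If the flag is not set, the process stays limited to 2 GB of user address space regardless of the /3GB switch.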
 