Groups of missed backups

c.j.hund

Hi all,

Information about this environment:
TSM Servers - AIX v7.1, TSM v7.1.7.100
TSM Clients - Red Hat Linux, kernel 2.6.32-696.16.1.el6, TSM 7.1.4.1

Every so often, seemingly at random intervals, large groups of my Linux clients will miss their scheduled backup. The backups might run great for two weeks straight, then we'll have a day with 60 misses. This only happens with the Linux clients. I have Windows clients in this environment as well, and they do not seem to be affected. It's not always the same group of clients, but to get them working again we have to restart the scheduler service. Sometimes we'll restart the scheduler service on 60 Linux clients, then the next day a different group of 60 will miss, while the clients whose scheduler services we restarted the day before run just fine.
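To check whether each wave of misses shares a network segment, the addresses of the nodes that missed can be bucketed by subnet. A minimal Python sketch; the address list and the /24 prefix are hypothetical inputs, gathered however is convenient (for example from the server's node records):

```python
import ipaddress
from collections import defaultdict


def group_by_subnet(addresses, prefix=24):
    """Bucket client IP addresses by the enclosing network of the given prefix length."""
    groups = defaultdict(list)
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        # strict=False lets us derive the network from a host address
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        groups[str(net)].append(addr)
    return dict(groups)


# Hypothetical addresses of nodes that missed on one day
missed = ["10.20.1.15", "10.20.1.37", "10.20.2.8", "10.30.5.4"]
for net, hosts in sorted(group_by_subnet(missed).items(), key=lambda kv: -len(kv[1])):
    print(net, len(hosts))
```

Comparing these buckets across several bad days shows whether the misses follow subnets (pointing at a switch, router or firewall) or follow the schedule regardless of network location.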

Some of the more important points:
  • All these Linux clients are using the scheduler service (dsmc schedule), not the CAD (client acceptor daemon).
  • When the misses occur, it always seems to happen for a group of clients in the same schedule, at the same time. Often the Linux clients themselves are in the same subnet with similar IPs.
  • There's not much information in the client dsmerror.log file - all we see are messages like these:
    12/21/17 09:17:33 ANS5216E Could not establish a TCP/IP connection with address 'X.X.X.X:X'. The TCP/IP error is 'Connection timed out' (errno = 110).
    12/21/17 09:17:33 ANS9020E A session could not be established with a TSM server or client agent. The TSM return code is -50.
    12/21/17 09:17:33 ANS2106I Connection to primary TSM server XXX failed
  • We are not running out of sessions on the TSM server.
  • There doesn't appear to be anything in the TSM server's error log which would indicate a problem.
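With dozens of clients involved, tallying the dsmerror.log entries makes it easier to see whether every miss has the same cause and whether the failures cluster in time. A rough Python sketch, assuming the MM/DD/YY date format seen in the excerpts above (the real format depends on the client's date settings):

```python
import re
from collections import Counter
from datetime import datetime

# One dsmerror.log entry: date, time, message ID, free text.
# Assumes the MM/DD/YY date format seen in the excerpts above.
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) (ANS\d{4}[A-Z]) (.*)$")
RC_RE = re.compile(r"return code is (-?\d+)")
ERRNO_RE = re.compile(r"errno = (\d+)")


def parse_line(line):
    """Return a dict for one log entry, or None if the line is not in the expected format."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    date, clock, msg_id, text = m.groups()
    rc = RC_RE.search(text)
    err = ERRNO_RE.search(text)
    return {
        "when": datetime.strptime(f"{date} {clock}", "%m/%d/%y %H:%M:%S"),
        "msg_id": msg_id,
        "rc": int(rc.group(1)) if rc else None,
        "errno": int(err.group(1)) if err else None,
    }


def summarize(lines):
    """Tally message IDs, TSM return codes and errno values, plus failed sessions per minute."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    by_id = Counter(e["msg_id"] for e in entries)
    by_rc = Counter(e["rc"] for e in entries if e["rc"] is not None)
    by_errno = Counter(e["errno"] for e in entries if e["errno"] is not None)
    # ANS9020E reports a session that could not be established; bucket those by minute
    per_minute = Counter(
        e["when"].replace(second=0) for e in entries if e["msg_id"] == "ANS9020E"
    )
    return by_id, by_rc, by_errno, per_minute
```

It can be fed an open file, e.g. `summarize(open(path))`, where `path` is each client's dsmerror.log (its location depends on the ERRORLOGNAME option). On Linux, errno 110 is ETIMEDOUT: the TCP connection attempt got no answer at all, as opposed to being actively refused.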
Is this why it's always recommended to use the CAD? This problem comes up randomly, so it's hard to nail down. Something, however, is preventing a TCP/IP connection. It feels like a network issue.
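One way to test the network theory independently of the TSM client is to run a plain TCP connection probe from an affected Linux client to the server's address and port around the schedule window, and record how each attempt ends. A minimal Python sketch; the host name `tsm-server.example` is hypothetical and port 1500 is taken from the dsm.sys example later in this thread:

```python
import errno
import socket
import time
from collections import Counter


def probe(host, port, timeout=10.0):
    """Attempt one TCP connection; return 'ok', 'timeout', or the errno name of the failure."""
    try:
        # create_connection resolves host and tries each address it returns
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except (socket.timeout, TimeoutError):
        # Python's own timeout expired, or the kernel reported ETIMEDOUT (errno 110 on Linux)
        return "timeout"
    except OSError as exc:
        # e.g. ECONNREFUSED when nothing listens on the port; unknown codes fall back to the message
        return errno.errorcode.get(exc.errno, str(exc))


def probe_many(host, port, attempts=5, interval=1.0, timeout=10.0):
    """Repeat probe() and count the outcomes, to catch intermittent failures."""
    results = Counter()
    for i in range(attempts):
        results[probe(host, port, timeout)] += 1
        if i + 1 < attempts:
            time.sleep(interval)
    return results


if __name__ == "__main__":
    print(probe_many("tsm-server.example", 1500))
```

If the probe also times out on the days the backups miss, the problem lies in the network path (or a firewall's connection table) rather than in the scheduler itself.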

Any insights on what might be causing this would be welcomed.

Thank you,
C.J.
 
Here's the exact message recorded in my Linux client's error log file:


01/07/18 01:34:13 ANS9020E A session could not be established with a TSM server or client agent. The TSM return code is -53.
01/07/18 01:34:13 ANS2106I Connection to primary TSM server XXX failed
 
Client side (in dsm.sys):
1. Check whether TCPSERVERADDRESS is specified as a hostname or an IP address (an IP address is better).
2. Set both TCPPORT and TCPCLIENTPORT, using different ports.
For example:
TCPPort 1500
TCPCLIENTPort 1501
TCPServeraddress 10.10.10.10
tcpclientaddress 10.10.10.101

Check SCHEDMODE as well.
Check for any port restrictions (e.g. firewalls) before choosing specific ports.
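The client-side checks above can be scripted across many nodes. A rough Python sketch that reads dsm.sys text and flags each SERVERNAME stanza whose TCPSERVERADDRESS is a hostname, whose TCPPORT and TCPCLIENTPORT are missing or equal, or which leaves SCHEDMODE unset. It is a simplification: it assumes one "OPTION value" pair per line, `*` comment lines, and full option names (the real client also accepts abbreviations, which this sketch does not recognize):

```python
import ipaddress


def parse_dsm_sys(text):
    """Split dsm.sys text into {servername: {OPTION: value}}; options before any SERVERNAME go under ''."""
    stanzas = {"": {}}
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue  # skip blank lines and '*' comments
        parts = line.split(None, 1)
        name = parts[0].upper()  # option names are case-insensitive
        value = parts[1].strip() if len(parts) > 1 else ""
        if name == "SERVERNAME":
            current = value
            stanzas[current] = {}
        else:
            stanzas[current][name] = value
    return stanzas


def check_stanza(opts):
    """Return warnings for one stanza, following the advice above."""
    warnings = []
    addr = opts.get("TCPSERVERADDRESS")
    if addr is None:
        warnings.append("TCPSERVERADDRESS not set")
    else:
        try:
            ipaddress.ip_address(addr)
        except ValueError:
            warnings.append(f"TCPSERVERADDRESS {addr!r} is a hostname; consider an IP address")
    port, cport = opts.get("TCPPORT"), opts.get("TCPCLIENTPORT")
    if port is None or cport is None:
        warnings.append("set both TCPPORT and TCPCLIENTPORT")
    elif port == cport:
        warnings.append("TCPPORT and TCPCLIENTPORT use the same port")
    if not opts.get("SCHEDMODE"):
        warnings.append("SCHEDMODE not set; the client default applies")
    return warnings


sample = """
SErvername  TSM1
   TCPPort            1500
   TCPCLIENTPort      1501
   TCPServeraddress   10.10.10.10
   tcpclientaddress   10.10.10.101
   SCHEDMODE          PROMPTED
"""
for server, opts in parse_dsm_sys(sample).items():
    if server:
        print(server, check_stanza(opts))
```

TCPCLIENTPORT is the port the server uses to contact the client for server-prompted schedules, so the port and firewall checks matter most for clients with SCHEDMODE PROMPTED.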

On the TSM server side:
1. Check the schedule randomization percentage (SET RANDOMIZE).
2. Check the maximum sessions allowed (MAXSESSIONS) and the maximum scheduled sessions (SET MAXSCHEDSESSIONS).
3. Check each schedule's priority and update it as required.
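The first two server-side checks are mostly arithmetic. SET RANDOMIZE spreads client start times (for client-polling schedules) over the first randomize-percentage of the startup window, and SET MAXSCHEDSESSIONS is a percentage of MAXSESSIONS. The Python sketch below applies those two rules to hypothetical numbers (60 clients, a one-hour window, 25% randomization, MAXSESSIONS 100, MAXSCHEDSESSIONS 50%) and assumes each client holds a single session and that all sessions overlap; real clients may open more than one session:

```python
from datetime import timedelta


def randomized_spread(window, randomize_pct):
    """Slice at the start of the startup window over which client start times are spread."""
    return window * (randomize_pct / 100)


def sched_session_cap(max_sessions, maxsched_pct):
    """Sessions available to scheduled operations (a percentage of MAXSESSIONS)."""
    return max_sessions * maxsched_pct // 100


def burst_report(n_clients, window, randomize_pct, max_sessions, maxsched_pct):
    """Average start rate and whether n_clients would exceed the scheduled-session cap."""
    spread = randomized_spread(window, randomize_pct)
    minutes = spread.total_seconds() / 60
    cap = sched_session_cap(max_sessions, maxsched_pct)
    return {
        "spread": spread,
        # with 0% randomization, treat every client as starting in the same minute
        "avg_starts_per_minute": n_clients / minutes if minutes else float(n_clients),
        "sched_cap": cap,
        # assumes one session per client, all open at the same time
        "exceeds_cap": n_clients > cap,
    }


print(burst_report(60, timedelta(hours=1), 25, 100, 50))
```

With these numbers the 60 clients start over the first 15 minutes, about 4 per minute on average, while the scheduled-session cap would be 50.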
 