Problems causing client to stop functioning

davalex

Hi,

I'm having issues on a few of our clients. Errors show up in the error log and the client stops functioning. The client process is still running, but it refuses to accept new connections from the TSM server. Schedules are missed, and this is a big problem...

I get these messages in the error log:
07/28/2009 08:21:24 sessInit: Transitioning: sInit state ===> sTRANSERR state
07/28/2009 08:26:24 sessInit: Starting communications initialization
07/28/2009 08:26:24 sessInit: Transitioning: sInit state ===> sTRANSERR state
07/28/2009 08:31:24 sessInit: Starting communications initialization
07/28/2009 08:31:24 sessInit: Transitioning: sInit state ===> sTRANSERR state
07/28/2009 08:36:24 sessInit: Starting communications initialization
07/28/2009 08:36:24 sessInit: Transitioning: sInit state ===> sTRANSERR state
07/28/2009 08:41:25 sessInit: Starting communications initialization
07/28/2009 08:41:25 sessInit: Transitioning: sInit state ===> sTRANSERR state
07/28/2009 08:46:24 sessInit: Starting communications initialization
07/28/2009 08:46:24 sessInit: Transitioning: sInit state ===> sTRANSERR state
07/28/2009 08:51:24 sessInit: Starting communications initialization
07/28/2009 08:51:24 sessInit: Transitioning: sInit state ===> sTRANSERR state
07/28/2009 08:56:25 sessInit: Starting communications initialization
07/28/2009 08:56:25 sessInit: Transitioning: sInit state ===> sTRANSERR state
07/28/2009 09:01:24 sessInit: Starting communications initialization
07/28/2009 09:01:24 sessInit: Transitioning: sInit state ===> sTRANSERR state

Has anyone seen this problem before?

This started on two nodes with heavy network traffic, but I now also get these errors on nodes with low traffic. I really can't understand why this is happening.

I'm grateful for all replies! :)
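
The excerpt above shows one failed attempt roughly every five minutes. For longer logs, a minimal Python sketch like the one below can confirm whether the failures keep that rhythm or cluster at certain times. It assumes every line starts with a timestamp in the same `MM/DD/YYYY HH:MM:SS` layout as the excerpt; the function name `transerr_intervals` is only illustrative.

```python
from datetime import datetime

def transerr_intervals(lines):
    """Return the seconds between consecutive sTRANSERR transitions."""
    stamps = []
    for line in lines:
        if "sTRANSERR" not in line:
            continue
        # The first 19 characters hold the timestamp, e.g. "07/28/2009 08:21:24".
        stamps.append(datetime.strptime(line[:19], "%m/%d/%Y %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]

# Three transitions copied from the error log above.
sample = [
    "07/28/2009 08:21:24 sessInit: Transitioning: sInit state ===> sTRANSERR state",
    "07/28/2009 08:26:24 sessInit: Starting communications initialization",
    "07/28/2009 08:26:24 sessInit: Transitioning: sInit state ===> sTRANSERR state",
    "07/28/2009 08:31:24 sessInit: Starting communications initialization",
    "07/28/2009 08:31:24 sessInit: Transitioning: sInit state ===> sTRANSERR state",
]
print(transerr_intervals(sample))  # [300, 300]
```

To run it on a real log, pass the lines of the file named by ERRORLOGNAME instead of `sample`.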
 
What is your max sessions setting? There may be too many nodes trying to connect to the TSM server while max sessions is set too low.
 
This might have been the problem. Max sessions was set to 150, and we are running over 200 each night. I set max sessions to 400 and max scheduled sessions to 95%.

Thanks for the tip, I'll see if this is working tomorrow morning! :)
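
As a quick sanity check of those numbers, here is a small Python sketch of the arithmetic. It assumes the scheduled-session limit is simply the given percentage of max sessions, rounded down; the function name is made up for illustration. The result agrees with the "Maximum Scheduled Sessions: 380" line in the q stat output posted later in this thread.

```python
def scheduled_session_limit(max_sessions, max_sched_percent):
    """Scheduled sessions allowed when the limit is a percentage of max sessions."""
    return max_sessions * max_sched_percent // 100

old_max_sessions = 150
nightly_sessions = 200  # "over 200 each night", so 200 is a lower bound

# True: the old limit was below the nightly load.
print(nightly_sessions > old_max_sessions)

# 95% of the new limit of 400.
print(scheduled_session_limit(400, 95))  # 380
```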
 
Hi,

I changed the number of max sessions and the number of max scheduled sessions, but this does not seem to help. We are still having the same problem, and we are getting it on more and more servers.

Any other ideas? :)
 
Can you FTP a 100MB+ file from the client to the server, and then in the other direction? This is to check your network connection both ways.

Can you post your client dsm.opt/dsm.sys, and a "q stat" and "q opt" from the server?

Is anything going into the event log in Windows?

Is there a firewall in between the client and server?
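
An FTP transfer exercises the FTP ports rather than the ports TSM itself uses, so a complementary check is whether a plain TCP connection opens on the TSM port in each direction. Below is a minimal Python sketch of such a check, not an official TSM tool. The helper name `can_connect` is hypothetical, and the demo only talks to a throwaway listener on localhost so that it runs anywhere. For a real test you would call it from the client with the server's address and TCPPort, and from the server with the client's address and scheduler port. With prompted scheduling it is the server that opens the connection to the client; the client port is, as far as I recall, set by TCPCLIENTPORT, which is worth verifying in the client documentation.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False

# Self-contained demo: open a temporary listener on localhost and probe it.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]
print(can_connect("127.0.0.1", demo_port))  # True while the listener is open
listener.close()
```

A firewall that silently drops packets typically shows up as a timeout rather than an immediate refusal, which is why the timeout is a parameter.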
 
Hi BBB,

I have no problems transferring files by FTP between the TSM server and the clients. This works just fine :)

We are not having this problem with our Windows servers.

dsm.sys
SErvername tsm1
COMMMethod TCPip
TCPPort 1500
TCPServeraddress 89.XXX.XXX.XXX
passwordaccess generate
schedmode prompted
querysched 1
SCHEDLOGNAME /var/log/tsm/sched.log
ERRORLOGNAME /var/log/tsm/error.log
schedlogretention 7 D
ERRORLOGRET 7 D
TCPCLIENTADDRESS 194.XXX.XXX.XXX
NODENAME NODE.openpower1
EXCLUDE /var/log/.../*

dsm.opt
SE tsm1
DOMAIN "/"

q stat
Server Name: TSM1
Server host name or IP address:
Server TCP/IP port number: XXXX
Crossdefine: Off
Server Password Set: No
Server Installation Date/Time: 01/11/2008 05:10:53
Server Restart Date/Time: 07/26/2009 15:02:36
Authentication: On
Password Expiration Period: 90 Day(s)
Invalid Sign-on Attempt Limit: 0
Minimum Password Length: 0
Registration: Closed
Subfile Backup: No
Availability: Enabled
Accounting: Off
Activity Log Retention: 15 Day(s)
Activity Log Number of Records: 521370
Activity Log Size: 79 M
Activity Summary Retention Period: 30 Day(s)
License Audit Period: 1 Day(s)
Last License Audit: 08/09/2009 15:17:53
Server License Compliance: Valid
Central Scheduler: Active
Maximum Sessions: 400
Maximum Scheduled Sessions: 380
Event Record Retention Period: 10 Day(s)
Client Action Duration: 5 Day(s)
Schedule Randomization Percentage: 25
Query Schedule Period: Client
Maximum Command Retries: Client
Retry Period: Client
Scheduling Modes: Any
Log Mode: RollForward
Database Backup Trigger: Enabled
BufPoolSize: 2,097,152 K
Active Receivers: CONSOLE ACTLOG
Configuration manager?: Off
Refresh interval: 60
Last refresh date/time:
Context Messaging: On
Table of Contents (TOC) Load Retention: 120 Minute(s)
Machine Globally Unique ID: 66.54.54.6f.0c.bf.dc.11.b3.3b.00.1a.64.68.9d.cd
Archive Retention Protection: Off
Encryption Strength: AES

q opt
Server Option            Option Setting  Server Option            Option Setting
-----------------------  --------------  -----------------------  --------------
CommTimeOut              3,600           IdleTimeOut              15
BufPoolSize              2097152         LogPoolSize              2048
DateFormat               1 (mm/dd/yyyy)  TimeFormat               1 (hh:mm:ss)
NumberFormat             1 (1,000.00)    MessageFormat            1
Language                 AMENG           Alias Halt               HALT
MaxSessions              400             ExpInterval              24
ExpQuiet                 No              EventServer              Yes
ReportRetrieve           No              DISPLAYLFINFO            No
MirrorRead DB            Normal          MirrorRead LOG           Normal
MirrorWrite DB           Sequential      MirrorWrite LOG          Parallel
VolumeHistory            volcnfg.out     Devconfig                devcnfg.out
TxnGroupMax              256             MoveBatchSize            1000
MoveSizeThresh           2048            RestoreInterval          1,440
DisableScheds            No              NOBUFPREfetch            No
AuditStorage             Yes             REQSYSauthoutfile        Yes
SELFTUNEBUFpoolsize      No              DBPAGEShadow             No
DBPAGESHADOWFile         dbpgshdw.bdt    MsgStackTrace            On
QueryAuth                Analyst         LogWarnFullPerCent       90
ThroughPutDataThreshold  0               ThroughPutTimeThreshold  0
NOPREEMPT                ( No )          Resource Timeout         60
TEC UTF8 Events          No              AdminOnClientPort        No
NORETRIEVEDATE           No              IMPORTMERGEUsed          Yes
DNSLOOKUP                Yes             NDMPControlPort          10,000
NDMPPortRange            0,0             SHREDding                Automatic
SanRefreshTime           0
TCPPort                  1500            TcpAdminport             XXXX
HTTPPort                 1580            TCPWindowsize            64512
TCPBufsize               16384           TCPNoDelay               Yes
CommMethod               TCPIP           MsgInterval              1
ShmPort                  1510            FileExit
FileTextExit                             UserExit
AcsAccessId                              AcsTimeoutX              1
AcsLockDrive             No              AcsQuickInit             No
SNMPSubagentPort         1521            SNMPSubagentHost         127.0.0.1
SNMPHeartBeatInt         5               TECHost
TECPort                  0               UNIQUETECevents          No
UNIQUETDPTECevents       No              SHAREDLIBIDLE            No
3494Shared               No              CheckTrailerOnFree       On
SANdiscovery             Off

About the firewall: yes, there are firewalls between these servers, but all of the needed ports are open. We have exactly the same firewall rules on other clients that do not have this issue.

Thanks,

Alexander Davidsen
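
Since other clients with the same firewall rules work, it may help to compare the option files of a failing client and a working one. The Python sketch below is a rough illustration, not a full parser: it assumes one `OPTION value` pair per line, treats lines starting with `*` as comments, compares option names case-insensitively, and does not resolve abbreviated option names (such as `SE` for `SErvername`). The function names are made up, and the "working" sample is hypothetical, invented only to show a difference.

```python
def parse_options(text):
    """Parse 'OPTION value' lines into a dict keyed by upper-cased option name.

    Each value is a list, so options that repeat (such as EXCLUDE) are kept.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):  # skip blanks and comment lines
            continue
        parts = line.split(None, 1)
        name = parts[0].upper()
        value = parts[1] if len(parts) > 1 else ""
        options.setdefault(name, []).append(value)
    return options

def differing_options(a, b):
    """Return the sorted option names whose values differ between two files."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))

# A few lines from the dsm.sys above.
failing = parse_options("""
COMMMethod TCPip
TCPPort 1500
schedmode prompted
""")

# Hypothetical working client, made up for the comparison.
working = parse_options("""
COMMMethod TCPip
TCPPort 1500
schedmode polling
""")

print(differing_options(failing, working))  # ['SCHEDMODE']
```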
 
Nothing really wrong there. You have a very big buffer pool though. How much physical RAM is in this TSM server? And is it a Windows TSM server or Unix?

If I were going to guess, I'd say it is firewall or network related. Run a backup manually from the client with client tracing turned on. You should get some hints from the trace as to where it is freezing up and what it was doing at the time.

Or use truss or your choice of generic tracing tool to watch what the dsmc process does. Make sure you trace child processes if doing it with a generic tool.

P.S. You didn't say what goes into your dsmsched.log. What's in there? You can often tell what's going wrong from how far the schedule gets in that log.
 
The TSM server is running on Linux and it has 16GB of memory.

When running the backup manually, we don't have these problems. They only occur when running schedules. Anyway, I will try to do a trace to see what's going on.

What traceflags should I set in dsm.sys for the client?

I'm also investigating any possible network issues! :)
 
Hi team,

Today I faced the same issue. After I ran a manual backup, it works fine from the client side.
Why do we face this issue? I saw rc = -107856321, which is why I tried running the backup manually. Please explain what it means when the rc is negative.
 