ADSM-L

TSM on AIX 4.3.3 goes to "sleep"

2002-03-15 19:54:59
Subject: TSM on AIX 4.3.3 goes to "sleep"
From: John Nawotka <john AT COMPUTERGUY.COM DOT AU>
Date: Sat, 16 Mar 2002 11:33:15 +1100
Dear TSM Gurus,

We are running TSM Version 4.1 on AIX 4.3.3, both Server and client.

We have one node that for some reason has started taking a lot longer to
complete its backup than any of the other similar nodes.

Examining the log files, we found that the backup stops for 7 to 8 hours
and then suddenly resumes.

Here are excerpts from the various logs ...

dsmsched.log (client) ...

03/11/02   23:40:20 Normal File-->            11,360 /phkg/cre/drscom33_9503/data/XHKGDAT33/FILE/COMINHO/MHK3410843.11-MAR-2002.12:52:20 [Sent]
03/12/02   07:29:27 ANS1898I ***** Processed 1,180,000 files *****
03/12/02   07:29:28 ANS1809E Session is lost; initializing session reopen procedure.
03/12/02   07:29:43 ... successful



dsmerror.log  (client)

03/11/02   23:40:20 Normal File-->            20,960 /phkg/cre/drscom33_9503/data/XHKGDAT33/FILE/COMINHO/MHK3410832.11-MAR-2002.11:47:06 [Sent]

03/12/02   07:29:27 TcpRead(): recv(): errno = 73
03/12/02   07:29:27 sessRecvVerb: Error -50 from call to 'readRtn'.
03/12/02   07:29:27 ANS1809E Session is lost; initializing session reopen procedure.
03/12/02   07:29:28 ANS1809E Session is lost; initializing session reopen procedure.
03/12/02   07:29:43 ANS1810E TSM session has been reestablished.


Server Activity Log ...

The activity log shows two sessions started for that client within 4 seconds
of each other. Both sessions were terminated at the same time, at 04:40, with
the message "Terminated - idle for more than 300 minutes".

It then shows a new session (only one) whose start time matches the restart
recorded in the client logs (i.e. 07:29).
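
For what it is worth, the timestamps above can be lined up with a little arithmetic. The sketch below is only an illustration: it assumes the client and server clocks agree, that the 04:40 in the activity log falls on 03/12/02, and that its seconds are :00 (the log excerpt only gives the minute). The variable names are made up for this example.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%y %H:%M:%S"

# Timestamps taken from the client log excerpts above.
last_file_sent = datetime.strptime("03/11/02 23:40:20", FMT)
session_resumed = datetime.strptime("03/12/02 07:29:27", FMT)

# Assumption: the activity log's 04:40 is on 03/12/02; seconds are unknown.
server_terminated = datetime.strptime("03/12/02 04:40:00", FMT)

# Idle limit quoted in the server's termination message.
idle_limit = timedelta(minutes=300)

# When the server would be expected to drop a session idle since the last file.
expected_timeout = last_file_sent + idle_limit

# Total time with no client activity logged.
stall = session_resumed - last_file_sent

# Time between the server dropping the sessions and the client reconnecting.
unnoticed = session_resumed - server_terminated

print("expected idle timeout:", expected_timeout.strftime(FMT))
print("client stalled for:   ", stall)
print("gap after termination:", unnoticed)
```

Under those assumptions, the last file sent at 23:40:20 plus 300 minutes lands at 04:40:20, which matches the 04:40 termination in the activity log. The client then logs nothing until 07:29:27, roughly 2 hours 49 minutes after the server dropped the sessions.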


When the process finally restarts (in this instance at 07:29) it completes
successfully.  The problem, of course, is that it is running into our normal
production day, so I am not confident of the integrity of the backup.

What I don't understand is ...

What is going on here?
Why does the activity log show the sessions terminating at 04:40 and the
client restarting at 07:29?
Why are there two sessions starting up when there is only one instance of the
dsmc schedule process running on the node and that client is only associated
with one schedule?
Why have the sessions become idle?

I have been to the adsm.org forum, and the only suggestion that is given is to 
look at the Server's Activity Log to see the reason for the failure.  I have
done that and what I found is shown above.

Can anyone help me?

Any assistance is greatly appreciated.

Best Regards

John Nawotka
john AT computerguy.com DOT au