Scheduled jobs on Solaris 5.1.5 client and Linux server (5.1.5-3)

meskola

ADSM.ORG Member
Joined
Oct 4, 2002
I have installed TSM server 5.1.5-3 on RedHat Linux. Now that I am trying to set up scheduled tasks I have run into a problem: my clients (Sun Solaris) abort when a scheduled job starts.

I can do manual backups just fine, but the scheduled ones fail with the following error message in dsmerror.log:

2002-12-04 14:47:47 main thread, fatal error, signal 11



The server logs only say:

12/04/2002 03:13:51 PM ANR0406I Session 1 started for node SUN1 (SUN SOLARIS) (Tcp/Ip XX.XX.XX.XX(37201)).

12/04/2002 03:13:51 PM ANR0480W Session 1 for node SUN1 (SUN SOLARIS) terminated - connection with client severed.



(IP address removed manually ;)



I'm going nuts here. I have upgraded the client and the server, and I've downgraded the Linux kernel to be exactly the right one according to the docs.



I will try to install a Windows client tonight to see if it is a server problem or a client problem.



But anyway, all help is appreciated.



//markus
 
Markus,



Since the manual backup works, we can eliminate the network. I would suspect that an ftp or a telnet session would work as well.



There is a parameter in the dsm.sys file called "schedmode."

The default setting for this parameter is "polling."



Change this parameter from "polling" to "prompted," or if you are already using "prompted," change it to "polling."
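For reference, here is a minimal sketch of the relevant stanza in dsm.sys; the server name, address, and file path are made-up examples, not values from this thread:

```
* dsm.sys - client system-options file (example values)
SErvername        TSMSRV1
   COMMMethod        TCPip
   TCPServeraddress  tsm.example.com
   SCHEDMODe         prompted    * or "polling" (the default)
```

Whichever mode you pick here has to be allowed by the server's schedmode setting as well.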



I have seen strange scheduling issues solved just by changing the schedmode.



When you make any changes in the dsm.sys file, stop and restart the schedule daemon.



Also check the schedmode on the server.

The schedmode on the server needs to be set to "any."

To check it, issue: q stat

If the schedmode on the server is set to anything other than "any," the syntax to change it is:

set schedmode any
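In other words, from an administrative command-line session the check-and-change sequence is just these two commands (the exact placement of the scheduling mode in the q stat output varies by server level, so look through the full status listing):

```
q stat
set schedmode any
```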



Sias
 
Hi!



I have also come to the conclusion that the network isn't the problem (there is no fancy equipment, nothing but a simple switch), and all the other protocols work: ftp, http, smtp, etc.



I started with polling on the clients; it just didn't do the job, and the client exited with a main thread error during the scheduled backup.

I then changed to prompted, and the client crashed when I tried to start it. So I changed back to polling again. To be able to test better I changed the randomization percentage to 0, and now the client crashes directly when there is a job scheduled.

I then tried to use the client acceptor daemon, but without any luck; same problem as before.



I still haven't had the time to test a windows client but I'll try during next week.



Luckily these are machines in my lab and not in production :)



PS: I have tried two different revisions on the client side as well, but the problem persists. I have contacted Tivoli about it and hopefully they'll come up with an answer.



/markus
 
Markus,



That's interesting that the client crashed with the schedmode set to prompted. :-o



Sounds like Tivoli needs to take a look at this and run some traces to see why the client crashes.



Good luck,

Sias
 
The most interesting thing is that it crashes in both prompted and polling mode.



Do you have any clue how to run a trace? Is it the TRACEFILE option in dsm.opt that needs to be set? :confused:



Anyway, it will be interesting to see what Tivoli has to say about this...



A small update: I updated the server to revision 5.1.5-4, but the problem still persists...



/markus
 
Markus,



I don't suspect that it is a TSM server issue. If it were, then all the clients would be crashing.

Was the TSM server upgraded to 5.1.5.3? I am not sure, but I thought there was a big issue with 5.1.5.3 and that the code was pulled from the ftp site. You may want to check the ftp site to see if there is a 5.1.5.4; if there is, check the readme file to see what issue is addressed.



It's been a long time since I have run a client trace; this is all I remember:

TRACEMAX=# Size of the trace file.

TRACEFILE=filename Name of the output file.

TRACEFlags=flag1,flag2,... Trace filtering, 63 flags available. I would not know what traceflags to set.

I seem to be getting old. I can't remember if it goes in the dsm.sys or the dsm.opt file.



I am not sure, but I think that there is a TSM Trace Guide.
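Putting those options together, a hypothetical trace stanza might look something like this. The flag name "service" and the file path are my guesses, not something verified here; check the trace guide for the valid flag names and for whether these lines belong in dsm.sys or dsm.opt on your platform:

```
* hypothetical client trace settings - verify flag names in the trace guide
TRACEFILE  /tmp/dsmtrace.out
TRACEMAX   10240
TRACEFLAGS service
```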



I am curious what Tivoli finds out. It's very strange that the client crashes when the schedmode is set to either prompted or polling.



Here is a curiosity of mine. :confused:

Have you thought about setting up a cron job to issue the backup command?

We know that the manual backup works, and the client does not know whether someone is typing in the command or a cron job is issuing it.

I know that this would only be a workaround.
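A crontab entry along these lines would do it. The dsmc path shown is a common install location, but it is an assumption here, so verify it on your box first:

```
# run an incremental backup every night at 02:00, appending output to a log
0 2 * * * /opt/tivoli/tsm/client/ba/bin/dsmc incremental >> /var/log/dsmc_cron.log 2>&1
```

Since this bypasses the scheduler entirely, it would also tell you whether the crash is specific to the scheduling code path.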



For a brief second I thought about the idletimeout or commtimeout values being a little low. If that were the case, though, we should be seeing messages about the server timing out the client because there was no communication for XX seconds. The client should not crash if the timeouts were the issue.



Good luck,

Sias
 
I am not sure anymore where the problem could be...



I installed a Windows client and it ran perfectly, as did a local TSM client on Linux, so now all that remains is the Solaris clients. :p The weird thing is that these two clients work with scheduling but the Solaris one doesn't.



I have upgraded the server to 5.1.5-4 and it still doesn't work. Just as you said, I also found that 5.1.5-3 had been pulled off the ftp site. But the readme didn't say too much.



I talked to Tivoli again just a few minutes ago, and some sort of complaint/bug report is being filed.





So now what I'll do is start playing with the trace files to see if I can get anything interesting out of that.



/markus
 
Markus,



If the client was on a Windows platform, I would suspect a permission issue.



Since the client is on Solaris, I would not suspect a permission issue.

If it were a permission issue, I would not expect to see the "main thread, fatal error, signal 11" message. The times that I have seen a signal 11 have been when there is a shutdown of the server or the client.



When you have a chance, set up a cron job and see if that works, since we know that the manual backup works.



Good luck,

Sias
 
Now I've been playing around with trace files on the Solaris clients, and they produced this just when the client exited with signal 11:



01/08/03 14:35:58.274 : sun/psunxthr.cpp ( 188): main thread, fatal error, signal 11





I haven't got a clue what it means, but now I know where it crashes... ;)



/markus
 
Markus,



Not sure what this file, psunxthr.cpp, does. :confused:



Have you heard back from Tivoli on what they may have found?



Sias
 
I don't have a clue either what this fantastic file (sun/psunxthr.cpp) does. And I haven't heard a single sound from Tivoli yet.



I'll report the results as soon as I get any :)
 