Re: Central scheduling failures

Thomas,

Yes, I have seen this also. We are using the latest version of the ADSM
Server for MVS, v3.12.15.
Exactly the same symptoms, and no error messages. The scheduler appears to be
hung in memory; only recycling the ADSM server makes the problem go away.

I opened a problem with IBM but have not had any response yet. It appears to
be workload related. I spread out the load among other ADSM servers and
have altered my client schedules so that I don't have too many client
schedules starting within a short period. This has helped, but there's
definitely a problem here.
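
As a rough illustration of the staggering idea, here is a minimal sketch
that assigns each client a start time a fixed number of minutes after the
previous one. The node names, the first start time, and the 15-minute gap
are all made up for the example, and the script only computes times; it
does not issue any ADSM commands.

```python
from datetime import datetime, timedelta


def stagger_start_times(nodes, first_start, gap_minutes):
    """Give each node its own start time, gap_minutes after the previous one."""
    return {
        node: first_start + timedelta(minutes=i * gap_minutes)
        for i, node in enumerate(nodes)
    }


# Hypothetical node names and schedule window, for illustration only.
nodes = ["NODE_A", "NODE_B", "NODE_C", "NODE_D"]
plan = stagger_start_times(nodes, datetime(1999, 3, 26, 22, 0), 15)

for node, start in plan.items():
    print(node, start.strftime("%H:%M"))
```

The resulting times would still have to be entered into the individual
client schedules by hand or by whatever administrative tooling the site
uses.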

Nathan


        -----Original Message-----
        From:   Thomas Denier [SMTP:Thomas.Denier AT MAIL.TJU DOT EDU]
        Sent:   Friday, March 26, 1999 9:35 AM
        To:     ADSM-L AT VM.MARIST DOT EDU
        Subject:        Central scheduling failures

        In the last few weeks my site has had two instances of the central
        scheduling mechanism failing without evident cause. We have an MVS
        server at 3.1.2.1. Both of the clients involved were at 3.1.0.6. One
        was an AIX system and the other was an HP-UX 10.20 system. Both use
        TCP/IP communications. Both have 'schedmode prompted' in the dsm.sys
        file. A 'query status' command reports that the server supports any
        scheduling mode. In each case the server log showed a message
        reporting that a client event had missed its start-up window. When I
        checked the client, the 'dsmc sched' process was still running in
        each case. When I checked the dsmsched.log file, I found the
        following at the end of the file in each case:

        Messages reporting execution of the last successful event
        Messages showing the results of querying the server for the next
          scheduled event
        A message reporting that the scheduler process was waiting to be
          contacted by the server

        All of the messages mentioned above had time stamps within a few
        seconds of each other. In each case I stopped and restarted the
        scheduler process and subsequent events were carried out on schedule.
        In the HP-UX case, I checked the dotted decimal addresses used for
        client sessions before and after the failed event. They were the
        same. In the HP-UX case, I updated the schedule, creating a sequence
        of events like the following:

        Successful event
        Query for next event
        Schedule change
        Event created by schedule change (which failed)
        Event reported in response to the query

        I don't remember whether the AIX case involved a similar schedule
        change.

        Does anyone recognize this as a known problem? Failing that, does
        anyone have any suggestions for tracking down the cause?