ADSM-L

Re: [ADSM-L] TSM Scheduler Service in 'Stopping' state

2008-03-16 20:46:30
Subject: Re: [ADSM-L] TSM Scheduler Service in 'Stopping' state
From: Richard Sims <rbs AT BU DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Sun, 16 Mar 2008 20:45:27 -0400
On Mar 16, 2008, at 6:47 PM, Pahari, Dinesh P wrote:

The server was rebooted intentionally after regular maintenance. There
are no significant errors reported in the event viewer. Similarly, the TSM
activity log has recorded nothing since the reboot, and no errors are
reported for the failure that occurred.
FYI below
Event Viewer:
        Source: AdsmClientServiceTSM
        Message: Scheduler halted right after the reboot.
Dsmerror.log:
03/02/2008 04:03:14 ConsoleEventHandler(): Caught Shutdown console event.
03/02/2008 04:03:14 ConsoleEventHandler(): Cleaning up and terminating Process ...
03/02/2008 05:09:46 ConsoleEventHandler(): Caught Shutdown console event.
03/02/2008 05:09:46 ConsoleEventHandler(): Cleaning up and terminating Process ...

Dsmsched.log:
Process Interrupted!!  Severing connection. <<<<<<

Unfortunately, those are very vague and general messages, which seem
to suggest that something external to TSM is killing the
process when it tries to start.  With no specific diagnostics
presenting themselves, you'll have to work toward uncovering the
cause.  I would start by looking at what else is running on that
system, and in particular whether there's any kind of new facility on the
system which has the power to kill processes, home-grown or
commercial.  You might compare what's running on that system with
similar systems in your shop.  I would also check for tightened
"boundary conditions" which might keep things from running, such as
full file systems or a shortage of memory or swap space (though these
should show up in the OS Event Log).
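
As a rough illustration of the "full file systems" check, here is a minimal
Python sketch. The function name low_space, the 5% threshold, and the C:\
volume are assumptions made for the example, not anything TSM provides;
substitute the volumes that matter on your system.

```python
import shutil

def low_space(paths, min_free_fraction=0.05):
    """Return (path, free_fraction) for each path whose volume has less
    free space than min_free_fraction of its total size."""
    flagged = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        # Guard against pseudo file systems that report a total of zero.
        fraction = usage.free / usage.total if usage.total else 0.0
        if fraction < min_free_fraction:
            flagged.append((path, fraction))
    return flagged

if __name__ == "__main__":
    # Hypothetical volume list; replace with your own.
    for path, fraction in low_space(["C:\\"]):
        print(f"{path}: only {fraction:.1%} free")
```

This only covers disk space; memory and swap shortages would still need to be
checked through the OS tools or the Event Log.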

I would then perform some escalating experiments to try to draw out
the cause...  Use the CLI to run some increasingly consumptive
commands, starting with 'dsmc q sess', then a 'dsmc i' of a directory, and
so on, to see when things start failing.  If you can't achieve a
session, then you may be having networking problems or something may be
blocking the connection.  If you can get a dsmc to work, then try manually
starting the scheduler while watching the dsmwebcl.log.  If the
scheduler will start, then try a simple DEFine CLIENTAction from the
TSM server, to run a simple OS command, and see what happens, then
see if it can run a 'dsmc i' of the directory you tried before.  And
if this is happening to the TSM process, I would expect it to be
happening to some other system processes, too: look for other things
which can't run.
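
The escalating CLI experiments could be scripted along these lines. This is
only a sketch: the helper run_escalating, the step labels, and the directory
path are made up for illustration, and it assumes dsmc is on the PATH. It
treats any non-zero exit status as the point to stop and investigate, which is
a simplification, since dsmc can also exit non-zero for warnings or skipped
files.

```python
import subprocess

def run_escalating(steps, runner=subprocess.run):
    """Run (label, argv) steps in order and stop at the first one that
    does not exit with status 0. Returns a list of (label, returncode);
    returncode is None if the command could not be found at all."""
    results = []
    for label, argv in steps:
        try:
            proc = runner(argv, capture_output=True, text=True)
            code = proc.returncode
        except FileNotFoundError:
            code = None  # executable missing or not on the PATH
        results.append((label, code))
        if code != 0:
            break  # first failing step: investigate here
    return results

# Hypothetical steps, lightest first; the directory is a placeholder.
DSMC_STEPS = [
    ("query session", ["dsmc", "q", "sess"]),
    ("incremental of one directory", ["dsmc", "i", "C:\\some\\small\\dir\\"]),
]

if __name__ == "__main__":
    for label, code in run_escalating(DSMC_STEPS):
        print(f"{label}: exit status {code}")
```

The last entry printed is the step where things stopped working (or the final
step, if everything succeeded), which tells you where to dig next.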

You have a clear demarcation as to when this all started, at the
reboot.  A reboot can put into effect a bunch of changes which had
been planted in the days and weeks prior to that event...things which
could have been forgotten over time.  It's also possible for a bad OS
patch to be put into effect which is causing problems.  Check your
site records of system changes.

    Richard Sims
