ADSM-L

Re: Server Daemon hangs and nodes not res

1997-07-15 09:45:23
Subject: Re: Server Daemon hangs and nodes not res
From: Leonard Boyle <SNOLEN AT VM.SAS DOT COM>
Date: Tue, 15 Jul 1997 09:45:23 EDT
On Tue, 15 Jul 1997 06:58:32 -0500 dan thompson said:
>Brian,
>
>  I am afraid that I have very little help to offer you.  Due to the fact
>that ADSM is a network based storage manager one of the most prevalent
>problems is network problems.  I have not had a rash of problems that
>cannot be attributed back to a network problem.  These problems on our end
>are not usually failures as much as they are network maintenance.

We have noticed a rash of problems, some network based and some server
based. In the past week we noticed this problem as well.


>
>However, when maintenance is done we are currently experiencing a very
>large number of hung sessions.  These sessions often reach our Commtimeout
>limit on the host end and the sessions appear to be cancelled.  The clients
>seem to act as if waiting on a response on that session however.  It
>appears as if neither the server or the client are aware of the session
>being lost and react appropriately.
>

In our case the only commethod for the ba clients is TCP/IP.
The one case that really pointed out the problem to us was a Novell client
in a remote office, where a router had a card replaced. The server
timed out the session, and the client was updating the dsmsched.log with
a send line. No errors were noted in dsmerror.log.

The server does notice: when the commtimeout period is reached, the server
terminates the session.
But the ba client does not have a comm timeout setting, so the program
sits there waiting for a packet from the server until the client is
stopped and started again.
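
To illustrate the difference a receive timeout makes, here is a minimal
sketch in Python. It is not ADSM code: the helper name recv_with_timeout
and the 0.5 second limit are assumptions chosen for the demo, and the
silent peer stands in for a server that has already dropped the session.

```python
import socket

def recv_with_timeout(sock, nbytes, timeout_s):
    """Return received bytes, or None if nothing arrives within timeout_s."""
    sock.settimeout(timeout_s)  # without a timeout, recv() can block forever
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None

# Demo: a connected pair where the peer never sends anything,
# standing in for a server that has already timed out the session.
client, silent_peer = socket.socketpair()
reply = recv_with_timeout(client, 4096, 0.5)
print("timed out" if reply is None else reply)
client.close()
silent_peer.close()
```

With the timeout in place the waiting side gets control back and can
decide what to do next, for example log the condition and reconnect.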

I can think of at least three solutions to the problem.

   1) Have a client timeout as the server does now. There may be problems
      with this terminating good sessions. This might require some thought
      to coordinate with the server timeout period.
      In theory this would allow the client to try to recontact the
      server or the server to contact the ba client to restart the
      scheduled event.

   2) If the client is listening for a packet from a prior session and the
      server tries to contact the client to start a new scheduled session,
      either the client or the server should notice this and take action
      on it. The best case would be for the client to terminate the old
      backup session and start the new schedule session. The server should
      note this in its log.
      In the worst case the server should report, without a trace required,
      a reason code for why it could not contact the ba client for a
      scheduled event.

   3) IBM should provide an adsm ping function, or give us the technical
      doc to do so. The best case would allow us to issue a command that
      would go out to the ba client and ask if it was ready to receive
      an adsm schedule command. Like the tcp/ip ping this would test that
      the machine and network are working. But this would also test that
      the ba client schedule function was working. At this time all we
      can tell is that the program was started and is running, not that it
      is working correctly.
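
The idea in 3) can be sketched as an application-level probe. Everything
here is hypothetical: ADSM documents no such function, so the PING/READY
exchange, the function scheduler_ready, and the stand-in listener are all
invented for illustration. The point is only that the probe succeeds when
the scheduler itself answers, not merely when the machine is reachable.

```python
import socket
import threading

def scheduler_ready(host, port, timeout_s=5.0):
    """True only if the peer answers the (hypothetical) PING with READY."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as s:
            s.sendall(b"PING\n")
            return s.recv(16) == b"READY\n"
    except OSError:  # refused, unreachable, or no answer within timeout_s
        return False

def start_fake_scheduler():
    """Stand-in listener for the demo; answers one PING, then exits."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]

    def serve():
        conn, _ = srv.accept()
        with conn:
            if conn.recv(16) == b"PING\n":
                conn.sendall(b"READY\n")
        srv.close()

    threading.Thread(target=serve, daemon=True).start()
    return port

port = start_fake_scheduler()
print(scheduler_ready("127.0.0.1", port))
```

A listener that accepts the connection but never answers makes the probe
return False after the timeout, which is the "started and running, but not
working" case that a tcp/ip ping cannot detect.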

>The only "solution" that I have at the current time is to write a program
>that will run on the client and bounce the service periodically.  This will
>obviously have to do a query session first to make certain that a session
>is not already active.  I will keep you informed of any progress.
>
>Dan T.
>----------
>> From: Brian Jones <brian.jones AT NY.UBS DOT COM>
>> To: ADSM-L AT VM.MARIST DOT EDU
>> Subject: Re: Server Daemon hangs and nodes not res
>> Date: Monday, July 14, 1997 7:55 AM
>>