ADSM-L

Re: [ADSM-L] wait=yes timeout??

2007-09-20 15:16:52
From: Richard Sims <rbs AT BU DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 20 Sep 2007 15:14:17 -0400
David -

It's strange that the TSM server Activity Log shows no ANR message of
any kind relating to the dsmadmc session loss: I would expect at
least some record of the session drop from its end.  Based upon what
you report, it would appear that the TSM server was not responsible
for the dsmadmc session dropping, which is to say that there is no
TSM timeout involved.  I would thus look for other causes.  Check for
any incidental entry in the dsmerror.log.  Another place to look is
the AIX accounting records: search for the dsmadmc process name, and
look in particular for an ac_flag with the AXSIG (killed by signal)
bit set, which would indicate that the session met a fate involving
an OS event.  If so, I would further look in the AIX Error Log for
any record of the process's demise, which would reveal the cause.
Miscellaneous things can cause mysterious terminations, the Tcsh
autologout
(http://www.erdc.hpc.mil/documentation/Tips_Tricks/autologout)
being a gross example.  If you're going through a firewall of some
kind, it may be terminating the connection based upon excessive
duration.

I run dsmadmc 24 x 5, and never see any session loss, per se.  There
is the standard ANR0482W "session termination" based upon the server
IDLETimeout value, but that's "under the covers" and does not result
in dsmadmc process loss: upon next keyboard action, the interaction
resumes (ANR0402I) within the same ongoing dsmadmc process, which is
to say no TSM login required.

One thing for sure is that your script is way too simple, lacking any
error handling, beginning with return code/status evaluation between
command invocations.  I would recommend using Perl, where you can
readily program error detection, handling, logging, and recovery.
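
As a rough illustration of the return code evaluation I mean, here is
a minimal shell sketch (shell rather than Perl, to match the existing
script).  It assumes that dsmadmc exits with a nonzero status when the
session or command fails.  The helper names run_step and backup_chain
and the LOGFILE path are hypothetical, and the dsmadmc invocation is
simply the one quoted below.

```shell
#!/bin/sh
# Sketch only: run each step, log its exit status, and stop the chain
# on the first failure instead of blindly continuing.
# LOGFILE, run_step and backup_chain are hypothetical names.
LOGFILE="${LOGFILE:-/tmp/tsm_nightly.log}"

run_step() {
    # Usage: run_step "label" command [args...]
    label=$1
    shift
    "$@" >>"$LOGFILE" 2>&1
    rc=$?
    # Record when the step ended and what status it returned.
    echo "$(date '+%Y-%m-%d %H:%M:%S') $label rc=$rc" >>"$LOGFILE"
    return "$rc"
}

backup_chain() {
    # Assumes TSMPWD is set in the environment, as in the original script.
    run_step "backup stgpool" dsmadmc -id=admin -pass="$TSMPWD" \
        "ba stg collgoldprimarypool collgoldcopypool maxpr=3 wait=yes" \
        || return $?
    # Later steps (such as the database backup) would follow here, each
    # wrapped in run_step so that it runs only if the previous one succeeded.
}
```

With this shape, a failed or dropped storage pool backup leaves a
timestamped nonzero rc in the log and the later steps never start,
rather than being launched while the first process is still running.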

  what I can think of,  Richard Sims

On Sep 20, 2007, at 1:23 PM, Taylor, David wrote:

I see the "ANR2017I Administrator ADMIN issued command: BACKUP
STGPOOL..." in the actlog at the time that the command was first
issued.  The ba stgpool started and was running fine.  12 hours
(exactly) later, the script reported the "ANS1017E session rejected"
and went on to the next command, which was "ba db...", however the ba
stgpool was still running.  There was nothing recorded in the actlog
when the script reported the ANS1017E other than the commencement of
the ba db command.

It appeared that the command line 'dsmadmc -id=admin -pass=$TSMPWD "ba stg
collgoldprimarypool collgoldcopypool maxpr=3 wait=yes"' timed out
waiting on a return from the command.  It, for all practical purposes,
orphaned the ba stg and continued processing the rest of the script.

I can find no time-related values in the server's configuration that
are anywhere near 12 hours (43,200 seconds).

I guess that at this point, I am comfortable in understanding what
happened, and am now interested in finding out if that timeout can be
adjusted.
