ADSM-L

Re: dsmscoutd problems on AIX

2005-01-11 14:01:16
Subject: Re: dsmscoutd problems on AIX
From: Richard Sims <rbs AT BU DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 11 Jan 2005 14:00:37 -0500
On Jan 11, 2005, at 12:22 PM, Mark Trancygier wrote:

We have had two occurrences in the last two weeks of 1,500+ dsmscoutd
processes starting and effectively crashing our system. The parent
dsmscoutd process seems to start 2 new child (dsmscoutd) processes every
10 seconds, and continues to do this until the system hangs.

Has anyone experienced this issue? If so, were you able to determine
the reason this many scout processes were started?


Client OS = AIX 5.1
TSM Client Code = 5.1
TSM Server Code = 5.1

Hi, Mark - Good to run into another HSM user...

I just started using HSM 5.2, and haven't (yet) run into the situation
you describe. One thing I would try is defining a limited MAXCANDPROCS
value, to see if that exerts any remedial control. Also check for any
indications of problems in HSM Stdout/Stderr output, which it is fond of
writing to the system console rather than a useful place...so do
'/usr/bin/alog -f /var/adm/ras/conslog -o'. Invoke 'dsmmigrate -R'
manually and watch its execution, as it may reveal a trouble spot in
the file system.
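
While trying out a MAXCANDPROCS setting, it may help to track how many
scout processes exist over time. The following is only a rough sketch,
not part of TSM: the helper names count_processes and current_count are
made up for illustration, and it assumes that 'ps -e -o comm' is
available and reports the scout daemon under the name dsmscoutd.

```python
import subprocess


def count_processes(ps_output, name="dsmscoutd"):
    """Count lines of `ps -e -o comm` style output naming the given command."""
    count = 0
    for line in ps_output.splitlines():
        fields = line.split()
        # Compare only the base name, in case ps reports a full path.
        if fields and fields[-1].rsplit("/", 1)[-1] == name:
            count += 1
    return count


def current_count(name="dsmscoutd"):
    """Run ps and count matching processes (assumes ps accepts -e -o comm)."""
    result = subprocess.run(
        ["ps", "-e", "-o", "comm"],
        capture_output=True, text=True, check=True,
    )
    return count_processes(result.stdout, name)


if __name__ == "__main__":
    print(current_count())
```

Run from cron or a loop every few seconds, a steadily climbing count
would confirm the runaway behavior before the system hangs.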

You seem to be at base code level, so consider a boost there to benefit
from fixes in general.

An advisory on one HSM problem I ran into:  Migration may mysteriously
stop when you need it to happen, with no HSM processes active, even
though the file system is full. I discovered the cause when I did a
desperation 'dsmmigrate -R': there was a bogus symbolic link in the
file system - one whose target had gone away. Migration simply dies
when this is encountered. This is a gross defect which seems to not
have been tested for by development.
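
One way to look for such links ahead of time is to walk the file system
and report every symbolic link whose target no longer exists. This is a
minimal sketch using only the Python standard library; the function
name find_dangling_symlinks is hypothetical, and nothing here is
supplied by TSM or HSM.

```python
import os


def find_dangling_symlinks(root):
    """Return a sorted list of symbolic links under root whose target is missing."""
    dangling = []
    # os.walk does not follow symlinks by default, so links are only inspected.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() examines the link itself; exists() follows it to the target.
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append(path)
    return sorted(dangling)


if __name__ == "__main__":
    import sys
    for link in find_dangling_symlinks(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(link)
```

Given the mount point of the HSM-managed file system as its argument,
the script prints one dangling link per line, which can then be
reviewed and removed or repaired before migration is retried.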

     Richard Sims   http://people.bu.edu/rbs
