On Jan 11, 2005, at 12:22 PM, Mark Trancygier wrote:
We have had two occurrences in the last two weeks of 1,500+ dsmscoutd
processes starting and effectively crashing our system. The parent
dsmscoutd process seems to start 2 new child (dsmscoutd) processes every
10 seconds, and continues to do this until the system hangs.
Has anyone experienced this issue? If so, were you able to determine
why this many scout processes were started?
Client OS = AIX 5.1
TSM Client Code = 5.1
TSM Server Code = 5.1
Hi, Mark - Good to run into another HSM user...
I just started using HSM 5.2, and haven't (yet) run into the situation
you describe. One thing I would try is defining a limited MAXCANDPROCS
value, to see if that exerts any remedial control. Also check for any
indications of problems in HSM's Stdout/Stderr output, which it is fond
of writing to the system console rather than a useful place; to review
it, run '/usr/bin/alog -f /var/adm/ras/conslog -o'. Invoke 'dsmmigrate -R'
manually and watch its execution, as it may reveal a trouble spot in
the file system.
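To catch a runaway before the system hangs, it may help to count the
scout processes periodically. What follows is only a rough sketch in
Python, not anything supplied with HSM: it assumes a Python 3
interpreter is available on the client, that 'ps -e -o comm=' lists
one command name per line, and the function names are made up for
illustration.

```python
import subprocess


def count_processes(ps_output, name="dsmscoutd"):
    """Count lines of `ps -e -o comm=` output whose command name matches."""
    count = 0
    for line in ps_output.splitlines():
        # Some systems report a full path; compare only the last component.
        if line.strip().split("/")[-1] == name:
            count += 1
    return count


def current_count(name="dsmscoutd"):
    """Run ps and count the processes currently named `name`."""
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return count_processes(result.stdout, name)
```

Running current_count() from cron every minute or so, and raising an
alert past some locally chosen threshold, would at least show when the
growth starts.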
You seem to be at base code level, so consider moving to a later
maintenance level to benefit from fixes in general.
An advisory on one HSM problem I ran into: Migration may mysteriously
stop when you need it to happen, with no HSM processes active, but file
system full. I discovered the cause when I did a desperation
'dsmmigrate -R': there was a bogus symbolic link in the file system -
one whose target had gone away. Migration simply dies when this is
encountered. This is a gross defect which seems not to have been tested
for by development.
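If you want to check a file system for such links ahead of time, a
walk of the directory tree will find them. This is a minimal sketch in
Python, assuming an interpreter is available on the client; the
function name is my own, and the mount point in the usage comment is
only a placeholder.

```python
import os


def find_dangling_symlinks(root):
    """Return paths of symbolic links under root whose targets do not exist."""
    dangling = []
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() examines the link itself; exists() follows it,
            # so a link whose target is gone is a link that "does not exist".
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append(path)
    return dangling


# Example use (placeholder mount point):
#   for path in find_dangling_symlinks("/hsmfs"):
#       print(path)
```

Removing or repairing whatever it reports before the next migration run
should avoid the failure described above.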
Richard Sims http://people.bu.edu/rbs