ADSM-L

Re: FYI: Another shop with the archive hang problem... RE: hang archive (dsmc) on Sun Solaris

2001-10-23 13:48:18
Subject: Re: FYI: Another shop with the archive hang problem... RE: hang archive (dsmc) on Sun Solaris
From: Michael Oski <moski AT APPLE DOT COM>
Date: Tue, 23 Oct 2001 10:45:35 -0700
We too have this issue occurring on our Solaris 2.6 systems with the
4.1.2 TSM client. I was looking at the 4.2.1 client as a possible
solution since the fixed APAR list in the README lists a "fixed dsmc
lockup after successfully sending data" entry. I cannot for the life of
me find the APAR number anywhere on Tivoli's or IBM's websites. When I
installed the 4.2.1 client on a Solaris 2.6 E450, I found I could not
restore any file data from previous backups. The TSM server is running
4.1.3.0 and the restore would build out the directory structure, then
fail with an errno=9 when trying to restore the first file within it. I
backed out the 4.2.1 client and reinstalled 4.1.2 and restored the data
needed. The 4.2.1 client on my Solaris 8 workstation had no problems
restoring the same data from the same node - so it appeared to be
related to Solaris 2.6.

At this point, there is a 4.1.3.0 client as the latest maintenance
release. I'm verifying that with both Solaris 2.6 and 8 before rolling
it out to production. I was kind of hoping the dsmc archive lockup would
be fixed in it, but perhaps that's just wishful thinking.

FWIW, our environment does lots and lots of archives. Our Oracle
databases are archived one filespace at a time in Hot Backup mode with a
ksh script run daily. Then crontab entries archive and delete Oracle
Redo logs anywhere from every 5 minutes to every 20 minutes, depending
on the specific database. Then the flat-file filesystems are
incrementally backed up each night. It's in the Hot Backup scripts that
dsmc is locking up. I'll get 8 out of 14 or so filespaces archived and
then it just hangs. The session ends and disconnects on the server-side
without any
unusual issues. The client simply leaves that process sitting there
until it's manually killed. Then the script continues on with the next
filespace - except by that time the Hot Backup window has ended and the
filespace is in an open state within Oracle. So, our backups on the day
of the lockup are worthless since they're incomplete. If not killed and
cleared out promptly, the locked up dsmc process seems to interfere with
subsequent script execution a day or two later.
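
One way to keep a hung client from stalling the rest of such a script is
to put a time limit on each archive call. What follows is only a minimal
sketch, not anything from the TSM documentation: run_with_timeout is a
hypothetical helper, the 124 return code is just a convention chosen
here, and the time limit and filespace path in the usage comment are
placeholders.

```shell
#!/bin/sh
# Sketch: bound the run time of one command so that a hung process
# cannot stall the rest of a backup script.
# run_with_timeout is a hypothetical helper, not part of TSM.
#
# usage: run_with_timeout SECONDS COMMAND [ARGS...]
# Returns the command's own exit status, or 124 if the watchdog killed it.
run_with_timeout() {
    rwt_limit=$1
    shift
    rwt_flag="${TMPDIR:-/tmp}/rwt_timeout.$$"
    rm -f "$rwt_flag"

    "$@" &                          # start the real command in the background
    rwt_cmd=$!

    (                               # watchdog: acts only if the limit passes
        rwt_left=$rwt_limit
        while [ "$rwt_left" -gt 0 ]; do
            sleep 1
            rwt_left=$((rwt_left - 1))
        done
        : > "$rwt_flag"             # record that the kill came from us
        kill -KILL "$rwt_cmd" 2>/dev/null
    ) >/dev/null 2>&1 &
    rwt_dog=$!

    wait "$rwt_cmd"                 # returns when the command exits or is killed
    rwt_status=$?

    kill "$rwt_dog" 2>/dev/null     # command finished first: cancel the watchdog
    wait "$rwt_dog" 2>/dev/null

    if [ -f "$rwt_flag" ]; then
        rm -f "$rwt_flag"
        return 124
    fi
    return "$rwt_status"
}

# Example use inside a per-filespace loop (limit and path are placeholders):
#   run_with_timeout 3600 dsmc archive -subdir=yes "$fs/"
#   [ $? -eq 124 ] && echo "dsmc timed out on $fs" >&2
```

A timeout only stops the script from waiting forever. The filespace whose
archive was killed still has no complete copy for that day, so the caller
would need to treat status 124 as a failed backup and alert on it.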


On Tuesday, October 23, 2001, at 07:49 AM, Hunley, Ike wrote:

We have that very same issue running TSM 4.1.2.14 on AIX 4.3 with TSM
4.1.4 running on OS/390 V2R9.  Traces yield different results the few
times the hang occurred while trace was activated.  Most times it
doesn't hang at all with trace active.  Archive sessions do run SLOWWWW!

We are currently working with IBM.  They are now going to bring a
Performance person to the conference call.  Archives running in macro
files worked fine in ADSM 3.1.06.  After moving to TSM 4.1.2.14 the
hangs occurred.  IBM suggested that we install TSM 4.2.0 and then
re-write the process to take advantage of a filelist feature, bundling
numerous (the more, the better) file archives in one archive session.
This failed in TSM 4.2.0, so IBM then suggested that we install TSM
4.2.1.  The failures don't occur as often, but the hangs still do from
time to time.
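
As a rough illustration of the filelist approach described above, the
sketch below collects file names into one list and hands that list to a
single archive session. build_filelist is a hypothetical helper and both
paths are placeholders. It assumes the names contain no spaces or
wildcard characters, since entries like that may need quoting in the
list file.

```shell
#!/bin/sh
# build_filelist DIR OUTFILE
# Hypothetical helper: writes the path of every regular file under DIR
# to OUTFILE, one per line, so one dsmc session can archive them all.
build_filelist() {
    find "$1" -type f -print > "$2"
}

# Example (paths are placeholders):
#   build_filelist /oracle/arch /tmp/redo.list
#   dsmc archive -filelist=/tmp/redo.list
```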

This has been a "Moving Target", so problem determination is almost
impossible.

How many archives are being done per day at your site?  IBM tells us
that we were doing too many for TSM 4.x archive to handle.

-----Original Message-----
From: Gerd Bentel [mailto:Gerd.Bentel AT SI-BW DOT DE]
Sent: Tuesday, October 23, 2001 11:00 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: hang archive (dsmc) on Sun Solaris


Hi

At the moment we have a problem: sometimes the dsmc command hangs and
we don't know why.
We run a dsmc archive command (Sun E10000 under Solaris 2.7 with TSM
Client 4.1.2.14) and sometimes it hangs. The TSM Server is TSM 3.7.4
on OS/390.
To get more information we tried a client trace, but it was not very
helpful because the last entries in the trace file were always
different.
Has anybody had the same problem, and perhaps found a solution?

Thank you

Gerd Bentel

Sparkassen Informatik GmbH & Co. KG
Standort Fellbach
Datenhaltung-Middleware
Wilhelm-Pfitzer-Str. 1
70736 Fellbach
Telefon:   (0711) 5722-2142
Telefax:   (0711) 5722-1630
Mailadr.:  Gerd.Bentel AT sparkassen-infromatik DOT de



Blue Cross Blue Shield of Florida, Inc., and its subsidiary and
affiliate companies are not responsible for errors or omissions in this
e-mail message. Any personal comments made in this e-mail do not
reflect the views of Blue Cross Blue Shield of Florida, Inc.
