ADSM-L

TSM Server v4.2.1

2001-10-29 12:37:31
Subject: TSM Server v4.2.1
From: Andrew Raibeck <storman AT US.IBM DOT COM>
Date: Mon, 29 Oct 2001 10:34:14 -0700
Hello all,

We have put our full attention on addressing the early problems seen in
4.2.1, and either have a fix already available or expect to deliver one
shortly. Despite following our strict development process, focusing
significant resource on design reviews, testing, and running a
successful beta with several large customers, we had some defect escapes
to the field that you may have encountered.

Our plan is to deliver a fixtest called 4.2.1.6 shortly with all of the
fixes currently available on various platforms rolled up into one level.
We have corrected problems relating to mount point management, LTO
devices, 3494 libraries, and a server crash. Applicable APARs are
IC31961, IC31823, IC31831, IC31691, and IC31884. While some fixes are
available in various prior patch levels, the rollup fix will address all
of these problems. If you have problems in these areas, but your problems
are not covered by the description in these APARs, please contact Tivoli
service.

We have the fixes running in our test environments and at some customer
accounts. We have confidence the fixes will correct these reported
problems. Ultimately we believe the changes made to these code paths
that were introduced into 4.2.1 will be of great benefit and will
improve your satisfaction with our product. We apologize for letting
these defects escape our process.

For your convenience, the text of the APARs listed above appears below my
signature information.

Regards,

Andy

Andy Raibeck
IBM Software Group
Tivoli Storage Manager Client Development
Internal Notes e-mail: Andrew Raibeck/Tucson/IBM@IBMUS
Internet e-mail: storman AT us.ibm DOT com

The only dumb question is the one that goes unasked.
The command line is your friend.
"Good enough" is the enemy of excellence.

------------
APAR IC31691
------------
ABSTRACT:
LIBRARY AUDIT ON 349X MAY NOT CORRECT CATEGORY MISMATCHES

ERROR DESCRIPTION:
The LIBRARY AUDIT command for the
3494 and 3495 libraries may not be
able to correct category mismatches
of private volumes.

LOCAL FIX:
Recovery Procedure:
1)Stop activity to your library (i.e. mounting/dismounting
  of tapes) if any activity is going on.
2)Determine the name of the library from the AIX perspective.
  (The default is usually lmcp0.)  In the procedure that follows,
  replace lmcp0 with whatever name is appropriate for
  your system. (lsdev -Cc tape should list the tape
  devices, and you can find the library name there.)
3)Determine whether you are using the default
  categories or whether you have specified categories.
  The default categories in TSM are 300 and 301 and
  can be found via a Q libr command. If you are using
  the defaults, the hex numbers that you will need
  below are 012E for scratch and 012C for private.
  If you are using the defaults, go to the next
  step now. If you are NOT using the defaults, you will
  have to convert the numbers shown by the Q libr
  command for scratch and private from decimal to hex.
  When you convert the number for scratch, you must
  add 1 to it before you convert.  (This is because Q libr
  shows the category for 3490, and the category for 3590
  is one higher.)
4)In TSM, do an sql statement to find all volumes in
  private and redirect that to a file:
   select volume_name from libvolumes
          where status='Private'  > filename
5)At the AIX prompt, edit the file created in the above
  step to remove the header lines. There are usually at
  least 2 lines before the first volume is listed. The file
  should only contain volume names.
6)Verify that several of the items in your list are in
  fact in the incorrect category on the 3494 by issuing
  the following command against a couple of the
  volumes. Replace vol_name with one of your volumes:
       mtlib -l /dev/lmcp0 -qV -V vol_name
  for example
       mtlib -l /dev/lmcp0 -qV -V 000027
  Look at the category listed in the output. If you are
  using the defaults, a private volume should be listed as
  012C but you are probably seeing 012E.
7)Run the command to update the category to the
  private category. In the following command,
  replace lmcp0 with whatever your 3494 library name
  is on AIX, and replace filename with
  whatever you specified as a filename in the above
  step. Please note that you should fully qualify
  the file name:
     mtlib -l /dev/lmcp0 -C -L filename -t"012C"
8)Double check that a couple of the volumes were
  in fact correctly changed. I usually check at least
  the first and last item in the list.
9)Re-enable activity to the tape library.
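
The decimal-to-hex conversion in step 3 and the header clean-up in
step 5 can be sketched in a few lines of Python. This is only an
illustration, not part of the APAR text: the function names are made
up, and the filter assumes volume names are one to six uppercase
letters or digits (like 000027 above), so any line that does not look
like a volume name is dropped.

```python
import re
from typing import List


def category_hex(decimal_category: int, is_scratch: bool = False) -> str:
    """Convert a category shown by Q libr (decimal) to 4-digit hex.

    Per step 3, 1 is added to the scratch number before converting.
    """
    value = decimal_category + 1 if is_scratch else decimal_category
    return format(value, "04X")


# Assumption: a volume name is 1-6 uppercase letters or digits.
VOLSER = re.compile(r"[A-Z0-9]{1,6}")


def volume_names(select_output: str) -> List[str]:
    """Keep only the lines that consist of a single volume name (step 5)."""
    names = []
    for line in select_output.splitlines():
        candidate = line.strip()
        if VOLSER.fullmatch(candidate):
            names.append(candidate)
    return names
```

With the default categories, category_hex(300) gives 012C for private
and category_hex(301, is_scratch=True) gives 012E for scratch, which
matches the values quoted in step 3.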

How to avoid having to do this again:
1)Avoid running an audit library or halting and
  restarting your server.
2)If you must stop and restart your server, verify that
  your private volumes are in the correct catagory by
  taking one or two volumes and issuing the mtlib
  command against them to display the 3494 catagory.
3)If the problem has occurred again, run the
  procedure listed above.
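
For the spot checks described above (first and last volume in the
list), a small helper can build the mtlib query commands so they are
not retyped by hand. This is a sketch only: the function name is
hypothetical, /dev/lmcp0 is the default device name assumed throughout
the procedure, and the helper only builds the command lines in the
form shown in the procedure; it does not run them or interpret their
output.

```python
from typing import List


def spot_check_commands(volumes: List[str],
                        device: str = "/dev/lmcp0") -> List[List[str]]:
    """Build mtlib query commands for the first and last volume in the list."""
    if not volumes:
        return []
    picks = [volumes[0]]
    if len(volumes) > 1:
        picks.append(volumes[-1])
    # Same form as the procedure: mtlib -l <device> -qV -V <volume>
    return [["mtlib", "-l", device, "-qV", "-V", vol] for vol in picks]


if __name__ == "__main__":
    for cmd in spot_check_commands(["000027", "000031", "000040"]):
        print(" ".join(cmd))
```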



------------
APAR IC31823
------------
ABSTRACT:
CANNOT UPGRADE FROM 3.7.4 TO 4.2 WITH LTO VOLUMES

ERROR DESCRIPTION:
An attempted upgrade from TSM 3.7.4 to TSM 4.2.0, with an LTO library
attached and LTO volumes in the database, fails with the following
messages when trying to bring up the server:
ANR9999D pvr.c(2449): ThreadId<0> Unsupported function for
         device class 20.
ANR9999D asinit.c(341): ThreadId<0> Error converting volume
         attribs for devclass LTOCLASS.
This problem was addressed in APAR IC29442 for the same issue
when upgrading from 3.7.4 to 4.1.
INITIAL IMPACT: High

LOCAL FIX:
Upgrade to 4.1.3 first (where fix for IC29942 is present).
Then upgrade to 4.2.  Upgrading to 4.1.3 might vary depending
on platform.



------------
APAR IC31831
------------
ABSTRACT:
LTO DEVICES WON'T WORK AFTER UPGRADE TO 4.2.1

ERROR DESCRIPTION:
If the TSM server is upgraded to 4.2.1 and has existing LTO
devices, they will not work. They will fail with the following
errors:

ANR8337I LTO volume XXX mounted in drive DRIVE1 (/dev/rmt1).
ANR8442E MOUNT REQUEST: Volume XXX in library 3584LTO is
         currently in use.
ANR1401W Mount request denied for volume XXX

A PVR MMS trace shows the following:

mmsscsi.c(8489): Preparing private volume XXX in library 3584LTO
mmslib.c(7303): Obtaining reservation for volume XXX in library
       3584LTO; activity=4.
mmsscsi.c(1314): Problem verifying/reserving volume XXX is in
       library 3584LTO; rc = 4.
pvr.c(8156): PVR I/O agent (37) finished OPEN request; rc=4.

This only occurs after an upgrade to 4.2.1; a new install
should not have a problem.

INITIAL IMPACT: HIGH

LOCAL FIX:
N/A



------------
APAR IC31884
------------
ABSTRACT:
V4.2.1 SERVER CORE DUMPING, ANR7837S ON LOCKCYCLE02

ERROR DESCRIPTION:
TSM Server V4.2.1 core dumps with ANR7837S Internal error
LOCKCYCLE02 detected. The error messages are in the dsmserv.err
file.

Trace back in dsmserv.err:
ANR7838S Server operation terminated.
ANR7837S Internal error LOCKCYCLE02 detected.
  0x100085A4 pkLogicAbort
  0x100303F0 CheckLockCycles
  0x100324C0 TmFindDeadlock
  0x100322A4 TmDeadlockDetector
  0x10006DB4 StartThread
  0xD00081FC _pthread_body
ANR7833S Server thread 1 terminated in response to program abort
ANR7833S Server thread 2 terminated in response to program abort
.............
The LOCKCYCLE02 error indicates that the problem is related to
transactions between the storage agent and the TSM server. TSM
sets a waiter flag in the lock request. Situations can occur
where the lock request is aborted. When the abort code signals
the lock waiter, it releases the mutex to allow the receiver to
respond, which leaves a small window in which the deadlock
detector can wake up and start looking for waiters. Because the
request had been satisfied by being aborted, no locks were being
waited on, but the waiter flag was still set. Hence, TSM aborted
because there was a waiter not waiting on anything.
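
The window described above can be pictured with a toy model. The
Python below is not TSM code, and every name in it is invented for
illustration. It only shows the state the deadlock detector trips
over: a request that is still flagged as a waiter although the abort
has already left it waiting on nothing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LockRequest:
    waiter: bool = False              # flag set when the request starts waiting
    waiting_on: Optional[str] = None  # lock being waited on, if any


def abort_request(req: LockRequest) -> None:
    """The abort satisfies the request: nothing is waited on any more.

    The waiter flag is NOT cleared here; the signalled waiter clears it.
    """
    req.waiting_on = None


def waiter_responds(req: LockRequest) -> None:
    """The signalled waiter wakes up and clears its own flag."""
    req.waiter = False


def detector_sees_stale_waiter(requests: List[LockRequest]) -> bool:
    """True if some request is flagged as a waiter but waits on nothing."""
    return any(r.waiter and r.waiting_on is None for r in requests)


if __name__ == "__main__":
    req = LockRequest(waiter=True, waiting_on="lock-A")
    abort_request(req)
    print(detector_sees_stale_waiter([req]))  # inside the window
    waiter_responds(req)
    print(detector_sees_stale_waiter([req]))  # after the waiter has responded
```

If the detector runs between abort_request and waiter_responds, it
finds the inconsistent state; once the waiter has responded, the flag
is clear and the check passes.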

LOCAL FIX:
Set the RESOURCETIMEOUT value in dsmserv.opt to a higher
timeout value. (See the administrator reference for more
information.) A higher timeout value should allow the lock waiter
flag to clear.



------------
APAR IC31961
------------
ABSTRACT:
MOUNT RESERVATION ANR8447E (INTERNAL DEFECT 31934)

ERROR DESCRIPTION:
During mount point processing, PVR obtains information on drives
only as it needs it. Multiple calls to obtain the path information
for the same drive may be issued, which is expected behaviour.
There are instances where the "no drives available" message is
correct: all drives are in use and there is no need to force the
dismount of an idle volume or to wait for a dismounting volume.
............
The problem is that when TSM is looking for idle volumes to
steal, it never considers mount points in the reserved state
(or the mpClean state in TSM levels prior to V4.2.1).
>>>>>>>>>>>>>>>
For example:
If the Mount limit is 4, the request is for one more mount point
and there are
1 reserved mount point
2 open mount points
1 idle mount point
TSM would see that it did not need to force an idle dismount
because 2 open mount points + 1 idle mount point + 1 new
request is equal to 4.  However, the math should have included
the 1 reserved mount point, which brings the total to 5 and
exceeds the mount limit.
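
The arithmetic in this example can be written out as a short Python
sketch. It is not the server's code: the function and parameter names
are made up, and the count_reserved switch exists only to contrast the
calculation described as faulty with the one that includes reserved
mount points.

```python
def needs_idle_dismount(mount_limit: int, open_mp: int, idle_mp: int,
                        reserved_mp: int, new_requests: int = 1,
                        count_reserved: bool = True) -> bool:
    """Return True if an idle volume must be dismounted to fit the request."""
    total = open_mp + idle_mp + new_requests
    if count_reserved:
        total += reserved_mp  # the term the faulty calculation left out
    return total > mount_limit


if __name__ == "__main__":
    # Mount limit 4, with 2 open, 1 idle, 1 reserved and 1 new request.
    print(needs_idle_dismount(4, 2, 1, 1, count_reserved=False))  # 2+1+1 = 4
    print(needs_idle_dismount(4, 2, 1, 1))                        # 2+1+1+1 = 5
```

Without the reserved mount point the total equals the limit, so no
dismount is forced; with it the total is 5, one more than the limit of
4, so an idle volume has to be dismounted.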
..............
(This APAR documents internal defect 31934)

LOCAL FIX:
N/A