Hello all,
We have put our full attention on addressing the early problems seen in
4.2.1, and either have a fix already available or expect to deliver one
shortly. Despite following our strict development process, focusing significant
resource on design reviews, testing, and running a successful beta with several
large customers, we had some defect escapes
to the field that you may have encountered.
Our plan is to deliver a fixtest called 4.2.1.6 shortly with all of the
fixes currently available on various platforms rolled up into one level. We
have corrected problems relating to mount point management, LTO
devices, 3494 libraries, and a server crash. Applicable APARs are IC31961,
IC31823, IC31831,
IC31691, and IC31884. While some fixes are available in various prior
patch levels, the rollup fix will address all of these problems. If you
have problems in these areas, but your problems are not covered by the
description in these APARs, please
contact Tivoli service.
We have the fixes running in our test environments and at some customer
accounts. We have confidence the fixes will correct these reported problems.
Ultimately we believe the changes made to these
code paths that were introduced into 4.2.1 will be of great benefit and will
improve your satisfaction with our product. We
apologize for letting these defects escape our process.
For your convenience, the text of the APARs listed above appears below my
signature information.
Regards,
Andy
Andy Raibeck
IBM Software Group
Tivoli Storage Manager Client Development
Internal Notes e-mail: Andrew Raibeck/Tucson/IBM@IBMUS
Internet e-mail: storman AT us.ibm DOT com
The only dumb question is the one that goes unasked.
The command line is your friend.
"Good enough" is the enemy of excellence.
------------
APAR IC31691
------------
ABSTRACT:
LIBRARY AUDIT ON 349X MAY NOT CORRECT CATEGORY MISMATCHES
ERROR DESCRIPTION:
The LIBRARY AUDIT command for the
3494 and 3495 libraries may not be
able to correct category mismatches
of private volumes.
LOCAL FIX:
Recovery Procedure:
1)Stop activity to your library (i.e. mounting/dismounting
of tapes) if any activity is going on.
2)Determine the name of the library from the AIX perspective.
(Usually the default is lmcp0). In the procedure that follows,
replace lmcp0 with whatever name is appropriate for
your system. (lsdev -Cc tape should list the tape
devices and you can find the library name there).
3)Determine whether you are using the default
categories or whether you have specified categories.
The default categories in TSM are 300 (private) and
301 (scratch) and can be found via a Q libr command.
If you are using the defaults, the numbers in hex that
you will need below are 012E for scratch and 012C for
private. If you are using the defaults, go to the next
step now. If you are NOT using the defaults, you will
have to convert the numbers shown on the Q libr
command for scratch and private from decimal to hex.
When you convert the number for scratch, you must
add 1 to it before you convert. (This is because Q libr
shows the category for 3490 and the category for 3590 is one
higher).
4)In TSM, run an SQL statement to find all volumes in
private status and redirect the output to a file:
select volume_name from libvolumes
where status='Private' > filename
5)At the AIX prompt, edit the file created in the above
step to remove the header lines. There are usually at
least 2 lines before the first volume is listed. The file
should only contain volume names.
6)Verify that several of the items in your list are in
fact in the incorrect category on the 3494 by issuing
the following command against a couple of the
volumes. Replace vol_name with one of your volumes:
mtlib -l /dev/lmcp0 -qV -V vol_name
for example
mtlib -l /dev/lmcp0 -qV -V 000027
Look at the category listed in the output. If you are
using the defaults, a private volume should be listed as
012C but you are probably seeing 012E.
7)Run the command to update the category to the
private category. In the following command,
replace lmcp0 with whatever your 3494 library name
is on AIX, and replace filename with
whatever you specified as a filename in step 4.
Please note that you should fully qualify
the file name:
mtlib -l /dev/lmcp0 -C -L filename -t"012C"
8)Double check that a couple of the volumes were
in fact correctly changed. I usually check at least
the first and last item in the list.
9)Re-enable activity to the tape library.
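
The decimal-to-hex conversion in step 3 can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the function name category_hex is made up for this example and is not part of TSM or mtlib.

```python
def category_hex(decimal_category, scratch=False):
    """Return the 4-digit hex category string to pass to mtlib.

    Per step 3, add 1 to the scratch category before converting.
    (category_hex is a hypothetical helper, not a TSM command.)
    """
    value = decimal_category + 1 if scratch else decimal_category
    return format(value, "04X")


# TSM defaults: private 300, scratch 301
print(category_hex(300))                # 012C
print(category_hex(301, scratch=True))  # 012E
```

If your Q libr output shows non-default categories, pass those decimal numbers instead and use the private result in place of 012C in step 7.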
How to avoid having to do this again:
1)Avoid running an audit library or halting
and restarting your server.
2)If you must stop and restart your server, verify that
your private volumes are in the correct category by
taking one or two volumes and issuing the mtlib
command against them to display the 3494 category.
3)If the problem has occurred again, run the
procedure listed above.
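
Cleaning the volume list (step 5 of the recovery procedure) and picking volumes to spot check (step 8, and step 2 above) could be scripted along these lines. This is a rough sketch, not a supported tool. It assumes volume names are 1 to 6 uppercase letters or digits, that header lines do not match that pattern, and the sample lines are invented for illustration. It only prints the mtlib query commands shown earlier; it does not run them.

```python
import re

# Assumption: a volume name is 1-6 uppercase letters or digits.
VOLSER = re.compile(r"[A-Z0-9]{1,6}")


def volume_names(lines):
    """Keep only lines that look like volume names; drop headers and blanks."""
    names = []
    for line in lines:
        candidate = line.strip()
        if VOLSER.fullmatch(candidate):
            names.append(candidate)
    return names


def spot_check_commands(names, device="/dev/lmcp0"):
    """Build mtlib query commands for the first and last volume in the list."""
    if not names:
        return []
    picks = [names[0]] if len(names) == 1 else [names[0], names[-1]]
    return [f"mtlib -l {device} -qV -V {vol}" for vol in picks]


# Hypothetical file contents: two header lines, then volume names.
sample = ["VOLUME_NAME", "------------------", "000027", "000031", "A00104", ""]
names = volume_names(sample)
for cmd in spot_check_commands(names):
    print(cmd)
```

Replace /dev/lmcp0 with your own library name, and check the filtered list by eye before feeding it to the mtlib -C command.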
------------
APAR IC31823
------------
ABSTRACT:
CANNOT UPGRADE FROM 3.7.4 TO 4.2 WITH LTO VOLUMES
ERROR DESCRIPTION:
An attempted upgrade from TSM 3.7.4, with an LTO library attached
and LTO volumes in the database, to TSM 4.2.0 fails with the
following messages when trying to bring up the server:
ANR9999D pvr.c(2449): ThreadId<0> Unsupported function for
device class 20.
ANR9999D asinit.c(341): ThreadId<0> Error converting volume
attribs for devclass LTOCLASS.
This problem was addressed in APAR IC29442 for the same issue
when upgrading from 3.7.4 to 4.1.
INITIAL IMPACT: High
LOCAL FIX:
Upgrade to 4.1.3 first (where fix for IC29942 is present).
Then upgrade to 4.2. Upgrading to 4.1.3 might vary depending
on platform.
------------
APAR IC31831
------------
ABSTRACT:
LTO DEVICES WON'T WORK AFTER UPGRADE TO 4.2.1
ERROR DESCRIPTION:
If the TSM server is upgraded to 4.2.1 and has existing LTO
devices, they will not work. They will fail with the following
errors:
ANR8337I LTO volume XXX mounted in drive DRIVE1 (/dev/rmt1).
ANR8442E MOUNT REQUEST: Volume XXX in library 3584LTO is
currently in use.
ANR1401W Mount request denied for volume XXX
A PVR MMS trace shows the following:
mmsscsi.c(8489): Preparing private volume XXX in library 3584LTO
mmslib.c(7303): Obtaining reservation for volume XXX in library
3584LTO; activity=4.
mmsscsi.c(1314): Problem verifying/reserving volume XXX is in
library 3584LTO; rc = 4.
pvr.c(8156): PVR I/O agent (37) finished OPEN request; rc=4.
This only occurs after an upgrade to 4.2.1; a new install
should not have a problem.
INITIAL IMPACT: HIGH
LOCAL FIX:
N/A
------------
APAR IC31884
------------
ABSTRACT:
V4.2.1 SERVER CORE DUMPING, ANR7837S ON LOCKCYCLE02
ERROR DESCRIPTION:
TSM Server V4.2.1 core dumps with ANR7837S Internal error
LOCKCYCLE02 detected. The error messages are in dsmserv.err.
Traceback in dsmserv.err:
ANR7838S Server operation terminated.
ANR7837S Internal error LOCKCYCLE02 detected.
0x100085A4 pkLogicAbort
0x100303F0 CheckLockCycles
0x100324C0 TmFindDeadlock
0x100322A4 TmDeadlockDetector
0x10006DB4 StartThread
0xD00081FC _pthread_body
ANR7833S Server thread 1 terminated in response to program abort
ANR7833S Server thread 2 terminated in response to program abort
.............
The LOCKCYCLE02 indicates that the problem is related to
transactions between the storage agent and the TSM server. TSM
sets a waiter flag in the lock request. Situations can occur
where the lock request is aborted. The abort causes a
LOCKCYCLE02 problem because the deadlock detector
wakes up and goes looking for waiters. There is a small
window after the abort code signals the lock waiter
(because the mutex is released to allow the receiver to
respond), and this allows the deadlock detector to start looking
for deadlocks. Since the request has been satisfied by being
aborted, there are no locks being waited on (but the flag is
still set). Hence, TSM aborts because there is a waiter not
waiting on anything.
LOCAL FIX:
Set the RESOURCETIMEOUT value in dsmserv.opt to a higher
timeout value. (See the administrator reference for more
information.) A higher timeout value should allow the lock waiter
flag to clear.
------------
APAR IC31961
------------
ABSTRACT:
MOUNT RESERVATION ANR8447E (INTERNAL DEFECT 31934)
ERROR DESCRIPTION:
During mount point processing, PVR will obtain information
on drives only as it needs it. Multiple calls to obtain the path
information for the same drive may be issued, which is
expected behaviour. There are instances where the "no drives
available" message is correct -- all drives are in use and there
is no need to force the dismount of an idle volume or to wait for a
dismounting volume.
............
The problem is that when TSM is looking for idle volumes to
steal, TSM never considered mount points in the reserved state
(or mpClean state in TSM levels prior to V421).
>>>>>>>>>>>>>>>
For example:
If the Mount limit is 4, the request is for one more mount point
and there are
1 reserved mount point
2 open mount points
1 idle mount point
TSM would see that it did not need to force an idle dismount
because 2 open mount points + 1 idle mount point + 1 new
request is equal to 4. However, the math should have included
the 1 reserved mount point, which brings the total to 5 and
exceeds the mount limit of 4.
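
The arithmetic in this example can be written out as a small Python sketch. This is not TSM's actual code; the function name and parameters are invented here purely to show the difference between the old count and the corrected one.

```python
def needs_idle_dismount(mount_limit, open_mp, idle_mp, reserved_mp,
                        requested=1, count_reserved=True):
    """Return True if an idle volume must be dismounted to satisfy the request.

    count_reserved=False mimics the defective math, which ignored
    reserved mount points. (Hypothetical helper, for illustration only.)
    """
    total = open_mp + idle_mp + requested
    if count_reserved:
        total += reserved_mp
    return total > mount_limit


# Mount limit 4, with 2 open, 1 idle, 1 reserved, and 1 new request
print(needs_idle_dismount(4, 2, 1, 1, count_reserved=False))  # False: 2+1+1 = 4
print(needs_idle_dismount(4, 2, 1, 1))                        # True: 2+1+1+1 = 5
```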
..............
(This apar documents internal defect 31934)
LOCAL FIX:
N/A