Re: TSM Server v4.2.1
2001-10-30 04:47:03
Andy,
do you mean 4.2.1.16 and not 4.2.1.6?
> René Lambelet
> Center Information System, Nestec Ltd
> tel + 41 21 924 3543
> fax + 41 21 924 1369
> visit the Nestle site: http://www.nestle.com
>
>
> -----Original Message-----
> From: Andrew Raibeck [SMTP:storman AT US.IBM DOT COM]
> Sent: Monday, October 29, 2001 6:34 PM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: TSM Server v4.2.1
>
> Hello all,
>
> We have put our full attention on addressing the early problems seen in
> 4.2.1, and either have a fix already available or expect to deliver one
> shortly. Despite following our strict development process, focusing
> significant resource on design reviews, testing, and running a successful
> beta with several large customers, we had some defect escapes to the field
> that you may have encountered.
>
> Our plan is to deliver a fixtest called 4.2.1.6 shortly with all of the
> fixes currently available on various platforms rolled up into one level.
> We have corrected problems relating to mount point management, LTO
> devices, 3494 libraries, and a server crash. Applicable APARs are IC31961,
> IC31823, IC31831, IC31691, and IC31884. While some fixes are available in
> various prior patch levels, the rollup fix will address all of these
> problems. If you have problems in these areas, but your problems are not
> covered by the description in these APARs, please contact Tivoli service.
>
> We have the fixes running in our test environments and at some customer
> accounts. We have confidence the fixes will correct these reported
> problems. Ultimately we believe the changes made to these code paths that
> were introduced into 4.2.1 will be of great benefit and will improve your
> satisfaction with our product. We apologize for letting these defects
> escape our process.
>
> For your convenience, the text of the APARs listed above appears below my
> signature information.
>
> Regards,
>
> Andy
>
> Andy Raibeck
> IBM Software Group
> Tivoli Storage Manager Client Development
> Internal Notes e-mail: Andrew Raibeck/Tucson/IBM@IBMUS
> Internet e-mail: storman AT us.ibm DOT com
>
> The only dumb question is the one that goes unasked.
> The command line is your friend.
> "Good enough" is the enemy of excellence.
>
> ------------
> APAR IC31691
> ------------
> ABSTRACT:
> LIBRARY AUDIT ON 349X MAY NOT CORRECT CATEGORY MISMATCHES
>
> ERROR DESCRIPTION:
> The LIBRARY AUDIT command for the
> 3494 and 3495 libraries may not be
> able to correct category mismatches
> of private volumes.
>
> LOCAL FIX:
> Recovery Procedure:
> 1)Stop activity to your library (i.e. mounting/dismounting
> of tapes) if any activity is going on.
> 2)Determine the name of the library from the AIX perspective.
> (Usually the default is lmcp0). In the procedure that follows,
> replace lmcp0 with whatever name is appropriate for
> your system. (lsdev -Cc tape should list the tape
> devices and you can find the library name there).
> 3)Determine whether you are using the default
> categories or whether you have specified categories.
> The default categories in TSM are 300 and 301 and
> can be found via a Q libr command. If you are using
> the defaults, the hex numbers that you will need
> below are 012E for scratch and 012C for private.
> If you are using the defaults, go to the next
> step now. If you are NOT using the defaults, you will
> have to convert the number shown on the Q libr
> command for scratch and private from decimal to hex.
> When you convert the number for scratch, you must
> add 1 to it before you convert. (This is because Q libr
> shows the category for 3490, and the category for 3590
> is one higher.)
> 4)In TSM, do an sql statement to find all volumes in
> private and redirect that to a file:
> select volume_name from libvolumes
> where status='Private' > filename
> 5)At the AIX prompt, edit the file created in the above
> step to remove the header lines. There are usually at
> least 2 lines before the first volume is listed. The file
> should only contain volume names.
> 6)Verify that several of the items in your list are in
> fact in the incorrect category on the 3494 by issuing
> the following command against a couple of the
> volumes. Replace vol_name with one of your volumes:
> mtlib -l /dev/lmcp0 -qV -V vol_name
> for example
> mtlib -l /dev/lmcp0 -qV -V 000027
> Look at the category listed in the output. If you are
> using the default, a private volume should be listed as
> 012C but you are probably seeing 012E.
> 7)Run the command to update the category to the
> private category. In the following command,
> replace lmcp0 with whatever your 3494 library name
> is on AIX, and replace filename with
> whatever you specified as a filename in the above
> command. Please note that you should fully qualify
> the file name:
> mtlib -l /dev/lmcp0 -C -L filename -t"012C"
> 8)Double check that a couple of the volumes were
> in fact correctly changed. I usually check at least
> the first and last item in the list.
> 9)Re-enable activity to the tape library.
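
The conversion in step 3 and the file cleanup in step 5 can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the filter assumes volume names are one to six uppercase letters or digits, which may not hold at every site. Check the resulting file by eye before feeding it to mtlib.

```python
import re

def category_hex(decimal_category, is_scratch=False):
    """Convert a category shown by Q libr (decimal) to the 4-digit hex
    string used with mtlib. For scratch, add 1 before converting, as
    described in step 3 (the 3590 category is one higher)."""
    value = decimal_category + 1 if is_scratch else decimal_category
    return f"{value:04X}"

# Assumption: a volume name is 1 to 6 uppercase letters or digits.
VOLSER = re.compile(r"^[A-Z0-9]{1,6}$")

def volume_names(lines):
    """Keep only lines that look like volume names, dropping the
    header lines, separator lines, and blank lines from the select output."""
    stripped = (line.strip() for line in lines)
    return [s for s in stripped if VOLSER.match(s)]

if __name__ == "__main__":
    # Defaults from the procedure: private 300, scratch 301.
    print("private:", category_hex(300))
    print("scratch:", category_hex(301, is_scratch=True))
```

With the default categories this gives 012C for private and 012E for scratch, matching the values quoted in step 3.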
>
> How to avoid having to do this again:
> 1)Avoid running an audit library or halting and
> restarting your server.
> 2)If you must stop and restart your server, verify that
> your private volumes are in the correct category by
> taking one or two volumes and issuing the mtlib
> command against them to display the 3494 category.
> 3)If the problem has occurred again, run the
> procedure listed above.
>
>
>
> ------------
> APAR IC31823
> ------------
> ABSTRACT:
> CANNOT UPGRADE FROM 3.7.4 TO 4.2 WITH LTO VOLUMES
>
> ERROR DESCRIPTION:
> An attempted upgrade from TSM 3.7.4, with an LTO library attached
> and LTO volumes in our db, to TSM 4.2.0 fails with the following
> messages when trying to bring up the server:
> ANR9999D pvr.c(2449): ThreadId<0> Unsupported function for
> device class 20.
> ANR9999D asinit.c(341): ThreadId<0> Error converting volume
> attribs for devclass LTOCLASS.
> This problem was addressed in APAR IC29442 for the same issue
> when upgrading from 3.7.4 to 4.1.
> INITIAL IMPACT: High
>
> LOCAL FIX:
> Upgrade to 4.1.3 first (where fix for IC29942 is present).
> Then upgrade to 4.2. Upgrading to 4.1.3 might vary depending
> on platform.
>
>
>
> ------------
> APAR IC31831
> ------------
> ABSTRACT:
> LTO DEVICES WON'T WORK AFTER UPGRADE TO 4.2.1
>
> ERROR DESCRIPTION:
> If the TSM server is upgraded to 4.2.1 and has existing LTO
> devices, they will not work. They will fail with the following
> errors:
>
> ANR8337I LTO volume XXX mounted in drive DRIVE1 (/dev/rmt1).
> ANR8442E MOUNT REQUEST: Volume XXX in library 3584LTO is
> currently in use.
> ANR1401W Mount request denied for volume XXX
>
> A PVR MMS trace shows the following:
>
> mmsscsi.c(8489): Preparing private volume XXX in library 3584LTO
> mmslib.c(7303): Obtaining reservation for volume XXX in library
> 3584LTO; activity=4.
> mmsscsi.c(1314): Problem verifying/reserving volume XXX is in
> library 3584LTO; rc = 4.
> pvr.c(8156): PVR I/O agent (37) finished OPEN request; rc=4.
>
> This only occurs after an upgrade to 4.2.1; a new install
> should not have a problem.
>
> INITIAL IMPACT: HIGH
>
> LOCAL FIX:
> N/A
>
>
>
> ------------
> APAR IC31884
> ------------
> ABSTRACT:
> V4.2.1 SERVER CORE DUMPING, ANR7837S ON LOCKCYCLE02
>
> ERROR DESCRIPTION:
> TSM Server V4.2.1 core dumps with ANR7837S Internal error
> LOCKCYCLE02 detected. The error messages are in dsmserv.err.
>
> Trace back in dsmserv.err:
> ANR7838S Server operation terminated.
> ANR7837S Internal error LOCKCYCLE02 detected.
> 0x100085A4 pkLogicAbort
> 0x100303F0 CheckLockCycles
> 0x100324C0 TmFindDeadlock
> 0x100322A4 TmDeadlockDetector
> 0x10006DB4 StartThread
> 0xD00081FC _pthread_body
> ANR7833S Server thread 1 terminated in response to program abort
> ANR7833S Server thread 2 terminated in response to program abort
> .............
> The LOCKCYCLE02 indicates that the problem is related to
> transactions between the storage agent and TSM server. TSM
> sets a waiter flag in the lock request. Situations can occur
> where the lock request is aborted. The abort causes a
> LOCKCYCLE02 problem because the deadlock detector
> wakes up and goes looking for waiters. There is a small
> window when the abort code signals the lock waiter
> (because the mutex is released to allow the receiver to
> respond), and this window allows the deadlock detector to
> start looking for deadlocks. Since the request has been
> satisfied by being aborted, there are no locks being waited
> on (but the flag is still set). Hence, TSM aborts because
> there is a waiter not
> waiting on anything.
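
The sequence described above can be illustrated with a toy model in Python. This is not TSM code: the class, its fields, and the two-stage abort are hypothetical stand-ins for the behaviour the APAR text describes, with the race window modelled as the gap between two method calls rather than with real threads.

```python
class LockRequest:
    """Toy lock request with a waiter flag (hypothetical model)."""

    def __init__(self, lock_name):
        self.waiting_on = lock_name  # lock this request is blocked on
        self.waiter_flag = True      # set when the request starts waiting

    def abort(self):
        # The request is satisfied by being aborted, so it no longer
        # waits on any lock. The flag is still set at this point.
        self.waiting_on = None

    def finish_abort(self):
        # Later, the waiter clears its own flag.
        self.waiter_flag = False


def detector_check(request):
    """Mimic the deadlock detector: a flagged waiter that waits on
    nothing is treated as an internal error."""
    if request.waiter_flag and request.waiting_on is None:
        return "LOCKCYCLE02"
    return "ok"


if __name__ == "__main__":
    req = LockRequest("some_lock")
    print(detector_check(req))  # request is genuinely waiting
    req.abort()
    print(detector_check(req))  # detector runs inside the window
    req.finish_abort()
    print(detector_check(req))  # flag cleared, nothing to report
```

The middle check is the case from the APAR: the detector runs after the abort but before the flag is cleared.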
>
> LOCAL FIX:
> Set the RESOURCETIMEOUT value in dsmserv.opt to a higher
> timeout value. (See the administrator reference for more
> information.) A higher timeout value should allow the lock waiter
> flag to clear.
>
>
>
> ------------
> APAR IC31961
> ------------
> ABSTRACT:
> MOUNT RESERVATION ANR8447E (INTERNAL DEFECT 31934)
>
> ERROR DESCRIPTION:
> During the mount point processing, PVR will obtain information
> on drives only as it needs it. Multiple calls to obtain the path
> information for the same drive may be issued, which is
> expected behaviour. There are instances where the "no drives
> available" message is correct -- all drives are in use and there
> is no need to force the dismount of an idle volume or to wait for a
> dismounting volume.
> ............
> The problem is that when TSM is looking for idle volumes to
> steal, TSM never considered mount points in the reserved state
> (or mpClean state in TSM levels prior to V421).
> >>>>>>>>>>>>>>>
> For example:
> If the Mount limit is 4, the request is for one more mount point
> and there are
> - 1 reserved mount point
> - 2 open mount points
> - 1 idle mount point
> TSM would see that it did not need to force an idle dismount
> because 2 open mount points + 1 idle mount point + 1 new
> request is equal to 4. However the math should have included
> the 1 reserved mount point.
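
The arithmetic in this example can be written out as a small check in Python. It is a sketch of the counting only, with hypothetical names, and says nothing about how TSM actually implements mount point management.

```python
def needs_idle_dismount(mount_limit, reserved, opened, idle,
                        requested=1, count_reserved=True):
    """Return True if satisfying the request would exceed the mount
    limit, so an idle volume has to be dismounted first.

    count_reserved=False reproduces the faulty count from the APAR,
    which ignored mount points in the reserved state."""
    in_use = opened + idle
    if count_reserved:
        in_use += reserved
    return in_use + requested > mount_limit


if __name__ == "__main__":
    # Values from the example: limit 4, 1 reserved, 2 open, 1 idle.
    print(needs_idle_dismount(4, 1, 2, 1, count_reserved=False))
    print(needs_idle_dismount(4, 1, 2, 1))
```

With the reserved mount point ignored, the total is 2 + 1 + 1 = 4, which does not exceed the limit, so no dismount is forced. With it included, the total is 5, and the idle volume must be dismounted.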
> ..............
> (This apar documents internal defect 31934)
> LOCAL FIX:
> N/A