ADSM-L

Re: TSM Server v4.2.1

2001-10-30 04:47:03
Subject: Re: TSM Server v4.2.1
From: "Lambelet,Rene,VEVEY,GL-IS/CIS" <Rene.Lambelet AT NESTLE DOT COM>
Date: Tue, 30 Oct 2001 10:44:10 +0100
Andy,
do you mean 4.1.2.16 and not 4.1.2.6 ?

>               René Lambelet
>               Center Information System, Nestec Ltd
>               tel + 41 21 924 3543
>               fax + 41 21 924 1369
> visit the Nestle site: http://www.nestle.com
> 
> 
> -----Original Message-----
> From: Andrew Raibeck [SMTP:storman AT US.IBM DOT COM]
> Sent: Monday, October 29, 2001 6:34 PM
> To:   ADSM-L AT VM.MARIST DOT EDU
> Subject:      TSM Server v4.2.1
> 
> Hello all,
> 
> We have put our full attention on addressing the early problems seen in
> 4.2.1, and either have a fix already available or expect to deliver one
> shortly. Despite following our strict development process, focusing
> significant resources on design reviews and testing, and running a
> successful beta with several large customers, we had some defects escape
> to the field that you may have encountered.
> 
> Our plan is to deliver a fixtest called 4.2.1.6 shortly, with all of the
> fixes currently available on various platforms rolled up into one level.
> We have corrected problems relating to mount point management, LTO
> devices, 3494 libraries, and a server crash. The applicable APARs are
> IC31961, IC31823, IC31831, IC31691, and IC31884. While some fixes are
> available in various prior patch levels, the rollup fix will address all
> of these problems. If you have problems in these areas that are not
> covered by the descriptions in these APARs, please contact Tivoli
> service.
> 
> We have the fixes running in our test environments and at some customer
> accounts, and we are confident they will correct these reported problems.
> Ultimately, we believe the changes introduced into these code paths in
> 4.2.1 will be of great benefit and will improve your satisfaction with
> our product. We apologize for letting these defects escape our process.
> 
> For your convenience, the text of the APARs listed above appears below my
> signature information.
> 
> Regards,
> 
> Andy
> 
> Andy Raibeck
> IBM Software Group
> Tivoli Storage Manager Client Development
> Internal Notes e-mail: Andrew Raibeck/Tucson/IBM@IBMUS
> Internet e-mail: storman AT us.ibm DOT com
> 
> The only dumb question is the one that goes unasked.
> The command line is your friend.
> "Good enough" is the enemy of excellence.
> 
> ------------
> APAR IC31691
> ------------
> ABSTRACT:
> LIBRARY AUDIT ON 349X MAY NOT CORRECT CATEGORY MISMATCHES
> 
> ERROR DESCRIPTION:
> The LIBRARY AUDIT command for the
> 3494 and 3495 libraries may not be
> able to correct category mismatches
> of private volumes.
> 
> LOCAL FIX:
> Recovery Procedure:
> 1)Stop activity to your library (for example, mounting and
>   dismounting of tapes) if any activity is going on.
> 2)Determine the name of the library device from the AIX
>   perspective (the default is usually lmcp0). In the procedure
>   that follows, replace lmcp0 with whatever name is appropriate
>   for your system. (lsdev -Cc tape should list the tape
>   devices, and you can find the library name there.)
> 3)Determine whether you are using the default
>   categories or whether you have specified categories.
>   The default categories in TSM are 300 (private) and
>   301 (scratch) and can be found with a Q LIBR command.
>   If you are using the defaults, the hex numbers you will
>   need below are 012E for scratch and 012C for private;
>   go to the next step now. If you are NOT using the
>   defaults, you will have to convert the numbers shown by
>   the Q LIBR command for scratch and private from decimal
>   to hex. When you convert the number for scratch, you
>   must add 1 to it before you convert. (This is because
>   Q LIBR shows the category for 3490, and the category for
>   3590 is one higher.)
> 4)In TSM, run an SQL query to find all private volumes
>   and redirect the output to a file:
>    select volume_name from libvolumes
>           where status='Private'  > filename
> 5)At the AIX prompt, edit the file created in the previous
>   step to remove the header lines. There are usually at
>   least 2 lines before the first volume is listed. The file
>   should contain only volume names.
> 6)Verify that several of the volumes in your list are in
>   fact in the incorrect category on the 3494 by issuing
>   the following command against a couple of them,
>   replacing vol_name with one of your volume names:
>        mtlib -l /dev/lmcp0 -qV -V vol_name
>   For example:
>        mtlib -l /dev/lmcp0 -qV -V 000027
>   Look at the category listed in the output. If you are
>   using the defaults, a private volume should be listed as
>   012C, but you are probably seeing 012E.
> 7)Run the command to update the category to the
>   private category. In the following command, replace
>   lmcp0 with your 3494 library name on AIX, and replace
>   filename with the file name you used in step 4.
>   Note that you should give the fully qualified path
>   of the file:
>      mtlib -l /dev/lmcp0 -C -L filename -t"012C"
> 8)Double-check that a couple of the volumes were
>   in fact correctly changed. I usually check at least
>   the first and last items in the list.
> 9)Re-enable activity to the tape library.
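Steps 3 and 5 of the recovery procedure are easy to get wrong by hand. The following is a minimal Python sketch, not part of the APAR: `category_hex` converts a decimal category reported by Q LIBR to the four-digit hex string used with mtlib (adding 1 for scratch, as step 3 describes), and `clean_volume_list` strips header and blank lines from the redirected SELECT output. Both function names are hypothetical, and the header layout assumed by `clean_volume_list` (a VOLUME_NAME title line plus a dashed underline) is an assumption about the administrative client's output, so compare it with your own file first.

```python
"""Helpers for the IC31691 recovery procedure (hypothetical sketch)."""


def category_hex(decimal_category: int, scratch: bool) -> str:
    """Convert a Q LIBR category (decimal) to the 4-digit hex form for mtlib.

    Q LIBR reports the 3490 scratch category; the 3590 scratch category
    is one higher, so 1 is added before converting scratch values.
    """
    value = decimal_category + 1 if scratch else decimal_category
    return f"{value:04X}"


def clean_volume_list(lines):
    """Keep only volume names from redirected SELECT output.

    Assumption (not verified): the header consists of a 'VOLUME_NAME'
    title line and a line of dashes; blank lines may appear anywhere.
    """
    names = []
    for line in lines:
        text = line.strip()
        if not text or text.upper() == "VOLUME_NAME" or set(text) <= {"-"}:
            continue  # skip blanks, the column title and its underline
        names.append(text)
    return names


if __name__ == "__main__":
    # Default categories from step 3: private 300, scratch 301.
    print(category_hex(300, scratch=False))  # 012C
    print(category_hex(301, scratch=True))   # 012E
    sample = ["", "VOLUME_NAME", "------------------", "000027", "000031", ""]
    print(clean_volume_list(sample))         # ['000027', '000031']
```

With the default categories this reproduces the 012C and 012E values given in step 3; the cleaned names can then be written back, one per line, as the file passed to `mtlib -L` in step 7.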
> 
> How to avoid having to do this again:
> 1)Avoid running AUDIT LIBRARY or halting and
>   restarting your server.
> 2)If you must stop and restart your server, verify that
>   your private volumes are in the correct category by
>   taking one or two volumes and issuing the mtlib
>   command against them to display the 3494 category.
> 3)If the problem has occurred again, run the
>   procedure listed above.
> 
> 
> 
> ------------
> APAR IC31823
> ------------
> ABSTRACT:
> CANNOT UPGRADE FROM 3.7.4 TO 4.2 WITH LTO VOLUMES
> 
> ERROR DESCRIPTION:
> An attempted upgrade from TSM 3.7.4 to TSM 4.2.0, with an LTO
> library attached and LTO volumes in the database, fails with the
> following messages when the server is started:
> ANR9999D pvr.c(2449): ThreadId<0> Unsupported function for
>          device class 20.
> ANR9999D asinit.c(341): ThreadId<0> Error converting volume
>          attribs for devclass LTOCLASS.
> The same issue was addressed by APAR IC29442 for upgrades
> from 3.7.4 to 4.1.
> INITIAL IMPACT: High
> 
> LOCAL FIX:
> Upgrade to 4.1.3 first (where the fix for IC29942 is present),
> then upgrade to 4.2. The procedure for upgrading to 4.1.3 may
> vary depending on the platform.
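The local fix amounts to inserting an interim hop at 4.1.3. As a hypothetical illustration only, the rule can be written as a small Python function; the assumption that it applies to every level below 4.1.3 (rather than only the documented 3.7.4 case) is ours, not the APAR's.

```python
"""Hypothetical planner for the IC31823 workaround (upgrade via 4.1.3)."""

INTERIM_LEVEL = (4, 1, 3)  # level named in the local fix


def parse_level(level: str) -> tuple:
    """Turn '3.7.4' into (3, 7, 4) so levels compare in order."""
    return tuple(int(part) for part in level.split("."))


def upgrade_hops(current: str, target: str, has_lto_volumes: bool) -> list:
    """Return the server levels to install, in order.

    Assumption: any server below 4.1.3 with LTO volumes that is going to
    4.2 or later needs the interim hop; the APAR documents only 3.7.4.
    """
    cur, tgt = parse_level(current), parse_level(target)
    hops = []
    if has_lto_volumes and cur < INTERIM_LEVEL and tgt[:2] >= (4, 2):
        hops.append(".".join(str(n) for n in INTERIM_LEVEL))
    hops.append(target)
    return hops


if __name__ == "__main__":
    print(upgrade_hops("3.7.4", "4.2.0", has_lto_volumes=True))   # ['4.1.3', '4.2.0']
    print(upgrade_hops("3.7.4", "4.2.0", has_lto_volumes=False))  # ['4.2.0']
```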
> 
> 
> 
> ------------
> APAR IC31831
> ------------
> ABSTRACT:
> LTO DEVICES WON'T WORK AFTER UPGRADE TO 4.2.1
> 
> ERROR DESCRIPTION:
> If a TSM server with existing LTO devices is upgraded to 4.2.1,
> those devices will not work. They fail with the following
> errors:
> 
> ANR8337I LTO volume XXX mounted in drive DRIVE1 (/dev/rmt1).
> ANR8442E MOUNT REQUEST: Volume XXX in library 3584LTO is
>          currently in use.
> ANR1401W Mount request denied for volume XXX
> 
> A PVR MMS trace shows the following:
> 
> mmsscsi.c(8489): Preparing private volume XXX in library 3584LTO
> mmslib.c(7303): Obtaining reservation for volume XXX in library
>        3584LTO; activity=4.
> mmsscsi.c(1314): Problem verifying/reserving volume XXX is in
>        library 3584LTO; rc = 4.
> pvr.c(8156): PVR I/O agent (37) finished OPEN request; rc=4.
> 
> This occurs only after an upgrade to 4.2.1; a new installation
> should not have the problem.
> 
> INITIAL IMPACT: HIGH
> 
> LOCAL FIX:
> N/A
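One way to check an activity log or console capture for the IC31831 symptom is to look for a volume that draws ANR8442E after an ANR8337I mount message for the same volume. The sketch below is a hypothetical Python helper; it matches only the message text shown in the APAR, assumes each message sits on one line (join wrapped lines first), and is a triage aid, not a diagnosis. The volume name ABC123 in the sample is made up.

```python
"""Scan activity-log text for the IC31831 symptom (hypothetical helper)."""
import re

# Patterns built from the message text quoted in the APAR.
MOUNTED = re.compile(r"ANR8337I \S+ volume (\S+) mounted")
IN_USE = re.compile(r"ANR8442E MOUNT REQUEST: Volume (\S+) in library")


def ic31831_suspects(lines):
    """Return volumes that hit ANR8442E after an earlier ANR8337I mount."""
    mounted = set()
    suspects = []
    for line in lines:
        m = MOUNTED.search(line)
        if m:
            mounted.add(m.group(1))
            continue
        m = IN_USE.search(line)
        if m and m.group(1) in mounted and m.group(1) not in suspects:
            suspects.append(m.group(1))
    return suspects


if __name__ == "__main__":
    log = [
        "ANR8337I LTO volume ABC123 mounted in drive DRIVE1 (/dev/rmt1).",
        "ANR8442E MOUNT REQUEST: Volume ABC123 in library 3584LTO is currently in use.",
        "ANR1401W Mount request denied for volume ABC123",
    ]
    print(ic31831_suspects(log))  # ['ABC123']
```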
> 
> 
> 
> ------------
> APAR IC31884
> ------------
> ABSTRACT:
> V4.2.1 SERVER CORE DUMPING, ANR7837S ON LOCKCYCLE02
> 
> ERROR DESCRIPTION:
> TSM Server V4.2.1 core dumps with ANR7837S Internal error
> LOCKCYCLE02 detected. The error messages are in dsmserv.err.
> 
> Traceback in dsmserv.err:
> ANR7838S Server operation terminated.
> ANR7837S Internal error LOCKCYCLE02 detected.
>   0x100085A4 pkLogicAbort
>   0x100303F0 CheckLockCycles
>   0x100324C0 TmFindDeadlock
>   0x100322A4 TmDeadlockDetector
>   0x10006DB4 StartThread
>   0xD00081FC _pthread_body
> ANR7833S Server thread 1 terminated in response to program abort
> ANR7833S Server thread 2 terminated in response to program abort
> .............
> The LOCKCYCLE02 indicates that the problem is related to
> transactions between the storage agent and the TSM server. TSM
> sets a waiter flag in the lock request, and situations can occur
> where the lock request is aborted. The abort causes a
> LOCKCYCLE02 problem because the deadlock detector wakes up
> and goes looking for waiters. There is a small window after the
> abort code signals the lock waiter (the mutex is released to
> allow the receiver to respond), and during that window the
> deadlock detector can start looking for deadlocks. Because the
> request has been satisfied by being aborted, no lock is being
> waited on, but the waiter flag is still set. Hence, TSM aborts
> because it finds a waiter that is not waiting on anything.
> 
> LOCAL FIX:
> Set the RESOURCETIMEOUT value in dsmserv.opt to a higher
> timeout value. (See the Administrator's Reference for more
> information.) A higher timeout value should allow the lock
> waiter flag to clear.
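The IC31884 explanation is easier to follow as a deterministic toy model. In the Python sketch below (hypothetical names, not TSM source), the window between the abort signalling the waiter and the waiter clearing its flag is modelled by running the deadlock detector at that point. The original ordering trips the detector's consistency check, the analogue of LOCKCYCLE02; clearing the flag before the window opens avoids it. The APAR does not describe how the delivered fix works, so the reordering only illustrates the inconsistent state and is not presented as IBM's fix.

```python
"""Deterministic toy model of the IC31884 LOCKCYCLE02 window (not TSM code)."""
from dataclasses import dataclass
from typing import Callable, Optional


class LockCycleError(RuntimeError):
    """Stands in for the server's LOCKCYCLE02 abort."""


@dataclass
class LockRequest:
    waiter_flag: bool = False          # "this request is waiting"
    waiting_on: Optional[str] = None   # lock the request is blocked on


def deadlock_detector(req: LockRequest) -> None:
    # A waiter that is not waiting on anything is treated as corrupt state.
    if req.waiter_flag and req.waiting_on is None:
        raise LockCycleError("waiter flag set but no lock being waited on")


def abort_original(req: LockRequest, detector: Callable[[LockRequest], None]) -> None:
    req.waiting_on = None      # request is "satisfied" by being aborted
    detector(req)              # mutex released: the detector can run here
    req.waiter_flag = False    # the waiter wakes up later and clears its flag


def abort_reordered(req: LockRequest, detector: Callable[[LockRequest], None]) -> None:
    req.waiting_on = None
    req.waiter_flag = False    # flag cleared before the window opens
    detector(req)


def blocked_request() -> LockRequest:
    return LockRequest(waiter_flag=True, waiting_on="hypothetical-lock")


if __name__ == "__main__":
    abort_reordered(blocked_request(), deadlock_detector)
    print("reordered abort: detector sees consistent state")
    try:
        abort_original(blocked_request(), deadlock_detector)
    except LockCycleError as exc:
        print("original ordering:", exc)
```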
> 
> 
> 
> ------------
> APAR IC31961
> ------------
> ABSTRACT:
> MOUNT RESERVATION ANR8447E (INTERNAL DEFECT 31934)
> 
> ERROR DESCRIPTION:
> During mount point processing, PVR obtains information on
> drives only as it needs it, so multiple calls to obtain the
> path information for the same drive may be issued. This is
> expected behavior. There are also instances where the "no
> drives available" message is correct: all drives are in use,
> and there is no need to force the dismount of an idle volume
> or to wait for a dismounting volume.
> ............
> The problem is that when TSM is looking for idle volumes to
> steal, TSM never considered mount points in the reserved state
> (or mpClean state in TSM levels prior to V421).
> >>>>>>>>>>>>>>>
> For example:
> If the mount limit is 4, the request is for one more mount point,
> and there are
> - 1 reserved mount point
> - 2 open mount points
> - 1 idle mount point
> TSM would see that it did not need to force an idle dismount
> because 2 open mount points + 1 idle mount point + 1 new
> request is equal to 4. However, the calculation should have
> included the 1 reserved mount point, which brings the total to
> 5 and exceeds the mount limit, so an idle volume should have
> been dismounted.
> ..............
> (This APAR documents internal defect 31934.)
>
> LOCAL FIX:
> N/A
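The IC31961 example can be checked with a few lines of arithmetic. This hypothetical Python function reproduces both calculations: with `count_reserved=False` it mirrors the defective logic described in the APAR, and with the default it includes reserved mount points as the APAR says it should. It is an illustration of the counting rule only, not the server's actual code.

```python
"""Worked example of the IC31961 mount-point arithmetic (illustrative only)."""


def must_free_idle_mount(mount_limit: int, reserved: int, open_: int,
                         idle: int, requested: int = 1,
                         count_reserved: bool = True) -> bool:
    """Return True if an idle mount point must be dismounted for the request.

    count_reserved=False reproduces the defective calculation in the APAR;
    True also counts reserved mount points, as the APAR says it should.
    """
    in_use = open_ + idle + requested
    if count_reserved:
        in_use += reserved
    return in_use > mount_limit


if __name__ == "__main__":
    scenario = dict(mount_limit=4, reserved=1, open_=2, idle=1)
    print(must_free_idle_mount(**scenario, count_reserved=False))  # False (2+1+1 = 4)
    print(must_free_idle_mount(**scenario))                        # True  (1+2+1+1 = 5)
```

With the example values (mount limit 4; 1 reserved, 2 open, 1 idle; 1 new request), the defective calculation totals 4 and sees no need to dismount, while the corrected one totals 5 and does.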