Re: Library down -

This is a tech tip from the StorageTek CRC:


In its normal initialization sequence, TSM locks all of its library
resources under a common lockid.  On rare occasions, TSM has been known to
initialize resources with a second lockid after existing resources had been
locked under a different lockid.  In this situation, intervention will be
required to establish a common lockid for all resources.

The standard intervention procedure would be to use the ACSLS cmd_proc to
clear all of the locks on library resources.  First, get a list of lockids:

        ACSSA> query lock volume all
        ACSSA> query lock drive all

Look for the lockid associated with these resources.

Now, remove each lockid as follows:

        ACSSA> set lock <lockid>
        ACSSA> clear lock volume all
        ACSSA> clear lock drive all

Repeat this sequence for each unique lockid.  Once the locks on all
resources have been removed, you can restart TSM.   When TSM software
initializes, it will lock all of its library resources under a single
lockid.
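
As a rough sketch of the "repeat for each unique lockid" step above, the
following Python helper builds the list of cmd_proc lines to type.  The
function name is hypothetical, and the script only prints text; it does not
talk to ACSLS.  It assumes a lockid of 0 means "not locked", as the lock_id<>0
queries further down suggest.

```python
def clear_lock_commands(lockids):
    """Return the cmd_proc lines that clear volume and drive locks,
    one set/clear/clear group per unique non-zero lockid."""
    commands = []
    seen = set()
    for lockid in lockids:
        lockid = int(lockid)
        # Skip duplicates and 0 (assumed to mean "no lock held").
        if lockid == 0 or lockid in seen:
            continue
        seen.add(lockid)
        commands.append(f"set lock {lockid}")
        commands.append("clear lock volume all")
        commands.append("clear lock drive all")
    return commands
```

For example, clear_lock_commands([1021, 1021, 2044]) yields two groups of
three lines, one for lockid 1021 and one for 2044, in the order the lockids
were first seen.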

There may be an occasion in which you cannot set your lockid to a known
lockid value.  The most likely cause for this condition would be that the
lockid record has been removed while resources had been locked under that
lockid.  A bug was introduced in ACSLS 6.0 (6.0.1) in which it is possible
to remove a lockid record of locked resources by attempting to lock a
non-existing resource under that lockid.  If this bug is encountered, it
will force TSM to lock subsequent resources under a new lockid.   A fix for
this ACSLS bug is available in PTF760827 (for Solaris) and PTF762430 (for
AIX).  The fix has been rolled into ACSLS 6.1.

One way to determine whether a lockid record has been removed is to query
the database directly.  First, determine the lockid of all locked resources:

        sql.sh "select lock_id from volumetable where lock_id<>0"
        sql.sh "select lock_id from drivetable where lock_id<>0"

Now, confirm that a lockid record exists for each lockid you found
above:

        sql.sh "select lock_id, user_id from lockidtable"
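
Comparing the three query results by eye gets tedious when many resources
are locked.  The check amounts to a set difference, sketched here in Python.
The function name is hypothetical, and the lockid lists are assumed to have
been copied out of the sql.sh output by hand; the sketch does not parse
sql.sh output itself.

```python
def orphaned_lockids(volume_lockids, drive_lockids, lockidtable_ids):
    """Return the sorted lockids that appear on volumes or drives
    but have no matching record in the lockidtable."""
    # Lockids actually held on resources (0 is treated as "unlocked").
    in_use = {
        int(x) for x in list(volume_lockids) + list(drive_lockids)
        if int(x) != 0
    }
    # Lockids that still have a record in the lockidtable.
    known = {int(x) for x in lockidtable_ids}
    return sorted(in_use - known)
```

An empty result means every lockid in use still has its record; anything
returned is a candidate for the missing-record case.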

If you find that a lockid exists in the volumetable or the drivetable, but
that lockid does not exist in the lockidtable, then this is a sign that the
lockid record has been removed.  In this case, use the following procedure
to correct the situation:

        1.  Install PTF760827 (for Solaris) or PTF762430 (for AIX).
        2.  Remove all lockids from the ACSLS volumetable and drivetable as
            follows:

                sql.sh "update volumetable set lock_id=0 where lock_id<>0"
                sql.sh "update drivetable set lock_id=0 where lock_id<>0"

        3.  Drop the lockidtable as follows:

                sql.sh "drop table lockidtable"

        4. Rebuild the lockidtable, using acsss_config.

                kill.acsss
                acsss_config
                      select option 7 (exit)
                      This will create a new lockidtable
                rc.acsss

        5.  Restart TSM software to establish a new common lockid.

Hope that this helps.

Terry D. Schmitt
Software Engineer, Sr.
ACSLS Change Team
303.661.5874 phone
303.661.5712 fax
terry_schmitt AT storagetek DOT com
StorageTek
INFORMATION made POWERFUL

-----Original Message-----
From: PINNI, BALANAND (SBCSI) [mailto:bp3965 AT SBC DOT COM]
Sent: March 13, 2003 9:59 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Library down -


All-

Today I shut down the TSM server and rebooted the AIX machine.
When I start TSM manually I get the error below.  The ACSLS server was also
rebooted, but the problem still exists.  I see this message just by stopping
and restarting the server.
TSM cannot access the library now!

                      removed.
03/13/03   09:55:34  ANR8855E ACSAPI(acs_lock_volume) response with
                      unsuccessful status, status=STATUS_LOCK_FAILED.
03/13/03   09:55:34  ANR8851E Initialization failed for ACSLS library
ACS_LIB1;


I ran an audit on the ACSLS ACS and it is fine.  I also ran an audit on the
db, and it is OK.

Please help.  Thanks in advance.

Balanand Pinni

-----Original Message-----
From: Alex Paschal [mailto:AlexPaschal AT FREIGHTLINER DOT COM]
Sent: Thursday, March 13, 2003 10:48 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: Restore performance

Thomas,

I agree with Richard Sims, you're probably between a rock and a hard place.
If you're not able to get your restore working reasonably quickly, here's
something you might try.  It's a little bit of work, but it should work.

dsmadmc -id=id -pa=pa -comma -out=tempfile select \* from backups where
node_name=\'NODENAME\' and filespace_name=\'/FSNAME/\' and filespace_id=ID
and state=\'INACTIVE_VERSION\' and TYPE=\'DIR\' and hl_name like
\'dir.to.restore.within.FS\%\'

Then process the tempfile to create a list of the directories that have
files you want restored (sorting, filtering, whatever).  I would probably
use the deactivate_date to just get the directories that were deactivated at
the right date (doable within the select, but it might tell you something to
see all of them), then trim out the various unnecessary columns and
concatenate hl_name and ll_name, get rid of any duplicates.  Run a script
that does a dsmc restore -pitd for each line of the temp file without the
-subdir=yes option.  That will speed things up considerably and you'll be
able to monitor progress.  Additionally, if necessary, you can stop the
script and pick up where you left off without having to redo the whole
thing.
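
To make the tempfile processing a bit more concrete, here is a minimal
Python sketch, not a tested procedure.  The function name, the column
positions (hl_name at index 5, ll_name at 6, deactivate_date at 9), the
optional filespace prefix, and the date formats are all assumptions; check
them against your own select output and client date format.  It only builds
command strings and does not run dsmc.

```python
import csv
import io


def restore_commands(csv_text, deactivated_on, pit_date, prefix="",
                     hl_col=5, ll_col=6, deact_col=9):
    """Build one 'dsmc restore' command per unique directory whose
    deactivate_date starts with the given date string."""
    paths = set()
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank or short lines (headers, messages, etc.).
        if len(row) <= max(hl_col, ll_col, deact_col):
            continue
        if not row[deact_col].startswith(deactivated_on):
            continue
        # Concatenate hl_name and ll_name; the set drops duplicates.
        path = prefix + row[hl_col] + row[ll_col]
        if not path.endswith("/"):
            path += "/"  # trailing slash marks the target as a directory
        paths.add(path)
    return [
        f'dsmc restore -pitdate={pit_date} -subdir=no "{p}"'
        for p in sorted(paths)
    ]
```

Writing the returned lines to a file, one per line, gives you a script you
can stop and resume by deleting the lines that have already completed.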

Good luck.

Alex Paschal
Freightliner, LLC
(503) 745-6850 phone/vmail


-----Original Message-----
From: Thomas Denier [mailto:Thomas.Denier AT MAIL.TJU DOT EDU]
Sent: Thursday, March 13, 2003 7:17 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: Restore performance


> Because of the -subdir=yes specification, omitting the ending slash could
> cause TSM to search for all files named "saa001" in /var/spool/imap/user
> and its subdirectories. If these are very large, the search could be very
> time-consuming. Also, it is good practice to put an ending slash after the
> target directory name.
>
> Putting the ending slashes should make things better, plus you should get
> the benefit of no query restore.

We have retried the restore with the trailing slashes, and things have
not gotten any better.

The performance of our TSM server degrades over time. We are finding it
necessary to restart the server at least twice a day to maintain
even marginally acceptable performance. Unfortunately, we are finding
that the end of support for 4.2 has, for all practical purposes, already
happened. It seems clear that IBM's strategy for responding to our
performance problem is to stall until April 15. We are concentrating
on completing tests of the 5.1 server, and living with the frequent
restarts in the meantime. The last few attempts at the problem restore
have not gotten as far as requesting a tape mount before a server
restart occurred. The restart terminates the restore session but leaves
a restartable restore behind. The client administrator has issued
'restart restore' commands after the last couple of restarts, arguing
that this will enable restore processing to pick up where it left off.
Is he correct, given that the restore process was terminated before
it got as far as requesting its first tape mount?