2010-04-27 13:39:57
Subject: Re: [ADSM-L] "Retry Dismount Failure" that won't clear
From: Robert Clark <robert.clark7 AT USBANK DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 27 Apr 2010 10:25:45 -0700
I would make sure RSM (Removable Storage Manager) on the Windows host is
not grabbing the tape drives.

What type of library is being emulated?

[RC]



From: "John D. Schneider" <john.schneider AT COMPUTERCOACHINGCOMMUNITY DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: 04/27/2010 08:25 AM
Subject: [ADSM-L] "Retry Dismount Failure" that won't clear
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>



Greetings,
    I have been through the archives for help with this one, but I still
don't have an answer.
    I support a TSM 5.4.3.0 server running on AIX 5.3 ML9.  The library
is an EMC Disk Library (EDL) for virtual tape, configured as 64 LTO1 tape
drives.  This server is the library master for both AIX and Windows
LAN-free clients running the 5.4.2.0 LAN-free storage agent.

    We came in yesterday and found 5 virtual tapes mounted, but in
"Retry Dismount Failure" state:

ANR8380I LTO volume V50135 is mounted R/O in drive EPC-LTO1-025
(/dev/epc-lto1-025), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume V50128 is mounted R/O in drive EPC-LTO1-040
(/dev/epc-lto1-040), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume V50097 is mounted R/O in drive EPC-LTO1-044
(/dev/epc-lto1-044), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume V50129 is mounted R/O in drive EPC-LTO1-047
(/dev/epc-lto1-047), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume V50317 is mounted R/O in drive EPC-LTO1-006
(/dev/epc-lto1-006), status: RETRY DISMOUNT FAILURE.
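
To pick the stuck drives out of a longer mount listing, the ANR8380I lines
above can be parsed with a short script.  This is only a minimal sketch in
Python, not a TSM tool: the function name parse_mounts and the regular
expression are illustrative assumptions based solely on the message format
quoted above, and the sample text is copied from that listing.

```python
import re

# Pattern for the ANR8380I message as quoted in this post (assumed format).
MOUNT_RE = re.compile(
    r"ANR8380I\s+(?P<devtype>\S+) volume (?P<volume>\S+) "
    r"is mounted (?P<mode>\S+) in drive (?P<drive>\S+) "
    r"\((?P<device>[^)]+)\), status: (?P<status>[A-Z ]+)\."
)

def parse_mounts(text):
    """Return one dict per ANR8380I message found in text."""
    # Collapse all whitespace so messages wrapped by the mailer match again.
    joined = " ".join(text.split())
    return [m.groupdict() for m in MOUNT_RE.finditer(joined)]

sample = """
ANR8380I LTO volume V50135 is mounted R/O in drive EPC-LTO1-025
(/dev/epc-lto1-025), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume V50129 is mounted R/O in drive EPC-LTO1-047
(/dev/epc-lto1-047), status: RETRY DISMOUNT FAILURE.
"""

# Keep only the mounts that are stuck in the failing state.
stuck = [m for m in parse_mounts(sample)
         if m["status"] == "RETRY DISMOUNT FAILURE"]
for m in stuck:
    print(m["volume"], m["drive"], m["device"])
```

The resulting device names can then be compared against the drives named
in the ANR8779E open failures in the activity log.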

They have been in this state over 24 hours now, and we can't clear them.

We can tell this problem is caused by confusion between the library
master and one of the LAN-free agents.  My surmise is that the LAN-free
agent thinks it is finished with the drives, but that message never gets
to the TSM server.  Later the TSM server's timeout logic tries to reclaim
the drive, but the LAN-free agent still holds a SCSI reserve on the tape
drive, so the TSM server can't open the drive to talk to it.

We went out to the EDL appliance and dismounted the virtual tapes from
the drives, so they are empty.  We have tried restarting both the TSM
server software and the LAN-free agent.  We have rebooted the Windows
server running the LAN-free agent.  We have deleted and rediscovered the
AIX rmt devices on the library master.  All of those steps completed
without error.  We also ran 'update server STL-PVMCONBKP02 forcesync=yes'
to resynchronize the TSM server and the LAN-free agent, but that didn't
help.

The 'Retry Dismount Failure' errors still persist.  Every so often we
still get the following messages in the server activity log.  Since the
sessions between the server and the LAN-free agent STL-PVMCONBKP02 start
and end without errors, this does not look like a simple communication
problem between them.

04/27/10 08:16:51  ANR0408I Session 11595 started for server STL-PVMCONBKP02 (Windows) (Tcp/Ip) for library sharing. (SESSION: 11595)
04/27/10 08:16:51  ANR0408I Session 11596 started for server STL-PVMCONBKP02 (Windows) (Tcp/Ip) for library sharing. (SESSION: 11596)
04/27/10 08:16:51  ANR0408I Session 11597 started for server STL-PVMCONBKP02 (Windows) (Tcp/Ip) for library sharing. (SESSION: 11597)
04/27/10 08:16:51  ANR0408I Session 11598 started for server STL-PVMCONBKP02 (Windows) (Tcp/Ip) for library sharing. (SESSION: 11598)
04/27/10 08:16:51  ANR0408I Session 11599 started for server STL-PVMCONBKP02 (Windows) (Tcp/Ip) for library sharing. (SESSION: 11599)
04/27/10 08:16:51  ANR0409I Session 11595 ended for server STL-PVMCONBKP02 (Windows). (SESSION: 11595)
04/27/10 08:16:51  ANR0409I Session 11596 ended for server STL-PVMCONBKP02 (Windows). (SESSION: 11596)
04/27/10 08:16:51  ANR0409I Session 11597 ended for server STL-PVMCONBKP02 (Windows). (SESSION: 11597)
04/27/10 08:16:51  ANR0409I Session 11598 ended for server STL-PVMCONBKP02 (Windows). (SESSION: 11598)
04/27/10 08:16:51  ANR0409I Session 11599 ended for server STL-PVMCONBKP02 (Windows). (SESSION: 11599)
04/27/10 08:16:51  ANR1794W TSM SAN discovery is disabled by options. (SESSION: 11595)
04/27/10 08:16:51  ANR8965W The server is unable to automatically determine the serial number for the device. (SESSION: 11595)
04/27/10 08:16:51  ANR8779E Unable to open drive /dev/epc-lto1-044, error number=16. (SESSION: 11595)
04/27/10 08:16:51  ANR1794W TSM SAN discovery is disabled by options. (SESSION: 11598)
04/27/10 08:16:51  ANR8965W The server is unable to automatically determine the serial number for the device. (SESSION: 11598)
04/27/10 08:16:51  ANR8779E Unable to open drive /dev/epc-lto1-040, error number=16. (SESSION: 11598)
04/27/10 08:16:51  ANR1794W TSM SAN discovery is disabled by options. (SESSION: 11596)
04/27/10 08:16:51  ANR8965W The server is unable to automatically determine the serial number for the device. (SESSION: 11596)
04/27/10 08:16:51  ANR8779E Unable to open drive /dev/epc-lto1-006, error number=16. (SESSION: 11596)
04/27/10 08:16:51  ANR1794W TSM SAN discovery is disabled by options. (SESSION: 11599)
04/27/10 08:16:51  ANR8965W The server is unable to automatically determine the serial number for the device. (SESSION: 11599)
04/27/10 08:16:51  ANR8779E Unable to open drive /dev/epc-lto1-047, error number=16. (SESSION: 11599)
04/27/10 08:16:51  ANR1794W TSM SAN discovery is disabled by options. (SESSION: 11597)
04/27/10 08:16:51  ANR8965W The server is unable to automatically determine the serial number for the device. (SESSION: 11597)
04/27/10 08:16:51  ANR8779E Unable to open drive /dev/epc-lto1-025, error number=16. (SESSION: 11597)
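
The open failures in the log above all report error number=16.  If that
value is the operating system errno (an assumption; the log itself does not
say so), then on AIX it is EBUSY, which would fit the idea that something
else still holds the drive.  The following is a minimal Python sketch that
pulls the device and error number out of ANR8779E messages, whether or not
the lines have been wrapped by the mailer.  The names open_failures and
AIX_ERRNO_NAMES are hypothetical, and the table lists only the one errno
value seen here.

```python
import re

# Assumption: "error number" in ANR8779E is the OS errno.
# On AIX, errno 16 is EBUSY; only that value is listed.
AIX_ERRNO_NAMES = {16: "EBUSY"}

OPEN_FAIL_RE = re.compile(
    r"ANR8779E Unable to open drive (?P<device>[^,\s]+), "
    r"error number=(?P<err>\d+)\."
)

def open_failures(log_text):
    """Return (device, error number, errno name) for each ANR8779E message."""
    # Collapse whitespace so messages split across lines still match.
    joined = " ".join(log_text.split())
    results = []
    for m in OPEN_FAIL_RE.finditer(joined):
        err = int(m.group("err"))
        name = AIX_ERRNO_NAMES.get(err, "unknown")
        results.append((m.group("device"), err, name))
    return results

sample = """
04/27/10 08:16:51     ANR8779E Unable to open drive /dev/epc-lto1-044,
error
                       number=16. (SESSION: 11595)
"""

failures = open_failures(sample)
for device, err, name in failures:
    print(device, err, name)
```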


We have seen this before, but rarely, and in the past we were always
able to clear it by restarting the LAN-free agent and the TSM server
software.  But this time that isn't working.

FYI, every once in a while the 'Retry Dismount Failure' status changes to
'Dismounting' for a few seconds, then goes back to 'Retry Dismount
Failure' again.  So TSM is evidently still trying to clear it.

Can anyone suggest a procedure for clearing this condition?

Best Regards,

John D. Schneider
The Computer Coaching Community, LLC
Office: (314) 635-5424 / Toll Free: (866) 796-9226
Cell: (314) 750-8721


