ADSM-L

Re: [ADSM-L] "Retry Dismount Failure" that won't clear

2010-04-27 19:04:54
Subject: Re: [ADSM-L] "Retry Dismount Failure" that won't clear
From: Robert Clark <robert.clark7 AT USBANK DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 27 Apr 2010 16:03:09 -0700
Hi John,

On a physical 3584, I would imagine that power-cycling the tape drives
would clear the SCSI reserve or, in some of the weirder cases, unplugging
and reconnecting the fibre cables.

The questions were meant to determine if you have access to tapeutil. If
you can demonstrate problems using tapeutil, then the VTL or the OS or
atape is likely at fault.

If you can't demonstrate any of the problems with tapeutil, then I'd look
to TSM.

There were some problems with the newest version of Atape in the last few
months. Sorry I can't be more specific; I just don't remember the details,
other than the problem version of Atape being rescinded from the ftp site.
Support at the time was advising people to revert to the version that did
work well.

Given the number of problems you're seeing, I don't see a downside to
reverting.

[RC]



From:
"John D. Schneider" <john.schneider AT COMPUTERCOACHINGCOMMUNITY DOT COM>
To:
ADSM-L AT VM.MARIST DOT EDU
Date:
04/27/2010 02:42 PM
Subject:
Re: [ADSM-L] "Retry Dismount Failure" that won't clear
Sent by:
"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>



Robert,
   That's a good thing to bring up, but Removable Storage Manager is
Disabled.

   We are emulating an IBM3584 library, with 64 virtual LTO1 tape
drives.  We use the IBMTape drivers on Windows, and Atape on AIX.  We
just upgraded the drivers a few weeks ago.  I guess it is conceivable
that we are hitting something caused by the new drivers.

   Incidentally, the "Retry Dismount Failure" has disappeared for now. It
will probably come back, since it has happened a few times before, but it
is always weeks or months in between, so it is tough to nail down the
exact circumstance that causes it.  Using the advice from an old ADSM-L
post, I:

1) deleted the paths to the Lan-free clients,
2) deleted the path to the library master
3) deleted the drive.
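
For readers who want to repeat the three steps above, here is a minimal
sketch that builds the corresponding TSM administrative commands as
strings. The library name VTLLIB and the library master name LIBMASTER are
hypothetical placeholders (the thread never gives them), and the sketch
assumes the Lan-free agents are defined to the library master as servers,
so their paths use srctype=server. It only prints the commands; it does
not run them.

```python
def teardown_commands(library, drive, lanfree_agents, library_master):
    """Build the admin commands for the three steps, in the order listed."""
    cmds = []
    # Step 1: remove each Lan-free agent's path to the drive.
    for agent in lanfree_agents:
        cmds.append(
            f"delete path {agent} {drive} "
            f"srctype=server desttype=drive library={library}"
        )
    # Step 2: remove the library master's own path to the drive.
    cmds.append(
        f"delete path {library_master} {drive} "
        f"srctype=server desttype=drive library={library}"
    )
    # Step 3: remove the drive definition itself.
    cmds.append(f"delete drive {library} {drive}")
    return cmds


if __name__ == "__main__":
    # VTLLIB and LIBMASTER are made-up names used only for illustration.
    for cmd in teardown_commands(
        "VTLLIB", "EPC-LTO1-006", ["STL-PVMCONBKP02"], "LIBMASTER"
    ):
        print(cmd)
```

Each printed line would be entered in an administrative session on the
library master; review them before use, since the real library and server
names will differ.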

Then, when I did a "q mount", it would still show "Retry Dismount
Failure", but it wouldn't show what drive!  Instead of:

ANR8380I LTO volume V50317 is mounted R/O in drive EPC-LTO1-006
(/dev/epc-lto1-006), status: RETRY DISMOUNT FAILURE.

it would say:

ANR8380I LTO volume V50317 is mounted R/O, status: RETRY DISMOUNT
FAILURE.

Strange.  Then stranger still, about 15 minutes later, the library
master instance crashed.  We brought it back up, and all the "Retry
Dismount Failure"s were gone!

I should be happy, but I'm not, for two reasons.  First, this is bound
to come up again.  And second, the virtual tape device is still screwed
up.  It must have a SCSI reserve still set.  When I try to configure a
path for it now, it gives me:

ANR8420E DEFINE PATH: An I/O error occurred while accessing drive
EPC-LTO1-006.

How do you clear the SCSI reserve in a virtual tape drive?  I may have
to reboot the whole EDL to do it.


Best Regards,

John D. Schneider
The Computer Coaching Community, LLC
Office: (314) 635-5424 / Toll Free: (866) 796-9226
Cell: (314) 750-8721



-------- Original Message --------
Subject: Re: [ADSM-L] "Retry Dismount Failure" that won't clear
From: Robert Clark <robert.clark7 AT USBANK DOT COM>
Date: Tue, April 27, 2010 12:25 pm
To: ADSM-L AT VM.MARIST DOT EDU

I would make sure RSM on the Windows host is not grabbing the tape
drives.

What type of library is being emulated?

[RC]



From:
"John D. Schneider" <john.schneider AT COMPUTERCOACHINGCOMMUNITY DOT COM>
To:
ADSM-L AT VM.MARIST DOT EDU
Date:
04/27/2010 08:25 AM
Subject:
[ADSM-L] "Retry Dismount Failure" that won't clear
Sent by:
"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>



Greetings,
 I have been through the archives for help with this one, but I still
don't have an answer.
 I support a TSM 5.4.3.0 server running on AIX 5.3ML9. EMC Disk
Library for virtual tape, configured as 64 LTO1 tape drives. This
server is the library master for both AIX and Windows Lan-free clients
running the 5.4.2.0 Lan-free storage agent.

 We came in yesterday and found 5 virtual tapes mounted, but in
"Retry Dismount Failure" state:

ANR8380I LTO volume V50135 is mounted R/O in drive EPC-LTO1-025
(/dev/epc-lto1-025), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume V50128 is mounted R/O in drive EPC-LTO1-040
(/dev/epc-lto1-040), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume V50097 is mounted R/O in drive EPC-LTO1-044
(/dev/epc-lto1-044), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume V50129 is mounted R/O in drive EPC-LTO1-047
(/dev/epc-lto1-047), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume V50317 is mounted R/O in drive EPC-LTO1-006
(/dev/epc-lto1-006), status: RETRY DISMOUNT FAILURE.

They have been in this state over 24 hours now, and we can't clear them.
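
When many drives are involved, it can help to pull the stuck volumes and
drives out of the "q mount" output mechanically. The following is a rough
sketch only: the regular expression is inferred from the ANR8380I lines
quoted in this thread (including the variant with no drive name), not from
any documented message format, and the function names are made up for
illustration.

```python
import re

# Pattern inferred from the ANR8380I samples in this thread; the
# "in drive NAME (device)" part is optional because the message can
# appear without it.
MOUNT_RE = re.compile(
    r"ANR8380I\s+(?P<devtype>\S+) volume (?P<volume>\S+) is mounted "
    r"(?P<mode>R/[OW])"
    r"(?: in drive (?P<drive>\S+) \((?P<device>[^)]+)\))?"
    r", status: (?P<status>[A-Z ]+?)\.+(?=\s|$)"
)


def parse_mounts(text):
    """Return one dict per ANR8380I message found in `text`."""
    flat = " ".join(text.split())  # undo mail-style line wrapping
    return [m.groupdict() for m in MOUNT_RE.finditer(flat)]


def stuck_mounts(text):
    """Return only the mounts reported as RETRY DISMOUNT FAILURE."""
    return [m for m in parse_mounts(text)
            if m["status"] == "RETRY DISMOUNT FAILURE"]


if __name__ == "__main__":
    sample = (
        "ANR8380I LTO volume V50135 is mounted R/O in drive EPC-LTO1-025\n"
        "(/dev/epc-lto1-025), status: RETRY DISMOUNT FAILURE.\n"
    )
    for m in stuck_mounts(sample):
        print(m["volume"], m["drive"], m["device"])
```

A mount whose drive has already been deleted comes back with drive and
device set to None, which makes that odd case easy to spot.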

We can tell this problem is caused by confusion between the library
master and one of the Lan-free agents. My surmise is that the Lan-free
agent thinks it is finished with the drives, but that message never gets
to the TSM server. Later the TSM server's timeout logic tries to reclaim
the drive, but the Lan-free agent still has a SCSI reserve on the tape
drive, so the TSM server can't open it to talk to it.

We went out to the EDL appliance and dismounted the virtual tapes from
the drives, so they are empty. We have tried restarting both the TSM
server software and Lan-free agent. We have rebooted the Windows server
running the Lan-free agent. We have deleted and rediscovered the AIX
rmt devices on the library master. All those worked fine. We did an
'update server STL-PVMCONBKP02 forcesync=yes' between the TSM server and
the Lan-free agent, but that didn't help.

The 'Retry Dismount Failure' errors still persist. Every little while
we still get the following messages in the server activity log. Since
the session between the server and the lan-free agent STL-PVMCONBKP02
isn't getting any errors, it is not a simple communication problem
between them.

04/27/10 08:16:51 ANR0408I Session 11595 started for server STL-PVMCONBKP02
                  (Windows) (Tcp/Ip) for library sharing. (SESSION: 11595)
04/27/10 08:16:51 ANR0408I Session 11596 started for server STL-PVMCONBKP02
                  (Windows) (Tcp/Ip) for library sharing. (SESSION: 11596)
04/27/10 08:16:51 ANR0408I Session 11597 started for server STL-PVMCONBKP02
                  (Windows) (Tcp/Ip) for library sharing. (SESSION: 11597)
04/27/10 08:16:51 ANR0408I Session 11598 started for server STL-PVMCONBKP02
                  (Windows) (Tcp/Ip) for library sharing. (SESSION: 11598)
04/27/10 08:16:51 ANR0408I Session 11599 started for server STL-PVMCONBKP02
                  (Windows) (Tcp/Ip) for library sharing. (SESSION: 11599)
04/27/10 08:16:51 ANR0409I Session 11595 ended for server STL-PVMCONBKP02
                  (Windows). (SESSION: 11595)
04/27/10 08:16:51 ANR0409I Session 11596 ended for server STL-PVMCONBKP02
                  (Windows). (SESSION: 11596)
04/27/10 08:16:51 ANR0409I Session 11597 ended for server STL-PVMCONBKP02
                  (Windows). (SESSION: 11597)
04/27/10 08:16:51 ANR0409I Session 11598 ended for server STL-PVMCONBKP02
                  (Windows). (SESSION: 11598)
04/27/10 08:16:51 ANR0409I Session 11599 ended for server STL-PVMCONBKP02
                  (Windows). (SESSION: 11599)
04/27/10 08:16:51 ANR1794W TSM SAN discovery is disabled by options.
                  (SESSION: 11595)
04/27/10 08:16:51 ANR8965W The server is unable to automatically determine
                  the serial number for the device. (SESSION: 11595)
04/27/10 08:16:51 ANR8779E Unable to open drive /dev/epc-lto1-044, error
                  number=16. (SESSION: 11595)
04/27/10 08:16:51 ANR1794W TSM SAN discovery is disabled by options.
                  (SESSION: 11598)
04/27/10 08:16:51 ANR8965W The server is unable to automatically determine
                  the serial number for the device. (SESSION: 11598)
04/27/10 08:16:51 ANR8779E Unable to open drive /dev/epc-lto1-040, error
                  number=16. (SESSION: 11598)
04/27/10 08:16:51 ANR1794W TSM SAN discovery is disabled by options.
                  (SESSION: 11596)
04/27/10 08:16:51 ANR8965W The server is unable to automatically determine
                  the serial number for the device. (SESSION: 11596)
04/27/10 08:16:51 ANR8779E Unable to open drive /dev/epc-lto1-006, error
                  number=16. (SESSION: 11596)
04/27/10 08:16:51 ANR1794W TSM SAN discovery is disabled by options.
                  (SESSION: 11599)
04/27/10 08:16:51 ANR8965W The server is unable to automatically determine
                  the serial number for the device. (SESSION: 11599)
04/27/10 08:16:51 ANR8779E Unable to open drive /dev/epc-lto1-047, error
                  number=16. (SESSION: 11599)
04/27/10 08:16:51 ANR1794W TSM SAN discovery is disabled by options.
                  (SESSION: 11597)
04/27/10 08:16:51 ANR8965W The server is unable to automatically determine
                  the serial number for the device. (SESSION: 11597)
04/27/10 08:16:51 ANR8779E Unable to open drive /dev/epc-lto1-025, error
                  number=16. (SESSION: 11597)
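
As an aside on "error number=16": if that value is the operating system
errno (an assumption; the log itself does not say so), then on AIX and
other Unix systems 16 is EBUSY, which would fit a device that another
host still holds. The sketch below pulls the failing devices out of an
activity log excerpt like the one above and labels each error number.
The regular expression is inferred from the ANR8779E lines shown here,
and the function names are hypothetical.

```python
import errno
import re

# Pattern inferred from the ANR8779E samples above.
OPEN_FAIL_RE = re.compile(
    r"ANR8779E Unable to open drive (?P<device>[^\s,]+), "
    r"error number=\s*(?P<errno>\d+)"
)


def open_failures(log_text):
    """Return (device, error_number) pairs for each ANR8779E message."""
    flat = " ".join(log_text.split())  # undo line wrapping in the log
    return [(m["device"], int(m["errno"]))
            for m in OPEN_FAIL_RE.finditer(flat)]


def errno_name(code):
    """Map a number to its symbolic errno name on the local platform."""
    return errno.errorcode.get(code, "unknown")


if __name__ == "__main__":
    sample = (
        "04/27/10 08:16:51 ANR8779E Unable to open drive /dev/epc-lto1-044,\n"
        "error\n"
        " number=16. (SESSION: 11595)\n"
    )
    for device, code in open_failures(sample):
        print(device, code, errno_name(code))
```

The name lookup uses the errno table of the machine running the script,
so it is only meaningful where that table matches the TSM server's
platform for the value in question.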


We have seen this before, but rarely, and in the past we were always
able to clear it by restarting the Lan-free agent and the TSM server
software. But this time that isn't working.

FYI, every once in a while, the 'Retry Dismount Failure' will change to
'Dismounting' for a few seconds, then go back to 'Retry Dismount
Failure' again. So TSM is obviously trying to do something to clear it.

Can anyone suggest a procedure for clearing this condition?

Best Regards,

John D. Schneider
The Computer Coaching Community, LLC
Office: (314) 635-5424 / Toll Free: (866) 796-9226
Cell: (314) 750-8721



U.S. BANCORP made the following annotations
---------------------------------------------------------------------
Electronic Privacy Notice. This e-mail, and any attachments, contains
information that is, or may be, covered by electronic communications
privacy laws, and is also confidential and proprietary in nature. If you
are not the intended recipient, please be advised that you are legally
prohibited from retaining, using, copying, distributing, or otherwise
disclosing this information in any manner. Instead, please reply to the
sender that you have received this communication in error, and then
immediately delete it. Thank you in advance for your cooperation.



---------------------------------------------------------------------


