ADSM-L

Re: [ADSM-L] "Retry Dismount Failure" that won't clear

2010-04-27 18:43:00
Subject: Re: [ADSM-L] "Retry Dismount Failure" that won't clear
From: "John D. Schneider" <john.schneider AT COMPUTERCOACHINGCOMMUNITY DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 27 Apr 2010 15:41:55 -0700
Richard,
     Thanks for your reply.  Here is my tape library:

tsm: EPIC-TSM00>q libr f=d

                  Library Name: CDL-EPC-LIB1
                  Library Type: SCSI
                        ACS Id:
              Private Category:
              Scratch Category:
         WORM Scratch Category:
              External Manager:
                        Shared: Yes
                       LanFree:
            ObeyMountRetention:
       Primary Library Manager:
                           WWN: 2003000D77FDE5E0
                 Serial Number: 0012205643260401
                     AutoLabel: No
                  Reset Drives: Yes
Last Update by (administrator): SCHNJD
         Last Update Date/Time: 03/09/2010 19:16:35


So Reset Drives is Yes, which ought to be clearing this, shouldn't it?
By the way, I was wrong that the problem went away.  The first time the
virtual tape drives involved in the problem came up in the rotation,
they went right back into "Retry Dismount Failure" state.  I then
restarted the library master instance again, and the drives went right
back into "Retry Dismount Failure" state once more.  So the problem
persists.

Tomorrow, I am going to delete all the paths and drives for these 5
drives, and redefine them (if I can).  Then if that doesn't fix it, I
will post again.  
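
For the redefinition half of that plan, a minimal Python sketch that
generates the DEFINE DRIVE and DEFINE PATH commands for the five
affected drives. The library name, drive names, and AIX device names
come from this thread. The library manager's server name EPIC-TSM00 is
an assumption taken from the "tsm:" prompt, and the helper name
redefine_commands is hypothetical. Paths from the Lan-free storage
agents are left out because their Windows device names are not given
here.

```python
# Sketch: generate the commands to redefine the five stuck drives.
# Assumption: EPIC-TSM00 (from the "tsm:" prompt) is the library manager's
# server name, and each drive's AIX device is /dev/epc-lto1-NNN.
LIBRARY = "CDL-EPC-LIB1"
LIBRARY_MANAGER = "EPIC-TSM00"
STUCK_DRIVES = ["006", "025", "040", "044", "047"]


def redefine_commands(suffix, library=LIBRARY, source=LIBRARY_MANAGER):
    """Return the DEFINE DRIVE and DEFINE PATH commands for one drive."""
    drive = f"EPC-LTO1-{suffix}"
    device = f"/dev/epc-lto1-{suffix}"
    return [
        f"define drive {library} {drive}",
        f"define path {source} {drive} srctype=server desttype=drive "
        f"library={library} device={device}",
    ]


if __name__ == "__main__":
    for suffix in STUCK_DRIVES:
        for command in redefine_commands(suffix):
            print(command)
```

The printed lines could be pasted into an administrative session or
collected into a macro once the old paths and drives have been deleted.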

What a pain this has become!

Best Regards,

John D. Schneider
The Computer Coaching Community, LLC
Office: (314) 635-5424 / Toll Free: (866) 796-9226
Cell: (314) 750-8721



-------- Original Message --------
Subject: Re: [ADSM-L] "Retry Dismount Failure" that won't clear
From: "Cowen, Richard" <rcowen AT SBSPLANET DOT COM>
Date: Tue, April 27, 2010 5:30 pm
To: ADSM-L AT VM.MARIST DOT EDU

What do you have for the library option:

RESETDrives Specifies whether the server performs a target reset when
the server is restarted or when a library client or storage agent
re-connection is established. Note: This parameter only applies to SCSI,
3494, Manual, and ACSLS type libraries.

Yes Specifies that the target reset is to be performed. Yes is the
default for SCSI, 3494, Manual, and ACSLS libraries defined or updated
with SHAREd=Yes.
 
No Specifies that the target reset is not performed. No is the default
for SCSI, 3494, Manual, and ACSLS libraries defined with SHAREd=No.
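
The default rule quoted above can be written out as a small Python
sketch. The function name default_resetdrives is hypothetical. It only
restates the help text and does not query a server.

```python
# Sketch of the RESETDrives default as described in the quoted help text.
RESET_CAPABLE_TYPES = {"SCSI", "3494", "MANUAL", "ACSLS"}


def default_resetdrives(library_type, shared):
    """Return the default "Yes"/"No", or None if the parameter does not apply."""
    if library_type.upper() not in RESET_CAPABLE_TYPES:
        return None
    return "Yes" if shared else "No"


if __name__ == "__main__":
    # A SCSI library defined with SHAREd=Yes, as in this thread.
    print(default_resetdrives("SCSI", shared=True))
```

For a shared SCSI library the rule gives Yes, so the parameter would
only be No if someone had set it explicitly.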


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
John D. Schneider
Sent: Tuesday, April 27, 2010 5:42 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] "Retry Dismount Failure" that won't clear

Robert,
 That's a good thing to bring up, but Removable Storage Manager is
Disabled. 

 We are emulating an IBM3584 library, with 64 virtual LTO1 tape
drives. We use the IBMTape drivers on Windows, and Atape on AIX. We
just upgraded the drivers a few weeks ago. I guess it is conceivable
that we are hitting something caused by the new drivers. 

 Incidentally, the "Retry Dismount Failure" has disappeared for now.
It will probably come back, since it has happened a few times before,
but always weeks or months apart, so it is tough to nail down the exact
circumstances that cause it. Using the advice from an old ADSM-L post,
I:

1) deleted the paths to the Lan-free clients,
2) deleted the path to the library master,
3) deleted the drive.
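
The three steps above can be sketched in Python as a command generator
that keeps the same order: Lan-free paths first, then the library
master's path, then the drive. The helper name delete_commands is
hypothetical, and EPIC-TSM00 as the library master's server name is an
assumption based on the "tsm:" prompt elsewhere in this thread.

```python
# Sketch: emit DELETE PATH / DELETE DRIVE commands in the order used above.
def delete_commands(drive, library, lanfree_sources, library_manager):
    """Lan-free client paths, then the library master path, then the drive."""
    path_tail = f"{drive} srctype=server desttype=drive library={library}"
    commands = [f"delete path {source} {path_tail}" for source in lanfree_sources]
    commands.append(f"delete path {library_manager} {path_tail}")
    commands.append(f"delete drive {library} {drive}")
    return commands


if __name__ == "__main__":
    for command in delete_commands(
        drive="EPC-LTO1-006",
        library="CDL-EPC-LIB1",
        lanfree_sources=["STL-PVMCONBKP02"],
        library_manager="EPIC-TSM00",  # assumption, see the note above
    ):
        print(command)
```

The drive comes last because the server will not delete a drive that
still has paths defined to it.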

Then, when I did a "q mount", it would still show "Retry Dismount
Failure", but it wouldn't show what drive! Instead of:

ANR8380I LTO volume V50317 is mounted R/O in drive EPC-LTO1-006
(/dev/epc-lto1-006), status: RETRY DISMOUNT FAILURE.

it would say:

ANR8380I LTO volume V50317 is mounted R/O, status: RETRY DISMOUNT
FAILURE.

Strange. Then stranger still, about 15 minutes later, the library
master instance crashed. We brought it back up, and all the "Retry
Dismount Failure"s were gone!

I should be happy, but I'm not, for two reasons. First, this is bound
to come up again. And second, the virtual tape device is still screwed
up. It must have a SCSI reserve still set. When I try to configure a
path for it now, it gives me:

ANR8420E DEFINE PATH: An I/O error occurred while accessing drive
EPC-LTO1-006.

How do you clear the SCSI reserve in a virtual tape drive? I may have
to reboot the whole EDL to do it.


Best Regards,

John D. Schneider



-------- Original Message --------
Subject: Re: [ADSM-L] "Retry Dismount Failure" that won't clear
From: Robert Clark <robert.clark7 AT USBANK DOT COM>
Date: Tue, April 27, 2010 12:25 pm
To: ADSM-L AT VM.MARIST DOT EDU

I would make sure RSM on the Windows host is not grabbing the tape
drives.

What type of library is being emulated?

[RC]



From: "John D. Schneider" <john.schneider AT COMPUTERCOACHINGCOMMUNITY DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: 04/27/2010 08:25 AM
Subject: [ADSM-L] "Retry Dismount Failure" that won't clear
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>



Greetings,
 I have been through the archives for help with this one, but I still
don't have an answer.
 I support a TSM 5.4.3.0 server running on AIX 5.3 ML9. It uses an EMC
Disk Library for virtual tape, configured as 64 LTO1 tape drives. This
server is the library master for both AIX and Windows Lan-free clients
running the 5.4.2.0 Lan-free storage agent.

 We came in yesterday and found 5 virtual tapes mounted, but in
"Retry Dismount Failure" state:

ANR8380I LTO volume V50135 is mounted R/O in drive EPC-LTO1-025
(/dev/epc-lto1-025), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume V50128 is mounted R/O in drive EPC-LTO1-040
(/dev/epc-lto1-040), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume V50097 is mounted R/O in drive EPC-LTO1-044
(/dev/epc-lto1-044), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume V50129 is mounted R/O in drive EPC-LTO1-047
(/dev/epc-lto1-047), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume V50317 is mounted R/O in drive EPC-LTO1-006
(/dev/epc-lto1-006), status: RETRY DISMOUNT FAILURE.

They have been in this state over 24 hours now, and we can't clear them.
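
To watch for this state without reading "q mount" output by eye, a
minimal Python sketch that pulls the stuck volumes out of ANR8380I
messages. The function name stuck_mounts and the regular expression are
my own. The pattern is written against the message text shown in this
thread only, so other server levels may word the message differently.

```python
import re

# Matches ANR8380I with or without the "in drive NAME (DEVICE)" part.
MOUNT_RE = re.compile(
    r"ANR8380I (?P<devtype>\S+) volume (?P<volume>\S+) is mounted "
    r"(?P<mode>[^\s,]+)"
    r"(?: in drive (?P<drive>\S+) \((?P<device>[^)]+)\))?"
    r", status: (?P<status>[^.]+)\."
)


def stuck_mounts(text):
    """Return one dict per mount whose status is RETRY DISMOUNT FAILURE."""
    # Collapse line wraps so a message split across lines matches as one.
    flat = " ".join(text.split())
    return [
        m.groupdict()
        for m in MOUNT_RE.finditer(flat)
        if m.group("status").strip().upper() == "RETRY DISMOUNT FAILURE"
    ]


SAMPLE = """\
ANR8380I LTO volume V50135 is mounted R/O in drive EPC-LTO1-025
(/dev/epc-lto1-025), status: RETRY DISMOUNT FAILURE.
ANR8380I LTO volume V50317 is mounted R/O, status: RETRY DISMOUNT
FAILURE.
"""

if __name__ == "__main__":
    for mount in stuck_mounts(SAMPLE):
        print(mount["volume"], mount["drive"] or "(no drive shown)")
```

The drive part of the pattern is optional so that a message with no
drive name is still counted.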

We can tell this problem is caused by confusion between the library
master and one of the Lan-free agents. My surmise is that the Lan-free
agent thinks it is finished with the drives, but that message never
gets to the TSM server. Later the TSM server's timeout logic tries to
reclaim the drive, but the Lan-free agent still has a SCSI reserve on
the tape drive, so the TSM server can't open it to talk to it.

We went out to the EDL appliance and dismounted the virtual tapes from
the drives, so they are empty. We have tried restarting both the TSM
server software and Lan-free agent. We have rebooted the Windows server
running the Lan-free agent. We have deleted and rediscovered the AIX
rmt devices on the library master. All those worked fine. We did an
'update server STL-PVMCONBKP02 forcesync=yes' to resynchronize the TSM
server and the Lan-free agent, but that didn't help.

The 'Retry Dismount Failure' errors still persist. Every little while
we still get the following messages in the server activity log. Since
the session between the server and the lan-free agent STL-PVMCONBKP02
isn't getting any errors, it is not a simple communication problem
between them.

04/27/10 08:16:51 ANR0408I Session 11595 started for server
                  STL-PVMCONBKP02 (Windows) (Tcp/Ip) for library
                  sharing. (SESSION: 11595)
04/27/10 08:16:51 ANR0408I Session 11596 started for server
                  STL-PVMCONBKP02 (Windows) (Tcp/Ip) for library
                  sharing. (SESSION: 11596)
04/27/10 08:16:51 ANR0408I Session 11597 started for server
                  STL-PVMCONBKP02 (Windows) (Tcp/Ip) for library
                  sharing. (SESSION: 11597)
04/27/10 08:16:51 ANR0408I Session 11598 started for server
                  STL-PVMCONBKP02 (Windows) (Tcp/Ip) for library
                  sharing. (SESSION: 11598)
04/27/10 08:16:51 ANR0408I Session 11599 started for server
                  STL-PVMCONBKP02 (Windows) (Tcp/Ip) for library
                  sharing. (SESSION: 11599)
04/27/10 08:16:51 ANR0409I Session 11595 ended for server
                  STL-PVMCONBKP02 (Windows). (SESSION: 11595)
04/27/10 08:16:51 ANR0409I Session 11596 ended for server
                  STL-PVMCONBKP02 (Windows). (SESSION: 11596)
04/27/10 08:16:51 ANR0409I Session 11597 ended for server
                  STL-PVMCONBKP02 (Windows). (SESSION: 11597)
04/27/10 08:16:51 ANR0409I Session 11598 ended for server
                  STL-PVMCONBKP02 (Windows). (SESSION: 11598)
04/27/10 08:16:51 ANR0409I Session 11599 ended for server
                  STL-PVMCONBKP02 (Windows). (SESSION: 11599)
04/27/10 08:16:51 ANR1794W TSM SAN discovery is disabled by options.
                  (SESSION: 11595)
04/27/10 08:16:51 ANR8965W The server is unable to automatically
                  determine the serial number for the device.
                  (SESSION: 11595)
04/27/10 08:16:51 ANR8779E Unable to open drive /dev/epc-lto1-044,
                  error number=16. (SESSION: 11595)
04/27/10 08:16:51 ANR1794W TSM SAN discovery is disabled by options.
                  (SESSION: 11598)
04/27/10 08:16:51 ANR8965W The server is unable to automatically
                  determine the serial number for the device.
                  (SESSION: 11598)
04/27/10 08:16:51 ANR8779E Unable to open drive /dev/epc-lto1-040,
                  error number=16. (SESSION: 11598)
04/27/10 08:16:51 ANR1794W TSM SAN discovery is disabled by options.
                  (SESSION: 11596)
04/27/10 08:16:51 ANR8965W The server is unable to automatically
                  determine the serial number for the device.
                  (SESSION: 11596)
04/27/10 08:16:51 ANR8779E Unable to open drive /dev/epc-lto1-006,
                  error number=16. (SESSION: 11596)
04/27/10 08:16:51 ANR1794W TSM SAN discovery is disabled by options.
                  (SESSION: 11599)
04/27/10 08:16:51 ANR8965W The server is unable to automatically
                  determine the serial number for the device.
                  (SESSION: 11599)
04/27/10 08:16:51 ANR8779E Unable to open drive /dev/epc-lto1-047,
                  error number=16. (SESSION: 11599)
04/27/10 08:16:51 ANR1794W TSM SAN discovery is disabled by options.
                  (SESSION: 11597)
04/27/10 08:16:51 ANR8965W The server is unable to automatically
                  determine the serial number for the device.
                  (SESSION: 11597)
04/27/10 08:16:51 ANR8779E Unable to open drive /dev/epc-lto1-025,
                  error number=16. (SESSION: 11597)
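
To tie each failed open back to the library-sharing session that
triggered it, a minimal Python sketch that rejoins wrapped activity log
lines and pairs every ANR8779E with the server named in the matching
ANR0408I. The function names entries and open_failures are
hypothetical, and the patterns are written against the messages in this
log only. On AIX, errno 16 is normally EBUSY, which would be consistent
with another host still holding a reserve on the drive.

```python
import re

# A new log entry starts with "MM/DD/YY HH:MM:SS " at the start of a line.
ENTRY_START = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} ", re.M)
SESSION_START = re.compile(
    r"ANR0408I Session (?P<session>\d+) started for server (?P<server>\S+)"
)
OPEN_FAIL = re.compile(
    r"ANR8779E Unable to open drive (?P<device>[^\s,]+), "
    r"error number=(?P<errno>\d+)\. \(SESSION: (?P<session>\d+)\)"
)


def entries(log_text):
    """Rejoin wrapped activity log lines into one string per message."""
    starts = [m.start() for m in ENTRY_START.finditer(log_text)]
    bounds = starts + [len(log_text)]
    return [" ".join(log_text[a:b].split()) for a, b in zip(bounds, bounds[1:])]


def open_failures(log_text):
    """Return (device, errno, server) for each ANR8779E in the log."""
    servers = {}
    failures = []
    for entry in entries(log_text):
        started = SESSION_START.search(entry)
        if started:
            servers[started["session"]] = started["server"]
        failed = OPEN_FAIL.search(entry)
        if failed:
            failures.append(
                (failed["device"], int(failed["errno"]), servers.get(failed["session"]))
            )
    return failures


SAMPLE = """\
04/27/10 08:16:51 ANR0408I Session 11595 started for server
STL-PVMCONBKP02
 (Windows) (Tcp/Ip) for library sharing.
(SESSION: 11595)
04/27/10 08:16:51 ANR8779E Unable to open drive /dev/epc-lto1-044,
error
 number=16. (SESSION: 11595)
"""

if __name__ == "__main__":
    for device, err, server in open_failures(SAMPLE):
        print(device, err, server)
```

Because whitespace is collapsed before matching, the same code should
cope with both tidy and badly wrapped copies of the log.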


We have seen this before, but rarely, and in the past we were always
able to clear it by restarting the Lan-free agent and the TSM server
software. But this time that isn't working.

FYI, every once in a while, the 'Retry Dismount Failure' will change to
'Dismounting' for a few seconds, then go back to 'Retry Dismount
Failure' again. So TSM is obviously trying to do something to clear it.

Can anyone suggest a procedure for clearing this condition?

Best Regards,

John D. Schneider


