Path being taken offline

cwilloug

I have 2 IBM N3600 NAS, each with 6 paths via a switch zoned to tape drives in my TS3500 library, backing up with NDMP. Things have been running smoothly since I figured out the NDMP backup configs, until yesterday. Yesterday morning I came into work and found that one of the paths to the library drives had been taken offline. I checked the connections to the NAS, Switch, and library - all ok, so I used the ISC to place the path back on-line.

This morning I came into work and the same path, to the same drive, was offline again. A quick search of the actlog found:

06/16/2009 19:05:19 ANR8471E Server no longer polling drive DRIVE08 in library
3500LIB - path /dev/rmt13 will be marked off-line.
(SESSION: 61712, PROCESS: 481)
06/16/2009 19:05:19 ANR8873E The path from source N3600FS2 to destination
DRIVE08 (/dev/rmt13) is taken offline. (SESSION: 61712,
PROCESS: 481)

Then a look at the errpt on my TSM AIX box found.....

IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
476B351D   0616190309 P H rmt13          TAPE DRIVE FAILURE
A7AB4C8F   0616190309 I H rmt13          TAPE SIM/MIM RECORD
476B351D   0616190309 P H rmt13          TAPE DRIVE FAILURE
A7AB4C8F   0616190309 I H rmt13          TAPE SIM/MIM RECORD
476B351D   0616183309 P H rmt13          TAPE DRIVE FAILURE
476B351D   0616183309 P H rmt13          TAPE DRIVE FAILURE
A7AB4C8F   0616052909 I H rmt11          TAPE SIM/MIM RECORD
A7AB4C8F   0616052509 I H rmt11          TAPE SIM/MIM RECORD
E507DCF9   0616052509 I H rmt11          TAPE DRIVE NEEDS CLEANING
A7AB4C8F   0616015709 I H rmt17          TAPE SIM/MIM RECORD

Wow, a tape drive failure. Hmm, but if there was a tape drive failure, why was only the path taken offline, and not the drive?
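To see at a glance which device is collecting the errors, the errpt summary above can be tallied per resource. This is only a minimal sketch: it assumes the listing has been saved as text with whitespace-separated columns, and the names `parse_errpt` and `count_by_resource` are made up for illustration. The sample rows are copied from the listing in this post.

```python
from collections import Counter

# Rows copied from the errpt summary posted above.
ERRPT_SUMMARY = """\
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
476B351D 0616190309 P H rmt13 TAPE DRIVE FAILURE
A7AB4C8F 0616190309 I H rmt13 TAPE SIM/MIM RECORD
476B351D 0616190309 P H rmt13 TAPE DRIVE FAILURE
A7AB4C8F 0616190309 I H rmt13 TAPE SIM/MIM RECORD
476B351D 0616183309 P H rmt13 TAPE DRIVE FAILURE
476B351D 0616183309 P H rmt13 TAPE DRIVE FAILURE
A7AB4C8F 0616052909 I H rmt11 TAPE SIM/MIM RECORD
A7AB4C8F 0616052509 I H rmt11 TAPE SIM/MIM RECORD
E507DCF9 0616052509 I H rmt11 TAPE DRIVE NEEDS CLEANING
A7AB4C8F 0616015709 I H rmt17 TAPE SIM/MIM RECORD
"""


def parse_errpt(text):
    """Yield one dict per data row of an errpt summary listing."""
    for line in text.splitlines():
        # Split into at most 6 fields so the description keeps its spaces.
        fields = line.split(None, 5)
        if len(fields) < 6 or fields[0] == "IDENTIFIER":
            continue  # skip the header and anything that is not a data row
        ident, stamp, etype, eclass, resource, desc = fields
        yield {"id": ident, "timestamp": stamp, "type": etype,
               "class": eclass, "resource": resource, "description": desc}


def count_by_resource(text, etype=None):
    """Count rows per resource, optionally only for one T column value."""
    return Counter(row["resource"] for row in parse_errpt(text)
                   if etype is None or row["type"] == etype)


if __name__ == "__main__":
    # T column: P = permanent error, I = informational.
    print("permanent:", dict(count_by_resource(ERRPT_SUMMARY, etype="P")))
    print("all rows: ", dict(count_by_resource(ERRPT_SUMMARY)))
```

On the sample, all the permanent (P) entries belong to rmt13, the device behind the path that was taken offline.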

tsm: TSM>q path

Source Name   Source Type   Destination Name   Destination Type   On-Line
-----------   -----------   ----------------   ----------------   -------
TSM           SERVER        3500LIB            LIBRARY            Yes
TSM           SERVER        DRIVE01            DRIVE              Yes
TSM           SERVER        DRIVE02            DRIVE              Yes
TSM           SERVER        DRIVE03            DRIVE              Yes
TSM           SERVER        DRIVE04            DRIVE              Yes
TSM           SERVER        DRIVE05            DRIVE              Yes
TSM           SERVER        DRIVE06            DRIVE              Yes
TSM           SERVER        DRIVE07            DRIVE              Yes
TSM           SERVER        DRIVE08            DRIVE              Yes
N3600FS2      DATAMOVER     DRIVE02            DRIVE              Yes
N3600FS2      DATAMOVER     DRIVE03            DRIVE              Yes
N3600FS2      DATAMOVER     DRIVE04            DRIVE              Yes
N3600FS2      DATAMOVER     DRIVE06            DRIVE              Yes
N3600FS2      DATAMOVER     DRIVE07            DRIVE              Yes
N3600FS2      DATAMOVER     DRIVE08            DRIVE              No
N3600_FS1     DATAMOVER     DRIVE01            DRIVE              Yes
N3600_FS1     DATAMOVER     DRIVE02            DRIVE              Yes
N3600_FS1     DATAMOVER     DRIVE03            DRIVE              Yes
N3600_FS1     DATAMOVER     DRIVE05            DRIVE              Yes
N3600_FS1     DATAMOVER     DRIVE06            DRIVE              Yes
N3600_FS1     DATAMOVER     DRIVE07            DRIVE              Yes

tsm: TSM>q drive

Library Name   Drive Name   Device Type   On-Line
------------   ----------   -----------   -------
3500LIB        DRIVE01      3592          Yes
3500LIB        DRIVE02      3592          Yes
3500LIB        DRIVE03      3592          Yes
3500LIB        DRIVE04      3592          Yes
3500LIB        DRIVE05      3592          Yes
3500LIB        DRIVE06      3592          Yes
3500LIB        DRIVE07      3592          Yes
3500LIB        DRIVE08      3592          Yes

Looking at the library's drive errors, I did find an error on DRIVE08 at the time the path went offline, but it involved a tape that is not used for the NAS storage pool.

So why would an error, with a tape not in the NAS pool, cause the NAS drive path to go off-line, and not the drive?

Anyone?
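The mismatch described above (path offline, drive still online) can be picked out mechanically by cross-checking the two listings. A rough sketch, assuming the `q path` and `q drive` output has been captured as text and that no name contains spaces; the function names are hypothetical, and the sample rows are a subset of the output in this post.

```python
# A few rows taken from the q path / q drive output above.
Q_PATH = """\
TSM SERVER DRIVE08 DRIVE Yes
N3600FS2 DATAMOVER DRIVE07 DRIVE Yes
N3600FS2 DATAMOVER DRIVE08 DRIVE No
N3600_FS1 DATAMOVER DRIVE07 DRIVE Yes
"""

Q_DRIVE = """\
3500LIB DRIVE07 3592 Yes
3500LIB DRIVE08 3592 Yes
"""


def parse_rows(text, width):
    """Return rows that have `width` fields and end in Yes or No."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Header and separator lines fail this check and are skipped.
        if len(fields) == width and fields[-1] in ("Yes", "No"):
            rows.append(fields)
    return rows


def offline_paths_with_online_drive(path_text, drive_text):
    """List (source, drive) pairs whose path is offline while the drive is online."""
    online_drives = {name
                     for _lib, name, _devtype, online in parse_rows(drive_text, 4)
                     if online == "Yes"}
    return [(src, dest)
            for src, _srctype, dest, desttype, online in parse_rows(path_text, 5)
            if desttype == "DRIVE" and online == "No" and dest in online_drives]


if __name__ == "__main__":
    for source, drive in offline_paths_with_online_drive(Q_PATH, Q_DRIVE):
        print(f"path {source} -> {drive} is offline, but {drive} is online")
```

For the sample rows this reports only the N3600FS2 to DRIVE08 path, which matches what the two queries show.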
 
I had exactly the same problem a week ago with our TS3500 and LTO3 drives after I upgraded the drive microcode. I could put the problem drive online, but as soon as it was accessed by TSM, it was set offline again.

I fixed it by power cycling the problem drive.
 
Hmm, I'll try that later today. The drive was used off and on yesterday after I brought the path back online, and the drive itself stayed online the whole time. The drive never went offline, just the datamover path.
 
Sorry, I meant the path, not the drive ;)
 
Such problems seem to occur regularly with the TS3500. It would be interesting to know how to avoid them. I think I would go nuts if I found the drive offline in the morning. I am not a calm guy and things like this make me really angry.
 
This could indicate a bad drive or a drive about to fail. TSM normally takes the path offline first, then the drive. A power cycle seems to help for a short while, but you still need to put the path back online after power cycling. Additionally, make certain all the drives are on the same firmware.

I experienced this some time back, when I first started with TSM, and it drove me nuts. It turned out to be a bad drive, and I contacted our vendor support to replace it.


Mike
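For putting the path back online from the admin command line rather than the ISC, the command can be assembled as below. This is a sketch, not a verified procedure: the `UPDATE PATH ... ONLINE=YES` form is how I understand the administrative command, so check `help update path` on your own server level first. The helper name `update_path_online` is invented, and the example values are the ones from this thread.

```python
def update_path_online(source, drive, library, srctype="DATAMOVER"):
    """Build an UPDATE PATH command that marks a drive path online.

    Assumed syntax; verify it against your server's command help.
    """
    for value in (source, drive, library, srctype):
        # Guard against empty values or stray whitespace in names.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"invalid name: {value!r}")
    return (f"UPDATE PATH {source} {drive} SRCTYPE={srctype} "
            f"DESTTYPE=DRIVE LIBRARY={library} ONLINE=YES")


if __name__ == "__main__":
    # Names taken from the first post in this thread.
    print(update_path_online("N3600FS2", "DRIVE08", "3500LIB"))
```

The script only prints the command text; it does not contact the server, so the result can be reviewed before it is pasted into an admin session.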
 
Turned out to be a bad drive, with a tape stuck in the drive too. Funny behavior: in the past, when a tape has been stuck in any drive, the drive went offline, not the path.
 
hi,

I have almost the same problem. I have TSM 5.5.3.0 on 64-bit Linux.
My problem is that after I installed the QLogic lib files, TSM started taking my paths offline with this error:

07/12/2009 06:00:06 ANR8963E Unable to find path to match the serial number
defined for drive LTO4A in library 3584LIB . (SESSION:
874, PROCESS: 13)
07/12/2009 06:00:06 ANR8873E The path from source ARES to destination LTO4A
(/dev/IBMtape0) is taken offline. (SESSION: 874, PROCESS:
13)

But the serial number is OK; I have checked it over and over again.
Does anyone have a solution for this?
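When the activity log is long, the ANR8873E entries can be pulled out with a small script. A minimal sketch, assuming the log has been saved as text in the wrapped layout shown above; the function names are hypothetical, and the sample is the excerpt from this post.

```python
import re

# Excerpt copied from the activity log above (messages wrap across lines).
ACTLOG = """\
07/12/2009 06:00:06 ANR8963E Unable to find path to match the serial number
defined for drive LTO4A in library 3584LIB . (SESSION:
874, PROCESS: 13)
07/12/2009 06:00:06 ANR8873E The path from source ARES to destination LTO4A
(/dev/IBMtape0) is taken offline. (SESSION: 874, PROCESS:
13)
"""

# A new message starts with a date and time; other lines are continuations.
MSG_START = re.compile(r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} ")
OFFLINE = re.compile(
    r"ANR8873E The path from source (\S+) to destination (\S+) "
    r"\((\S+)\) is taken offline")


def join_messages(text):
    """Rejoin wrapped log lines into one string per message."""
    messages = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if MSG_START.match(line) or not messages:
            messages.append(line)
        else:
            messages[-1] += " " + line
    return messages


def offline_paths(text):
    """Return (timestamp, source, destination, device) for each ANR8873E."""
    found = []
    for msg in join_messages(text):
        match = OFFLINE.search(msg)
        if match:
            # The first 19 characters are the MM/DD/YYYY HH:MM:SS stamp.
            found.append((msg[:19], *match.groups()))
    return found


if __name__ == "__main__":
    for stamp, source, dest, device in offline_paths(ACTLOG):
        print(f"{stamp}: {source} -> {dest} ({device}) taken offline")
```

Rejoining the wrapped lines first matters because the device name sits on a different line from the message number.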
 
I had a similar problem last week after patching TSM from 5.5.2 to 5.5.3 on 64-bit Red Hat Linux 5.3.
It would disable the paths to the drives, but the drives would still be listed as online. Also, in the actlog TSM would complain about being unable to open the /etc/hba.conf file. This file is used for fibre libraries. I have a TS3310 SCSI library. I ended up re-installing TSM 5.5.2 and recovering from my previous backup. Everything works fine now. So I think there is a bug in the 5.5.3 patch: it only looks for fibre drives.
 
Hi Lweron,
Thanks for the reply.
It seems that we have the same issue with TSM.
In my case I can't reinstall my TSM server, so I have opened a PMR with IBM for this issue.
This is one of the errors that I got in the actlog for one path (I get it for all 6 paths that I have in my TSM):
ANR8873E The path from source ARES to destination LTO4C
(/dev/IBMtape2) is taken offline. (SESSION: 2016,
PROCESS: 17)
As for hba.conf, I have configured it with the latest HBA lib files.
No luck, it still happens :-(
 
Did IBM provide any solution for this problem? Let us know if they do. Thanks.
 
lweron - 2 weeks ago I posted...

Turned out to be a bad drive, with a tape stuck in the drive too. Funny behavior: in the past, when a tape has been stuck in any drive, the drive went offline, not the path.

so I guess the IBM provided solution was to replace the bad drive, and now the earth is once again safe for all humans :)


.....and their data
 
Hi,
Not yet, but I have made a trace log of the system and sent it to them to analyze.

When they provide a solution I will update this thread.

:)
 
A7AB4C8F 3592_x TAPE SIM/MIM RECORD

Dear All,

I have the same problem, but no drives are offline.
Yesterday I upgraded TSM 5.5.2 to 5.5.3.
On AIX I get this message:
LABEL: TAPE_SIM_MIM_RECORD
IDENTIFIER: A7AB4C8F
Date/Time: Thu Jul 30 09:25:40 CEST 2009
Sequence Number: 674
Machine Id: 00CE69834C00
Node Id: xxxxxxxxxx
Class: H
Type: INFO
Resource Name: 3592_6_1
Resource Class: tape
Resource Type: 3592
Location: U7311.D20.653071C-P1-C06-T1-W500507630F986C06-L0
VPD:
Manufacturer................IBM
Machine Type and Model......03592E05
Serial Number...............000007835690
Device Specific.(FW)........1DD1
Loadable Microcode Level....A1700D5C
Description
AAA0
Probable Causes
TAPE DRIVE
MEDIA
Failure Causes
TAPE DRIVE
MEDIA

Do you have any idea?
 
Sorry, I don't have any idea where it came from.
I'm still waiting for IBM to solve this issue.
I have an open PMR for it.
 
I just received a call from IBM about this PMR. The person said it is not a software problem, maybe a hardware one. He proposed upgrading all the microcode. I have:
Atape 11.2.9.0, proposed level 11.6.0.0
Drive 3592 1DD1, no level proposed, I have to check
Frame 7379, no level proposed, I have to check

Do you have another solution?

Best regards.
 
Hi Samuel,

I have checked your link and I already have the latest firmware version on the tape drives:
LTO Ultrium 4 Fibre drive firmware
(Downloads available at end of page)
- 94D4L4F.zip (contains 94D4L4F.ro)
- 94D4L4F.tar (contains 94D4L4F.ro)
 