ADSM-L

Re: [ADSM-L] TS3500 firmware/hardware issues

2010-06-29 20:54:42
Subject: Re: [ADSM-L] TS3500 firmware/hardware issues
From: David Longo <David.Longo AT HEALTH-FIRST DOT ORG>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 29 Jun 2010 20:53:37 -0400
Wow, sounds like fun!

As I read this, it is probably not possible to do normal
troubleshooting, i.e. working out what happened, and in what sequence,
when this stopped working.  So you just have to dive in.

Here are some quick thoughts.

1.  Does the 3584 have call home set up to the new vendor, and did
they get any calls?  Do you get error messages on the LCD
panel of the 3584?  Does the library work most of the time with
occasional errors, fail regularly, or basically not work at all?

If you don't have working Control Paths, the library will be unusable
from TSM.  Was the Control Path on a drive that was changed?  If so, move
it to a drive that wasn't changed and seems to be working normally.

2.  I assume these are Fibre drives.  Even with experienced
IBM CEs, like I have had, changing a drive sometimes isn't "clean".
If it is not done just right, the drive doesn't come up with the original
WWN and they need to do it again.  If they don't, TSM does not see the
drive because the WWN doesn't match the original drive's.

Do you have access to the SAN switches to look at that, or a good
relationship with the SAN team?  Investigate there.

3.  You said LPAR, so I guess you have an AIX platform with 2 AIX/TSM servers?
Look at the AIX level to see what you have as far as drives and config info.
Do you know what it was before the problems started?  Also use "tapeutil"
to help troubleshoot issues.
--------------------------------------------------
4.  A fairly quick suggestion, but use with caution.  Make sure you have
reasonable config listings before starting.  I would do it in this order:

A.  Power off/on 3584 library to reset.  Does it come up o.k.?

B.  Reboot each TSM server, in whichever order is appropriate.

C.  Look at AIX errpt for each server after reboot and see if useful error
messages or none show up.  Then look at the TSM actlog to see its messages on
restart.

D.  Examine carefully and compare the WWNs of the drives as shown by the 3584
Web Interface or LCD, the SAN switches they are connected to, and AIX and TSM.
Also check your SAN zoning.  Does it all line up?
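
The comparison in step D can be sketched in a few lines of Python once the
WWNs have been collected by hand from each place.  This is only a rough
illustration: the drive names follow the TSM names in the log, but every WWN
value, the source labels and both function names are made up for the example.

```python
import string


def normalize_wwn(wwn):
    """Reduce a WWN to bare lower-case hex so different notations compare equal."""
    text = wwn.strip().lower()
    if text.startswith("0x"):
        text = text[2:]
    return "".join(ch for ch in text if ch in string.hexdigits)


def find_wwn_mismatches(sources):
    """Return {drive: {source: wwn}} for drives whose sources disagree.

    `sources` maps a source label (library, switch, AIX...) to a dict of
    drive name -> WWN string as that source reports it.
    """
    drives = set()
    for seen in sources.values():
        drives.update(seen)
    mismatches = {}
    for drive in sorted(drives):
        views = {}
        for label, seen in sources.items():
            views[label] = normalize_wwn(seen.get(drive, "")) or "MISSING"
        if len(set(views.values())) > 1:
            mismatches[drive] = views
    return mismatches


# Hypothetical values typed in by hand; none of these are real WWNs.
SAMPLE_SOURCES = {
    "library": {"LTO_4": "50:05:07:63:00:00:00:04", "LTO_8": "50:05:07:63:00:00:00:08"},
    "switch": {"LTO_4": "5005076300000004", "LTO_8": "50050763000000ff"},
    "aix": {"LTO_4": "0x5005076300000004", "LTO_8": "0x5005076300000008"},
}

for drive, views in find_wwn_mismatches(SAMPLE_SOURCES).items():
    print(drive, views)
```

Any drive that prints here is one where at least one source reports a
different WWN, or none at all, and is worth checking against the zoning
first.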

None of this will fix any robot issues (except some chance that a 3584 reboot
may help), but it can clear up others.

Last resort - Call IBM and get your checkbook out.

David Longo

Sometimes just A. and B. can solve some issues; you seem to have many.

>>> "Gill, Geoffrey L." <GEOFFREY.L.GILL AT SAIC DOT COM> 6/29/2010 7:44 PM >>>
I've been having numerous problems lately with our 3584 and wondered if
those of you out there with one would mind sharing your firmware level.
Ours is on 7270, with LTO2 drives at 73V1. I've been on this for some
time and don't believe it is related to the issues we're having but you
never know I guess.

 

I believe the issues I'm having are related to the company performing
maintenance these days. I say that because when IBM was on the hook I
never had an issue that wasn't resolved the same day they were called
and it never took more than one visit to get anything fixed, never.
These guys break cables and say they didn't do it, replace drives like
I've never seen before just because a tape was stuck in one, replace
drives with broken drives, replaced the wrong drive, and at the moment,
going on 2 weeks now, can't get the robot to work on either LPAR for
more than a few minutes. I'm disgusted with the support so I'm looking
for info from anyone who may have had similar robot issues in the past.
It's been difficult to say the least now that I don't have a single
source to troubleshoot these issues.

 

I get these on both systems: drives go offline, paths go offline,
nothing mounts, the robot can't find tapes, it can't dismount tapes from
drives. Each system has a separate control path to a completely different
set of drives. I'm looking for anything anyone can pass on that I can take
to these guys to see if they've thought about or replaced certain parts.

 

I can tell you one gripper went out about 3 weeks ago and ever since
then the library has been a mess. They've replaced grippers and other
parts in the robot itself, of which I don't have a list. When using
the web GUI to run an inventory I've gotten errors: y motor won't move,
x motion failure, excessive drift grippers errors when nothing was in
them, accessor degraded, among other things. Bottom line to me is I
never had these problems till they touched the robot and since I have no
training on that unit I was wondering if anyone out there may have had
some interaction with a tech that knows what he's doing that might help
give some info that would help me communicate something worthwhile to
these guys. Any advice, besides the obvious of get IBM to fix it, is
welcome. I already tried that and didn't get anywhere.

 

TSM errors I believe are all related to the library issue and I don't
believe that either LPAR truly has an issue since they have separate
hardware, paths and drives. The only unit they have in common is the
robot itself.

 

6/29/2010 3:02:34 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:02:34 PM ANR8469E Dismount of LTO volume T00560 from drive
LTO_4 (/dev/rmt3) in library 3584LIB failed.

6/29/2010 3:04:54 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:04:54 PM ANR8469E Dismount of LTO volume T02343 from drive
LTO_8 (/dev/rmt7) in library 3584LIB failed.

6/29/2010 3:07:14 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:07:14 PM ANR8469E Dismount of LTO volume T01268 from drive
LTO_6 (/dev/rmt5) in library 3584LIB failed.

6/29/2010 3:09:34 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:09:34 PM ANR8469E Dismount of LTO volume T01037 from drive
LTO_10 (/dev/rmt9) in library 3584LIB failed.

6/29/2010 3:11:54 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:11:54 PM ANR8469E Dismount of LTO volume T00458 from drive
LTO_1 (/dev/rmt0) in library 3584LIB failed.

6/29/2010 3:14:14 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:14:14 PM ANR8469E Dismount of LTO volume T00543 from drive
LTO_11 (/dev/rmt10) in library 3584LIB failed.

6/29/2010 3:16:35 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:16:35 PM ANR8469E Dismount of LTO volume T00881 from drive
LTO_5 (/dev/rmt4) in library 3584LIB failed.

6/29/2010 3:18:55 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:18:55 PM ANR8469E Dismount of LTO volume T00560 from drive
LTO_4 (/dev/rmt3) in library 3584LIB failed.

6/29/2010 3:21:15 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:21:15 PM ANR8469E Dismount of LTO volume T02343 from drive
LTO_8 (/dev/rmt7) in library 3584LIB failed.

6/29/2010 3:23:35 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:23:35 PM ANR8469E Dismount of LTO volume T01268 from drive
LTO_6 (/dev/rmt5) in library 3584LIB failed.

6/29/2010 3:25:55 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:25:55 PM ANR8469E Dismount of LTO volume T01037 from drive
LTO_10 (/dev/rmt9) in library 3584LIB failed.

6/29/2010 3:28:15 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:28:15 PM ANR8469E Dismount of LTO volume T00458 from drive
LTO_1 (/dev/rmt0) in library 3584LIB failed.

6/29/2010 3:30:35 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:30:35 PM ANR8469E Dismount of LTO volume T00543 from drive
LTO_11 (/dev/rmt10) in library 3584LIB failed.

6/29/2010 3:32:55 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:32:55 PM ANR8469E Dismount of LTO volume T00881 from drive
LTO_5 (/dev/rmt4) in library 3584LIB failed.

6/29/2010 3:35:15 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:35:15 PM ANR8469E Dismount of LTO volume T00560 from drive
LTO_4 (/dev/rmt3) in library 3584LIB failed.

6/29/2010 3:37:35 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:39:55 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:42:15 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:44:35 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:46:55 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:49:15 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:51:35 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:53:55 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:53:55 PM ANR8848W Drive LTO_8 of library 3584LIB is
inaccessible; server has begun polling drive.

6/29/2010 3:53:55 PM ANR8485E No drives are available to be mounted in
R/W mode with format 00000008 in library 3584LIB.

6/29/2010 3:53:55 PM ANR1401W Mount request denied for volume T01618 -
mount failed.

6/29/2010 3:56:15 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:56:15 PM ANR8848W Drive LTO_6 of library 3584LIB is
inaccessible; server has begun polling drive.

6/29/2010 3:56:15 PM ANR8485E No drives are available to be mounted in
R/W mode with format 00000008 in library 3584LIB.

6/29/2010 3:56:15 PM ANR1401W Mount request denied for volume T01874 -
mount failed.

6/29/2010 3:58:35 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 3:58:35 PM ANR8848W Drive LTO_10 of library 3584LIB is
inaccessible; server has begun polling drive.

6/29/2010 3:58:35 PM ANR8485E No drives are available to be mounted in
R/W mode with format 00000008 in library 3584LIB.

6/29/2010 3:58:35 PM ANR1401W Mount request denied for volume T01604 -
mount failed.

6/29/2010 4:00:55 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 4:00:55 PM ANR8848W Drive LTO_1 of library 3584LIB is
inaccessible; server has begun polling drive.

6/29/2010 4:03:15 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 4:03:15 PM ANR8848W Drive LTO_12 of library 3584LIB is
inaccessible; server has begun polling drive.

6/29/2010 4:05:35 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 4:05:35 PM ANR8848W Drive LTO_13 of library 3584LIB is
inaccessible; server has begun polling drive.

6/29/2010 4:07:55 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 4:07:55 PM ANR8848W Drive LTO_14 of library 3584LIB is
inaccessible; server has begun polling drive.

6/29/2010 4:10:15 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 4:10:15 PM ANR8848W Drive LTO_11 of library 3584LIB is
inaccessible; server has begun polling drive.

6/29/2010 4:12:35 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 4:12:35 PM ANR8848W Drive LTO_5 of library 3584LIB is
inaccessible; server has begun polling drive.

6/29/2010 4:14:55 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 4:14:55 PM ANR8848W Drive LTO_4 of library 3584LIB is
inaccessible; server has begun polling drive.

6/29/2010 4:17:15 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 4:17:15 PM ANR8848W Drive LTO_3 of library 3584LIB is
inaccessible; server has begun polling drive.

6/29/2010 4:19:35 PM ANR8840E Unable to open device /dev/smc0 with file
handle 11.

6/29/2010 4:19:35 PM ANR8848W Drive LTO_9 of library 3584LIB is
inaccessible; server has begun polling drive.
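
A log like this is easier to take to the vendor once it is boiled down to
counts per message code and a list of the drives named.  The following is a
minimal sketch of that, assuming each wrapped entry has first been rejoined
onto one line; the sample lines are copied from the excerpt above, and the
regular expressions and the function name are illustrative only.

```python
import re
from collections import Counter

# Entries copied from the activity log excerpt, with wrapped lines rejoined.
SAMPLE_LOG = """\
6/29/2010 3:53:55 PM ANR8840E Unable to open device /dev/smc0 with file handle 11.
6/29/2010 3:53:55 PM ANR8848W Drive LTO_8 of library 3584LIB is inaccessible; server has begun polling drive.
6/29/2010 3:53:55 PM ANR8485E No drives are available to be mounted in R/W mode with format 00000008 in library 3584LIB.
6/29/2010 3:53:55 PM ANR1401W Mount request denied for volume T01618 - mount failed.
6/29/2010 3:56:15 PM ANR8840E Unable to open device /dev/smc0 with file handle 11.
6/29/2010 3:56:15 PM ANR8848W Drive LTO_6 of library 3584LIB is inaccessible; server has begun polling drive.
"""

# timestamp, message code, message text
ENTRY = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) (ANR\d{4}[A-Z]) (.*)$"
)
DRIVE = re.compile(r"[Dd]rive (LTO_\d+)")


def summarize_actlog(text):
    """Count entries per ANR message code and collect the drive names mentioned."""
    codes = Counter()
    drives = set()
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if not match:
            continue  # skip blank or unparsable lines
        codes[match.group(2)] += 1
        drive = DRIVE.search(match.group(3))
        if drive:
            drives.add(drive.group(1))
    ordered = sorted(drives, key=lambda name: int(name.split("_")[1]))
    return codes, ordered


codes, drives = summarize_actlog(SAMPLE_LOG)
for code, count in codes.most_common():
    print(code, count)
print("drives named:", ", ".join(drives))
```

Run against the full excerpt, every burst of errors starts with ANR8840E on
/dev/smc0, which is at least consistent with the problem sitting at the
library control path rather than at any one drive.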

 

 

Geoff Gill 
TSM Administrator 

SAIC M/S-B1P 

4224 Campus Pt. Ct.

San Diego, CA  92121
(858)826-4062 (office)

(858)412-9883 (blackberry)

 


