ADSM-L

Re: [ADSM-L] looking for experiences with the ibm 3584 library

2014-05-05 11:08:14
Subject: Re: [ADSM-L] looking for experiences with the ibm 3584 library
From: "Prather, Wanda" <Wanda.Prather AT ICFI DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Mon, 5 May 2014 15:06:18 +0000
>7) Atape - the mystery of who within IBM owns it!
>
>We all use Atape on our hosts as the tape library/drive driver.
>If you ever suspect (or have) a problem with it, you will get nowhere trying to
>get support from IBM.  Open a case on the 3584?  Nope, we don't support that -
>it's host software.  Open a case with AIX?
>Nope, that's not an AIX piece of software.  Open a case with TSM support?
>Nope, they have nothing to do with it.

Richard, if you have that much of a problem, I'd suggest you get your IBM rep
or business partner/vendor involved.
When you have issues that span hardware and software, they can open a "crit sit"
(critical situation) and get everybody from hardware and software on the phone
*at the same time* so that this kind of issue can get resolved.

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of 
Rhodes, Richard L.
Sent: Monday, May 05, 2014 9:09 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] looking for experiences with the ibm 3584 library

>We have been using a 3584 for about 12 years and have had no issues at
>all with it. The only time it has been "down" is for firmware upgrades,
>replacing tape drives (upgrade from LTO2 to LTO4), and when we moved to
>our new datacenter. Very stable and a great workhorse.

I generally agree with this. We love our 3584s (we have two).
They have been very good workhorses.
BUT, we have gone through some very frustrating problems with them!

1) The case of the frayed ribbon cable.

One of the libraries had the ribbon cable that connects the library proper to
the robot fray, which caused a short that took out several cards.  It took
IBM well over 30 hours to resolve.
I think we had 3 or 4 IBMers onsite trying to figure this problem out.  They
wouldn't just order a bunch of parts.
They insisted on ordering parts one at a time as they decided to replace them.
The parts were all far away, causing many, many hours of waiting.

2)  The case of the mysterious gripper failures.

The robot would get stuck with a tape suspended between the robot gripper
and the drive mouth; the tape cartridge pinned the robot.  Both libraries were
doing this.
It got so bad the library would fail several times per day.
Many grippers were exchanged; each would work well for a while, then go back to
failing.  Long story short: as cartridges were inserted into and removed from
the cartridge slots that line the walls of the library, the slots shed a powder
(a light dust) that got on everything in the library and caused the gripper
failures.
IBM had to replace all the plastic cartridge slots in both libraries.
This finally resolved the problem.

3)  The case of the slow console

Others have said this.  Certain menu options can make it go away for what
seems like forever.  One thing I do once in a while is remove old cleaning
cartridges.  If I get on autopilot and start hitting the menu items without
thinking, I will inevitably hit the one item that requests information about
all cartridges . . . and it goes away for what seems like forever getting that list.

4)  The case of the Web console weirdness

The web console is simple to use and generally great, but some functions
simply don't work well.  For example, requesting that a tape be moved to a
specific element address may or may not work.  We've never been able to figure
out why it works sometimes and not others.

Drive firmware upgrades can do flaky things.  We have 50 drives in each 3584.
When I've performed a firmware upgrade on all drives, I can count on some
number of drives failing the upgrade.  Sometimes it's all the drives in a
frame that fail.
Those drives then have to be upgraded one at a time.
(The drive firmware upgrade options in the web console are all drives at once,
or one at a time.)  Sometimes a third of the 50 drives will fail the upgrade.
(This is doing the upgrade live, where the new firmware is activated on the
next unmount.)
I talked with the IBM folks about this, and the local CE thinks it is caused
by some communications timeout in the library.
I opened a support case about this and got nowhere.
Currently we have some old node cards requiring the older firmware.
With a scheduled upgrade we are getting all Enhanced node cards.
I'm hoping that getting to the latest/greatest code resolves this.

5) The case of the useless dial-home.

Our libraries are set up to dial home when a problem comes up.
Here we just shake our heads and sigh . . .
Sometimes it will dial home on something as simple as an I/O error writing to a
tape, but sometimes it won't dial home if the robot hangs.
It's almost a joke between us and the local CEs as to what/why/when it
dials home, or not.  No one can make sense of it.

6)  The case of the mysterious failing drive in frame 1, slot 12.

One of our libraries has an ongoing problem with one particular drive, the drive
in frame 1, drive slot 12.  This particular drive will fail any time it is
powered off.  It goes into some weird unknown state that requires the drive to
be replaced.  Yes, that drive has been replaced many, many, many times over the
years.  If a firmware upgrade requires the drive to be power cycled to activate
the code, it fails and needs to be replaced.  If we get a SCSI reservation problem
that requires the drive to be power cycled, it fails and needs to be replaced.  If
the library has to be powered off/on (IBM doing some upgrade or something), the
drive fails and gets replaced.  You would think that after all this time IBM would
have figured out what is wrong - nope, they have no idea!

7) Atape - the mystery of who within IBM owns it!

We all use Atape on our hosts as the tape library/drive driver.
If you ever suspect (or have) a problem with it, you will get nowhere trying to
get support from IBM.  Open a case on the 3584?  Nope, we don't support that -
it's host software.  Open a case with AIX?
Nope, that's not an AIX piece of software.  Open a case with TSM support?
Nope, they have nothing to do with it.


Now . . . as far as the 3494 vs. the 3584 goes:

The 3584 is a SCSI library.  It is designed around the SCSI standard
for a tape library.  This isn't bad or good, it's just different from
how the 3494 works.  Probably the biggest thing to get used to is how
TSM (or any backup product) keeps an inventory of tape cartridges and the
slots (element addresses) they are in.  You never had to think about this
with the 3494, since the library itself was in charge of which slot each
tape was in.  Just spend some time reading up on SCSI libraries to get
familiar with them.
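For anyone coming from a 3494, here is a rough sketch of the bookkeeping a SCSI library pushes onto the host.  It's a hypothetical Python illustration, not TSM's actual code: the class name, the element address numbers, and the volume label are all made up.  The idea is that the application records which element address (storage slot, drive, or I/O station) holds each cartridge, and updates that record itself after each move it asks the robot to make (in SCSI terms, a MOVE MEDIUM command).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ScsiLibraryInventory:
    """Hypothetical host-side inventory for a SCSI tape library.

    With a SCSI library the backup application, not the library, remembers
    which element address (storage slot, drive, or I/O station) holds each
    cartridge.
    """

    # element address -> volume label; None means the element is empty
    elements: Dict[int, Optional[str]] = field(default_factory=dict)

    def locate(self, volume: str) -> int:
        """Return the element address the host has recorded for a volume."""
        for address, label in self.elements.items():
            if label == volume:
                return address
        raise KeyError(f"volume {volume} is not in the inventory")

    def move(self, source: int, destination: int) -> None:
        """Record a cartridge move, as the host would after a MOVE MEDIUM.

        Only the host's record changes here; a real application would first
        send the move to the robot and update the record only if it succeeded.
        """
        if source not in self.elements or destination not in self.elements:
            raise KeyError("unknown element address")
        if self.elements[source] is None:
            raise ValueError(f"source element {source} is empty")
        if self.elements[destination] is not None:
            raise ValueError(f"destination element {destination} is occupied")
        self.elements[destination] = self.elements[source]
        self.elements[source] = None


# Example only: the addresses and the label are made up for illustration.
inv = ScsiLibraryInventory({1025: "A00001L4", 1026: None, 257: None})
inv.move(inv.locate("A00001L4"), 257)  # "mount": slot 1025 -> drive 257
print(inv.locate("A00001L4"))  # 257
```

If the host's picture and the library's actual contents ever drift apart, the inventory has to be re-read from the library, which is roughly what TSM's AUDIT LIBRARY command is for.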

Rick