ADSM-L

Re: [ADSM-L] looking for experiences with the ibm 3584 library

2014-05-05 10:22:32
From: "Arbogast, Warren K" <warbogas AT IU DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Mon, 5 May 2014 14:20:41 +0000

Hi Richard,
We're on Linux, so we use lin_tape, the Linux counterpart of Atape.  lin_tape 
is free, but you have to buy a license for support.  The group that supports it 
might be named "Enterprise Tape Systems".   At any rate, we did buy a license 
for it, and that has been quite handy.  The current protocol to get help is to 
open a PMR for a lin_tape problem using the TSM Support site.  TSM Support 
confirms that we have a license and that it is a lin_tape issue.  Then they connect 
us to the right person in Enterprise Tape Systems.  Once we get connected, 
support is very good, and the indirect protocol doesn't take much longer.

Best wishes,
Keith 


On May 5, 2014, at 9:09 AM, Rhodes, Richard L. wrote:

>> We have been using a 3584 for about 12 years and have had 
>> no issues at all with it. The only time it has been "down" 
>> is for firmware upgrades, replacing tape drives (upgrade from 
>> LTO2 to LTO4), and when we moved to our new datacenter. Very 
>> stable and a great workhorse.
> 
> I generally agree with this. We love our 3584s (we have two).
> They have been very good workhorses.  
> BUT,  we have gone through some very frustrating problems with them!
> 
> 1) The case of the frayed ribbon cable.
> 
> In one of the libraries, the ribbon cable that connects
> the library proper to the robot frayed, which caused a short that took out
> several cards.  It took IBM well over 30 hours to resolve.
> I think we had 3 or 4 IBMers onsite trying to figure
> this problem out.  They wouldn't just order a bunch of parts.
> They insisted on ordering parts one at a time as they
> decided to replace them.  The parts are all far away, causing
> many, many hours of waiting.
> 
> 2)  The case of the mysterious gripper failures.
> 
> The robot would get stuck with a tape suspended
> between the robot gripper and the drive mouth.  The tape
> cartridge pinned the robot.  Both libraries were doing this.
> It got so bad the library would fail several times per day.
> Many grippers were exchanged; it would work well for a while,
> then go back to failing.  Long story short: the cartridge slots
> that line the walls of the library shed a powder (a light dust)
> as cartridges were inserted and removed.  The dust got on
> everything in the library, causing the gripper failures.
> IBM had to replace all the plastic slot things in both libraries.
> This finally resolved the problem.
> 
> 3)  The case of the slow console
> 
> Others have said this.  There are certain options where it can
> go away for what seems like forever.  One thing I do
> once in a while is remove old cleaning cartridges.  If I
> get on auto-pilot and start hitting the menu items
> without thinking, I will
> inevitably hit this one item that requests something about all
> cartridges . . . it goes away for what seems like forever
> getting that list.
> 
> 4)  The case of the Web console weirdness
> 
> The web console is simple to use and generally is great, but 
> some functions simply don't work well.  For example, requesting
> a tape to be moved to a specific element address may or may not
> work.  We've never been able to figure out why it works sometimes,
> and not others.  
> 
> Drive firmware upgrades can do flaky things.  We have 50 drives
> in each 3584.  When I've performed a drive firmware upgrade
> on all drives, I can count on some number of drives failing
> the upgrade.  Sometimes it's all the drives in a frame that fail.
> Those drives then have to be upgraded one at a time.
> (The drive firmware upgrade options via the web console are
> all drives at once, or one at a time.)  Sometimes,
> out of 50 drives, a third will fail the upgrade.  (This is doing
> the upgrade live, where the firmware is activated on the next unmount.)
> I talked with the IBM folks about this, and the local CE thinks
> this is caused by some communications timeout in the lib.
> I opened a support case about this and got nowhere.
> Currently we have some old node cards requiring the older firmware.
> With a scheduled upgrade we are getting all Enhanced node cards.
> I'm hoping getting to the latest/greatest code resolves this.
> 
> 5) The case of the useless dial home.
> 
> Our libraries are set up for dial home when a problem comes up.
> Here we just shake our heads and sigh . . . 
> Sometimes it will dial home on something as simple as an I/O
> error writing to a tape, but sometimes won't dial home if the robot hangs.
> It's almost a joke between us and the local CEs as to
> what/why/when it dials home, or not.  No one can make sense of it.
> 
> 6)  The case of the mysterious failing drive in frame 1 slot 12.
> 
> One of our libraries has an ongoing problem with one particular drive,
> the drive in frame 1 drive slot 12.  This particular drive will fail
> any time it is powered off.  It goes into some weird unknown
> state that requires the drive to be replaced.  Yes, that drive has
> been replaced many, many, many times over the years.  When a firmware
> upgrade requires the drive to be power cycled to activate the code,
> it fails and needs to be replaced.  When a SCSI reservation problem requires
> the drive to be power cycled, it fails and needs to be replaced.  If the
> library has to be powered off/on (IBM doing some upgrade or something),
> the drive fails and gets replaced.  You would think that after all this time
> IBM would figure out what is wrong - nope, they have no idea!
> 
> 7) Atape - the mystery of who within IBM owns it!
> 
> We all use atape on our hosts for the tape lib/drive driver.
> If you ever suspect/have a problem with it, you will get nowhere in 
> trying to get support from IBM.  Open a case on the 3584?  Nope,
> we don't support that - it's host software.  Open a case with AIX?
> Nope, that's not an AIX piece of software.  Open a case with TSM support?
> Nope, they have nothing to do with it.
> 
> 
> Now . . .as far as a 3494 vs 3584 . . . 
> 
> The 3584 is a SCSI library.  It is designed around the SCSI standard
> for a tape library.  This isn't bad or good, it's just different from
> how the 3494 works.  Probably the biggest thing to get used to is how
> TSM (or any backup product) keeps an inventory of tape cartridges and the
> slots (element addresses) they are in.  You never had to think about this
> for the 3494, since it was in charge of the slot the tapes were in.
> Just spend some time reading up on SCSI libraries to get familiar
> with them.
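
For anyone new to SCSI libraries, the inventory point above can be sketched
in a few lines of Python.  This is only a toy model under stated assumptions,
not how TSM or the 3584 is implemented: the class names (ScsiLibrary,
BackupInventory, StaleInventoryError), the volume labels, and the element
addresses are all made up for illustration.  The idea it shows is that the
library only moves media between element addresses, so the backup product
has to keep its own volume-to-address map and re-audit when that map goes
stale.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class StaleInventoryError(Exception):
    """The application's record disagrees with what the library holds."""


@dataclass
class ScsiLibrary:
    """Hypothetical toy changer: it knows element addresses, nothing else."""
    # element address -> volume label held there (None means empty)
    elements: Dict[int, Optional[str]] = field(default_factory=dict)

    def move_medium(self, source: int, destination: int) -> None:
        """Move the cartridge at `source` to `destination`."""
        for address in (source, destination):
            if address not in self.elements:
                raise KeyError(f"no such element address: {address}")
        if self.elements[source] is None:
            raise ValueError(f"source element {source} is empty")
        if self.elements[destination] is not None:
            raise ValueError(f"destination element {destination} is occupied")
        self.elements[destination] = self.elements[source]
        self.elements[source] = None


class BackupInventory:
    """Hypothetical stand-in for the backup product's own slot inventory."""

    def __init__(self) -> None:
        self.location: Dict[str, int] = {}  # volume label -> element address

    def audit(self, library: ScsiLibrary) -> None:
        """Rebuild the inventory from what the library reports."""
        self.location = {
            volume: address
            for address, volume in library.elements.items()
            if volume is not None
        }

    def mount(self, library: ScsiLibrary, volume: str, drive_address: int) -> None:
        """Mount `volume` by address; refuse if the recorded address is stale."""
        source = self.location[volume]
        if library.elements.get(source) != volume:
            raise StaleInventoryError(
                f"{volume} is not at element {source}; audit needed"
            )
        library.move_medium(source, drive_address)
        self.location[volume] = drive_address


if __name__ == "__main__":
    # Made-up addresses: three storage slots and two drives.
    lib = ScsiLibrary({1025: "A00001", 1026: "A00002", 1027: None,
                       257: None, 258: None})
    inv = BackupInventory()
    inv.audit(lib)
    inv.mount(lib, "A00001", 257)

    # A cartridge is moved behind the application's back.
    lib.move_medium(1026, 1027)
    try:
        inv.mount(lib, "A00002", 258)
    except StaleInventoryError as err:
        print("mount refused:", err)
        inv.audit(lib)                 # resynchronise, then retry
        inv.mount(lib, "A00002", 258)
    print(inv.location)
```

In TSM terms the resynchronisation step corresponds roughly to running
AUDIT LIBRARY after cartridges have been moved outside of TSM's control.
With the 3494 this bookkeeping lived in the library itself, which is why
it never came up there.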
> 
> 
> 
> 
> Rick
> 
> 
> 
> 