ADSM-L

Re: [ADSM-L] looking for experiences with the ibm 3584 library

2014-05-05 13:57:14
Subject: Re: [ADSM-L] looking for experiences with the ibm 3584 library
From: "Stackwick, Stephen" <Stephen.Stackwick AT ICFI DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Mon, 5 May 2014 17:55:08 +0000
At one customer with a 3494, we had to put a cover over the "sunroof" because 
the room's fluorescent lights were causing reader errors.

STEPHEN STACKWICK | Senior Consultant | 301.518.6352 (m) | Stephen.Stackwick AT 
icfi DOT com | icfi.com
ICF INTERNATIONAL | 7125 Thomas Edison Dr, Suite 100, Columbia, Md 21046 | 
443-718-4900  (o)

> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On
> Behalf Of Zoltan Forray
> Sent: Monday, May 05, 2014 10:45
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: Re: [ADSM-L] looking for experiences with the ibm 3584 library
> 
> WOW - Deja Vu - We had these same problems on our 3494:
> 
> 
> *1) The case of the frayed ribbon cable.*
> *2) The case of the mysterious gripper failures.*
> 
> So, even though the 3584 is blazing fast, the basic
> concept/structure/problems haven't changed radically!
> 
> We have just hit the 1-year mark with our 3584.  All TSM servers that access
> it are RH Linux, so lin_tape is the driver.
> 
> We have had a couple of service calls throughout the year, but nothing like
> with the 3494.  Some were drive problems.  Once was a known firmware
> problem.
> 
> Our 3584 is 7 frames (1-L23 + 6-D23) with 17 TS1130 E06 drives. We did not
> feel comfortable with the High-Density approach (i.e. stack tapes up to
> 4-deep horizontally and then play "3-card monte" when you need to get to the
> 4th-deep tape).  Currently 63% full with over 1,000 JA tapes in the mix. If
> we needed to reduce the number of tapes in the library, we would replace
> those with JB tapes and get a 30%+ boost in tape capacity.
> 
> 
> 
> 
> On Mon, May 5, 2014 at 9:09 AM, Rhodes, Richard L. <
> rrhodes AT firstenergycorp DOT com> wrote:
> 
> > >We have been using a 3584 for about 12 years and have had no issues
> > >at all with it. The only time it has been "down"
> > >is for firmware upgrades, replacing tape drives (upgrade from
> > >LTO2 to LTO4), and when we moved to our new datacenter. Very stable
> > >and a great workhorse.
> >
> > I generally agree with this. We love our 3584's (we have two).
> > They have been very good workhorses.
> > BUT,  we have gone through some very frustrating problems with them!
> >
> > 1) The case of the frayed ribbon cable.
> >
> > One of the libraries had the ribbon cable that connects the library
> > proper to the robot fray, which caused a short, which took out several
> > cards.  It took IBM well over 30hr to resolve.
> > I think we had 3 or 4 IBM'ers onsite trying to figure this problem
> > out.  They wouldn't just order a bunch of parts.
> > They insisted on ordering parts one at a time as they decided to
> > replace them.  The parts are all far away, causing many, many hours of
> > waiting.
> >
> > 2)  The case of the mysterious gripper failures.
> >
> > The robot would get stuck with a tape suspended between the robot
> > gripper and the drive mouth.  The tape cartridge pinned the robot.
> > Both libraries were doing this.
> > It got so bad the library would fail several times per day.
> > Many grippers were exchanged, it would work well for a while, then go
> > back to failing.  Long story short.  The cartridge slots that line the
> > walls of the library, as cartridges were inserted/removed, caused a
> > powder (a light dust) to get on everything in the library, causing
> > gripper failure.
> > IBM had to replace all the plastic slot things in both libraries.
> > This finally resolved this problem.
> >
> > 3)  The case of the slow console
> >
> > Others have said this. There are certain options where it can go away
> > for what seems like forever.  One thing I do once in a while is
> > removing old cleaning cartridges.  If I get on auto-pilot and start
> > hitting the menu items without thinking, I will inevitably hit this
> > one item that requests something about all cartridges . . . .it goes
> > away for what seems like forever getting that list.
> >
> > 4)  The case of the Web console weirdness
> >
> > The web console is simple to use and generally is great, but some
> > functions simply don't work well.  For example, requesting a tape to
> > be moved to a specific element address may or may not work.  We've
> > never been able to figure out why it works sometimes, and not others.
> >
> > Drive firmware upgrade can do flaky things.  We have 50 drives in each
> > 3584.  When I've performed a drive firmware upgrade on all drives, I
> > can count on some number of drives that fail the upgrade.  Sometimes
> > it's all the drives in a frame that fail.
> > Those drives then have to be upgraded one at a time.
> > (drive firmware upgrade options via the web console are
> > All drives at once, or, one at a time).   Sometimes
> > out of 50 drives, a third will fail the upgrade. (This is doing the
> > upgrade live where you have the firmware activated on next unmount).
> > I talked with the IBM folks about this, and the local CE thinks this
> > is caused by some communications timeout in the lib.
> > I opened a support case about this and got nowhere.
> > Currently we have some old node cards requiring the older firmware.
> > With a scheduled upgrade we are getting all Enhanced node cards.
> > I'm hoping getting to the latest/greatest code resolves this.
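
The bulk-then-retry routine described above (upgrade all drives at once, then redo the failures one at a time) can be sketched roughly as follows. This is only an illustration under stated assumptions: `upgrade_all` and `upgrade_one` are hypothetical callables standing in for whatever triggers the web console's "all drives" and "one at a time" options. They are not 3584 or lin_tape/Atape APIs, and the drive names in the usage note are made up.

```python
from typing import Callable, Iterable, List, Set


def upgrade_with_fallback(
    drives: Iterable[str],
    upgrade_all: Callable[[List[str]], Set[str]],
    upgrade_one: Callable[[str], bool],
    max_retries: int = 2,
) -> List[str]:
    """Try a bulk upgrade, then retry each failed drive individually.

    upgrade_all (hypothetical) returns the set of drives that failed.
    upgrade_one (hypothetical) returns True when a single drive succeeds.
    Returns the drives that still failed after all retries.
    """
    drive_list = list(drives)
    failed = upgrade_all(drive_list)

    still_failed: List[str] = []
    # Walk the drives in their original order so the result is predictable.
    for drive in drive_list:
        if drive not in failed:
            continue
        for _ in range(max_retries):
            if upgrade_one(drive):
                break
        else:
            # No retry succeeded; leave this drive for manual follow-up.
            still_failed.append(drive)
    return still_failed
```

Called with a list such as `["F1D1", "F1D2", "F2D1"]`, the function returns only the drives that never succeeded, which is the list you would hand to the CE.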
> >
> > 5) The case of the useless dial home.
> >
> > Our libraries are set up for dial home when a problem comes up.
> > Here we just shake our heads and sigh . . .
> > Sometimes it will dial home on something as simple as an I/O error
> > writing to a tape, but sometimes won't dial home if the robot hangs.
> > It's almost a joke between us and the local CE's as to what/why/when
> > it dials-home, or not.  No one can make sense of it.
> >
> > 6)  The case of the mysterious failing drive in frame 1 slot 12.
> >
> > One of our libraries has an ongoing problem with one particular drive,
> > the drive in frame 1 drive slot 12.  This particular drive will fail
> > any time it is powered off.  It goes into some weird unknown state
> > that requires the drive to be replaced.  Yes, that drive has been
> > replaced many, many, many times over the years.  Firmware upgrade that
> > requires the drive to be power cycled to activate the code, it fails
> > and needs replaced.  Get a scsi reservation problem that requires the
> > drive to be power cycled, it fails and needs replaced.  If the library
> > has to be powered off/on (IBM doing some upgrade or something), the
> > drive fails and gets replaced.  You would think that after all this time
> > IBM would figure out what is wrong - nope, they have no idea!
> >
> > 7) Atape - the mystery of who within IBM owns it!
> >
> > We all use atape on our hosts for the tape lib/drive driver.
> > If you ever suspect/have a problem with it, you will get nowhere in
> > trying to get support from IBM.  Open a case on the 3584?  Nope, we
> > don't support that - it's host software.  Open a case with AIX?
> > Nope, that's not an AIX piece of software.  Open a case with TSM support?
> > Nope, they have nothing to do with it.
> >
> >
> > Now . . .as far as a 3494 vs 3584 . . .
> >
> > The 3584 is a SCSI library.  It is designed around the SCSI standard
> > for a tape library.  This isn't bad or good, it's just different than
> > how the 3494 works.  Probably the biggest thing to get used to is how
> > TSM (or any backup product) keeps an inventory of tape cartridges and
> > the slots (element addresses) they are in.  You never had to think
> > about this for the 3494, since it was in charge of the slot the tapes
> > were in.
> > Just spend some time reading up on SCSI libraries to get familiar with
> > them.
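
To make the inventory point concrete, here is a minimal sketch of the bookkeeping a backup application has to do for a SCSI library: it remembers which cartridge label sits at which element address, updates that record on every move, and can compare it against a fresh scan of the library. The class name, method names, element addresses and volume labels are all made up for illustration. This is not how TSM stores its inventory internally.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ScsiLibraryInventory:
    """Application-side record of which cartridge is at which element address."""

    # element address -> volume label, or None for an empty slot
    slots: Dict[int, Optional[str]] = field(default_factory=dict)

    def locate(self, volser: str) -> Optional[int]:
        """Return the element address holding volser, or None if unknown."""
        for address, label in self.slots.items():
            if label == volser:
                return address
        return None

    def move(self, source: int, destination: int) -> None:
        """Record a cartridge move; the application must keep this in sync."""
        if self.slots.get(source) is None:
            raise ValueError(f"no cartridge recorded at element {source}")
        if self.slots.get(destination) is not None:
            raise ValueError(f"element {destination} is already occupied")
        self.slots[destination] = self.slots[source]
        self.slots[source] = None

    def audit(
        self, scanned: Dict[int, Optional[str]]
    ) -> Dict[int, Tuple[Optional[str], Optional[str]]]:
        """Return {address: (recorded, scanned)} wherever the two disagree."""
        addresses = sorted(set(self.slots) | set(scanned))
        return {
            a: (self.slots.get(a), scanned.get(a))
            for a in addresses
            if self.slots.get(a) != scanned.get(a)
        }
```

With a 3494 the library manager kept this mapping itself; with a SCSI library, a mismatch between the recorded and scanned views is something the backup application has to detect and reconcile.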
> >
> >
> >
> >
> > Rick
> >
> >
> >
> >
> 
> 
> 
> 
> --
> *Zoltan Forray*
> TSM Software & Hardware Administrator
> BigBro / Hobbit / Xymon Administrator
> Virginia Commonwealth University
> UCC/Office of Technology Services
> zforray AT vcu DOT edu - 804-828-4807