ADSM-L

Re: AW: Normal # of failures on tape libraries

2005-12-28 14:03:06
Subject: Re: AW: Normal # of failures on tape libraries
From: Tab Trepagnier <Tab.Trepagnier AT LAITRAM DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 28 Dec 2005 13:02:49 -0600
I've held off replying to this but J.S.'s question prompted one.

We have 12 drives spread across three libraries: an IBM 3575, 3583, and an 
HP 4/40 DLT.  At one time we had a total of 18 drives - including those in 
a second 3575 and a Compaq 2-drive DLT.

On the HP and the IBM 3583 we see drive failures approximately every two 
months.  We have had occasions where as many as three different drives 
have failed in a single week.  Only the 3575s have avoided that failure 
frequency; with them it's about one drive failure per year.
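
Figures like these are easier to compare across sites with different drive counts when they are normalized to failures per drive per year. The following is only a minimal sketch: the function name and the example numbers are hypothetical, not measurements from any of the libraries above, and it assumes failures arrive at a steady average interval across the whole pool of drives.

```python
def annual_failures_per_drive(drive_count, months_between_failures):
    """Return the average number of failures per drive per year.

    Assumes one failure occurs, on average, every
    `months_between_failures` months somewhere in a pool of
    `drive_count` drives (a deliberate simplification).
    """
    if drive_count <= 0 or months_between_failures <= 0:
        raise ValueError("both arguments must be positive")
    failures_per_year = 12.0 / months_between_failures
    return failures_per_year / drive_count


# Hypothetical example: a pool of 10 drives, one failure every 2 months.
rate = annual_failures_per_drive(10, 2)
print(f"{rate:.2f} failures per drive per year")
```

A rate computed this way says nothing about why drives fail; it only puts "one failure every N months" on a per-drive footing.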

Taking a step back to look at the big picture, my opinion is that 
commercial-quality tape drives are simply not designed for the pounding 
they get from an enterprise backup system like TSM.  I'm confident that 
users of Veritas, etc. will report similar failure rates.  When I reported 
to HP Tech Support that "their" library receives 250 GB a day, they 
exclaimed "Wow!"  Imagine if you're one of the forum participants who 
sends TBs/day to the tapes.

It seems like older tape equipment like the 3575 was designed to take 
that pounding. 

Symptom-wise, what we see is this:
- LTO:  mount/dismount failures.
- DLT: mount failures and poor performance; the first sign of a drive 
wearing out is that its throughput drops to 10% of its normal rate.
- The few 3570 failures we've had usually caused dismount failures.
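
Because the DLT throughput drop shows up before the drive fails outright, it can be checked for automatically. The following is only a minimal sketch: it assumes per-drive throughput figures are already being collected somehow, and the function name, drive names, numbers, and the 50% warning threshold are all hypothetical.

```python
def flag_slow_drives(baseline_mb_s, current_mb_s, threshold=0.5):
    """Return the sorted names of drives running below threshold * baseline.

    baseline_mb_s and current_mb_s map a drive name to a throughput
    figure in MB/s. The 0.5 default is an assumed warning level,
    not a vendor recommendation.
    """
    flagged = []
    for drive, baseline in baseline_mb_s.items():
        current = current_mb_s.get(drive)
        # Skip drives with no current sample or no usable baseline.
        if current is None or baseline <= 0:
            continue
        if current / baseline < threshold:
            flagged.append(drive)
    return sorted(flagged)


# Hypothetical readings: dlt1 is running at 10% of its normal rate.
baseline = {"dlt0": 5.0, "dlt1": 5.0}
current = {"dlt0": 4.8, "dlt1": 0.5}
print(flag_slow_drives(baseline, current))
```

Setting the threshold well above 10% is meant to give a warning while the drive is still degrading rather than after it has bottomed out.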

And then there are the tape pickers.  We are on at least our tenth in our 
4-year-old 3583, and on at least our third in our 4-1/2-year-old HP.
When a picker fails the whole library is out of service, so it's a much 
bigger problem.

Just my two cents.

Tab Trepagnier
TSM Administrator
Laitram, L.L.C.


"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 12/22/2005 
01:06:30 PM:

> I agree with the previous answers, and I second the "environmental" note:
> 
> Is your environment fine?
> Stable temperature & humidity within allowed limits, no vibrations, 
> dust-free, clean sine-wave mains power at a stable voltage?
> Ever tried another tape manufacturer?
> 
> regards
> J.S.
> 
> 
> > -----Original Message-----
> > From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On 
> > Behalf Of Dennis Melburn W IT743
> > Sent: Tuesday, December 13, 2005 17:31
> > To: ADSM-L AT VM.MARIST DOT EDU
> > Subject: Normal # of failures on tape libraries
> > 
> > Our sites use ADIC Scalar 1Ks as well as one ADIC 10K.  The 
> > Scalar 1Ks have  4 LTO1 drives in each and the 10K has 34 
> > LTO2 drives.  We experience occasional failures on these 
> > drives and have to replace them.
> > My question is: is it normal for a site that has a lot of 
> > drives to experience drive failures about every 1-1.5 months? 
> > My manager is rather annoyed that we seem to be constantly 
> > replacing drives, even though it doesn't cause any downtime 
> > for our TSM servers while they are being replaced.  If this is 
> > a normal part of having tape libraries then that is fine, but I 
> > don't have enough experience in this field to say either way, 
> > so that is why I am asking all of you.
> > 
> > 
> > Mel Dennis
> > 
> >