Re: [Veritas-bu] Destaging going slow

I think we’re crossing wires a bit here. I agree with your viewpoint if you are talking about production storage. All of my most important production data is sitting on expensive, high-speed, Fibre Channel, fully redundant, highly fault-tolerant, replicated-to-DR SAN drives. I think the maximum size we use for these RAID groups is 8 disks. I’m not sure what our rebuild times are on these arrays, but I certainly agree with your point about bigger arrays taking longer to rebuild. But I was not talking about production.

My backup DSSUs are all 15-disk RAID5 (actually, the MD1220s I put in last week are 24-disk RAID6), and my point is that they are much less critical than production. I agree with you that there is risk in making such big RAID5 arrays, but my point is that the risk is mitigated. Think of all the things that would have to go wrong to make such a failure critical.

1) Whatever redundancy and fault tolerance we have built into the production storage must fail.

2) The DR copy must be corrupt / bad / down / unreadable.

3) The backup data must be so fresh, it had not been written to tape yet.

4) The data must be so critical that a previous full + existing incrementals (on tape) are worthless.

5) The needed backup data must reside on the DSSU that fails.

6) Two disks must fail on that DSSU.

That is a lot of bad juju (or admin incompetence) that must all happen at once to make a DSSU failure critical, and around here we like to call that acceptable risk. In my case it comes down to $$$$. Sure, I could create some ridiculously fast backup performance and replicate deduped data to DR. I would be using FC or SAS disks, RAID10, and dedupe appliances all the way. It would be awesome, but also expensive and uncalled for (from the business’s perspective).
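
As a rough illustration of the six-condition argument above, the sketch below multiplies per-condition probabilities together. Every number in it is a made-up placeholder, not a measurement from any site, and the multiplication assumes the conditions are independent, which real incidents (admin error in particular) often are not.

```python
from math import prod

# Hypothetical probabilities for each of the six conditions listed above.
# These values are illustrative assumptions only.
conditions = {
    "production redundancy fails": 0.01,
    "DR copy is unusable": 0.05,
    "backup not yet written to tape": 0.10,
    "older full + incrementals are worthless": 0.10,
    "needed data lives on the failed DSSU": 0.20,
    "two disks fail on that DSSU": 0.02,
}

def joint_probability(probabilities):
    """Probability that every event occurs, assuming independence."""
    return prod(probabilities)

p_critical = joint_probability(conditions.values())
print(f"P(all six at once) ~ {p_critical:.2e}")
```

Swapping in a site's own estimates shows how quickly the combined figure shrinks as conditions are added, and also how much it grows if any two conditions are likely to fail together.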

Your requirements may vary. But, I don’t think it’s appropriate to say “Don’t ever do this”, because it works great here. Perhaps, “Don’t ever do this in production” but I hesitate even to say that. How about “carefully consider the risks, opportunities, strengths and weaknesses of any proposed storage solution before purchasing and implementing”?

-Jonathan

From: Lightner, Jeff [mailto:jlightner AT water DOT com]
Sent: Tuesday, June 29, 2010 3:54 PM
To: Martin, Jonathan; veritas-bu AT mailman.eng.auburn DOT edu
Subject: RE: [Veritas-bu] Destaging going slow

I’m sure “gotten away with” meant “hasn’t been bit by”, not “has evaded authorities”, in this context.

Monitoring systems is great, and it certainly can help prevent the situation many find themselves in: they aren’t monitoring, lose a disk without realizing it, then lose another later and go crashing to the floor.

However, many of us have run into the scenario where we ARE monitoring and know exactly when the first drive failed, and/or have a hot spare that automatically starts rebuilding the moment it fails, yet despite being that proactive have had another drive fail while the rebuild was in progress and thereby lost the entire RAID5 set. RAID5 is certainly better than JBOD because it does provide some redundancy, but in very large arrays it makes sense either to use a better RAID level or to split the disks into multiple RAID5 sets to minimize how much is lost when this happens. The more disks you have in a single RAID5 set, the more likely you are to experience such a double-disk failure at some point.
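
The point about set size can be sketched with a simple model: after the first failure, each surviving disk is treated as failing independently at a constant rate during the rebuild window. The 24-hour rebuild time and 500,000-hour MTBF below are assumed values for illustration, and the function name is hypothetical. Disks from the same batch tend to fail together, so a model like this likely understates the real risk.

```python
import math

def p_second_failure(disks, rebuild_hours, mtbf_hours):
    """Chance that at least one surviving disk fails during the rebuild.

    Assumes independent disks with a constant failure rate (exponential model).
    """
    survivors = disks - 1
    return 1.0 - math.exp(-survivors * rebuild_hours / mtbf_hours)

# Assumed values for illustration only: 24-hour rebuild, 500,000-hour MTBF.
for n in (5, 8, 15):
    print(f"{n:2d} disks: {p_second_failure(n, 24, 500_000):.5f}")
```

Under these assumptions the risk grows roughly in proportion to the number of surviving disks, and it grows again if larger drives stretch the rebuild window.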

From: veritas-bu-bounces AT mailman.eng.auburn DOT edu [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Martin, Jonathan
Sent: Tuesday, June 29, 2010 3:13 PM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] Destaging going slow

First of all, suck it Neil Conner. I’m about to disagree with Ed (again) and there is nothing you and your fish eating friends can do about it.

Gotten away with it? I’m not stealing bread from the supermarket, I’ve made a calculated decision. I run 18 MD1000s in this configuration globally and I have yet to lose an array. The added capacity and speed benefit of a 15 disk raid array (no hot spare) is plenty worth the risk of the array going down. Further, this risk is mitigated with properly configured Dell OpenManage which alerts me immediately if a disk fails so I can have it replaced.

Sure, I may eventually lose an array, but this is backup data. Its importance to most businesses is somewhere between Dev and QA, and if I were (worst-case scenario) to lose a RAID5 and 10TB of backup data, then I’d inform the appropriate application groups and move on. It’s not like most of the data isn’t probably OK (backup not needed) or on tape (array not needed) or has incrementals available (also on tape). This isn’t a production database or file server, it’s backups. IMO, Ed’s “every bit counts” attitude is completely out of step with the real world.

-Jonathan

PS: I do agree with Ed about 1TB disks, but in my case because of the poor performance not the raid implications. 15 x 500GB SATA in a Raid-5 is the backbone of my operation.
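
For a sense of the capacity side of this trade-off, here is a minimal sketch comparing usable space for a 15-bay enclosure of 500GB disks under a few layouts. The helper name and the choice of layouts are assumptions for illustration. The figures are raw arithmetic and ignore filesystem and controller overhead.

```python
def usable_gb(level, disks, disk_gb, hot_spares=0):
    """Usable capacity in GB for a single RAID set (raw arithmetic only)."""
    active = disks - hot_spares
    if level == "raid5":
        data_disks = active - 1   # one disk's worth of parity
    elif level == "raid6":
        data_disks = active - 2   # two disks' worth of parity
    elif level == "raid10":
        data_disks = active // 2  # mirrored pairs; an odd disk is left unused
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    return data_disks * disk_gb

layouts = [
    ("raid5", 0),   # the 15-disk, no-hot-spare layout discussed here
    ("raid5", 1),
    ("raid6", 0),
    ("raid10", 1),
]
for level, spares in layouts:
    print(level, f"spares={spares}", usable_gb(level, 15, 500, spares), "GB")
```

With these inputs the single RAID5 set without a hot spare yields 7000 GB, against 6500 GB for RAID5 with a spare or RAID6, and 3500 GB for RAID10.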

From: veritas-bu-bounces AT mailman.eng.auburn DOT edu [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Ed Wilts
Sent: Tuesday, June 29, 2010 10:09 AM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] Destaging going slow

From a storage perspective, I've got all disks in a Dell MD1000 enclosure configured in a single 15 disk RAID-5.

Don't ever do this. Jonathan has obviously gotten away with this (so far) but using large drives (e.g. 1TB) in a 15-member RAID-5 set is just asking to lose the array due to a double-disk failure.

I've done several recoveries for our Windows Server Team because they've configured large RAID-5 sets and had double-disk failures.

../Ed

Ed Wilts, RHCE, BCFP, BCSD, SCSP, SCSE
ewilts AT ewilts DOT org

_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu