ADSM-L

Re: [ADSM-L] Raid 1 vs Raid 5

2010-08-10 01:37:23
Subject: Re: [ADSM-L] Raid 1 vs Raid 5
From: Roger Deschner <rogerd AT UIC DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 10 Aug 2010 00:36:34 -0500
RAID5 is fine. But my strategy for handling disk failures is a bit
radical. I developed it a while back when I inherited a bunch of rather
unreliable 36 GB disks, which I only dared to run as RAID1. Some of you
will think I've now completely lost my mind, but this works well.

1. Set all TSM volumes in the array with the failed disk to readonly.
2. Migrate or MOVE DATA as fast as I can to the next stgpool, or even to
the same stgpool. Which I do will depend on the time of day and where we
are in the daily cycle. Many times, disk failures happen when the
stgpools are nowhere close to full, so this may go very quickly.
3. Disassemble the array.
4. Build a new array, incorporating the spare or replacement disk.
5. Allocate new TSM volumes (I have a script for this) and place it all
back into service.
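
As a rough illustration of steps 1, 2 and 5, here is a minimal Python
sketch that only builds the administrative command strings; it does not
run them. The volume names, pool names and size are hypothetical, and the
command forms (UPDATE VOLUME, MOVE DATA, DEFINE VOLUME) should be checked
against the Administrator's Reference for your server level. This is not
the script mentioned in step 5, which is not shown in this post.

```python
def evacuation_commands(volumes, target_pool):
    """Build commands for steps 1 and 2: freeze the volumes, then drain them."""
    cmds = []
    # Step 1: stop new writes to every volume on the degraded array.
    for vol in volumes:
        cmds.append(f"UPDATE VOLUME {vol} ACCESS=READONLY")
    # Step 2: move the data off, one volume at a time.
    for vol in volumes:
        cmds.append(f"MOVE DATA {vol} STGPOOL={target_pool}")
    return cmds


def reallocation_commands(new_volumes, pool, size_mb):
    """Build commands for step 5: define fresh volumes on the rebuilt array."""
    return [
        f"DEFINE VOLUME {pool} {vol} FORMATSIZE={size_mb}"
        for vol in new_volumes
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    old = ["/tsm/array3/vol01.dsm", "/tsm/array3/vol02.dsm"]
    for line in evacuation_commands(old, "TAPEPOOL"):
        print(line)
    for line in reallocation_commands(["/tsm/array3/vol11.dsm"], "DISKPOOL", 20480):
        print(line)
```

Keeping the generation separate from execution lets the commands be
reviewed before they are pasted into an administrative session.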

I've now beaten a RAID5 resync by several hours, which narrows the
second-failure exposure, avoided its performance penalty, and the data
is much safer because it's been migrated to where it was headed to
anyway. I have found migration to be MUCH faster than a RAID rebuild -
even if the failure happens during the primary backup window. It's
faster regardless of the RAID level - 1, 5, or 10. The reason is that
over the course of a 24-hour day a disk stgpool will statistically
average less than half full. Get rid of that data quickly and you don't
have to endure a RAID resync at all.
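
The comparison above can be put into back-of-the-envelope form. The
sketch below is only arithmetic under stated assumptions: every capacity
and throughput figure is a made-up placeholder, and real rebuild and
migration rates depend on the controller, the workload and the target
pool. The point it illustrates is that the migration time scales with how
full the pool is, while the rebuild time does not.

```python
def migration_hours(usable_gb, fill_fraction, migrate_mb_s):
    """Hours to move only the data currently held in the disk stgpool."""
    data_mb = usable_gb * 1024 * fill_fraction
    return data_mb / migrate_mb_s / 3600


def rebuild_hours(disk_gb, rebuild_mb_s):
    """Hours to resync one replacement disk, regardless of how full it is."""
    return disk_gb * 1024 / rebuild_mb_s / 3600


if __name__ == "__main__":
    # All numbers below are hypothetical placeholders.
    for fill in (0.1, 0.4, 0.9):
        m = migration_hours(usable_gb=1000, fill_fraction=fill, migrate_mb_s=100)
        print(f"pool {fill:.0%} full: migrate in about {m:.1f} h")
    r = rebuild_hours(disk_gb=300, rebuild_mb_s=10)
    print(f"rebuild of one disk: about {r:.1f} h")
```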

The only downside of this procedure is that it requires my active
participation, so if I'm off camping in the mountains, RAID rebuild can
just be allowed to happen with its performance penalty.

Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu
======I have not lost my mind -- it is backed up on tape somewhere.=====


On Mon, 9 Aug 2010, Orville Lantto wrote:

>The biggest factor in using RAID 5, and to a lesser extent RAID 0, is to get 
>the OS tuning and disk system tuning right.  TSM writes 256 kB blocks for 
>storage pools.  RAID 5 will work reasonably well if the stripe size on the 
>disk system is 256 kB.  Also, make sure all OS tuning takes the large blocks 
>into account.  The OS properties of the disk, Fibre card, and possibly the 
>volume group all have to allow 256 kB blocks to pass through without 
>fragmentation.
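
A minimal sketch of the check described above, assuming you have already
collected the maximum transfer size of each layer by hand. The layer names
and sizes are hypothetical; how to query them depends on the OS and
hardware and is not covered here.

```python
TSM_BLOCK_KB = 256  # storage pool block size quoted in the post


def layers_that_fragment(max_transfer_kb):
    """Return the layers whose maximum transfer size is below 256 kB."""
    return sorted(
        name for name, kb in max_transfer_kb.items() if kb < TSM_BLOCK_KB
    )


def stripe_matches(stripe_kb):
    """True when the array stripe size equals the 256 kB block size."""
    return stripe_kb == TSM_BLOCK_KB


if __name__ == "__main__":
    # Hypothetical values gathered from OS and array configuration.
    layers = {"disk": 256, "fibre_adapter": 128, "volume_group": 1024}
    print("layers that would split a block:", layers_that_fragment(layers))
    print("stripe size matches:", stripe_matches(256))
```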
>
>Orville Lantto
>
>-----Original Message-----
>From: J. Pohlmann <jpohlmann AT SHAW DOT CA>
>To: ADSM-L AT VM.MARIST DOT EDU
>Sent: Mon, Aug 9, 2010 1:03 pm
>Subject: Re: [ADSM-L] Raid 1 vs Raid 5
>
>
>Another comment - RAID 5 gives you striping, so does RAID 0. Striping is
>what gives you disk performance so that you can "feed" multiple tape drives
>at a reasonable speed. Example: a TSM server with 4 LTO4 drives has an
>achievable tape bandwidth somewhere around 300 MB/sec - your disk needs to
>be able to deliver this bandwidth unless you want to have your tape drives
>slow down (speed match or stop/backhitch).
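
The sizing rule in the paragraph above is simple multiplication. In this
sketch the per-drive figure of 75 MB/sec is just the post's 300 MB/sec
divided by its 4 drives; the disk pool read rate is a hypothetical input
you would have to measure yourself.

```python
def required_disk_mb_s(drives, per_drive_mb_s):
    """Aggregate bandwidth the disk pool must deliver to keep all drives busy."""
    return drives * per_drive_mb_s


def can_stream(disk_mb_s, drives, per_drive_mb_s):
    """True when the disk pool can feed every tape drive at the given rate."""
    return disk_mb_s >= required_disk_mb_s(drives, per_drive_mb_s)


if __name__ == "__main__":
    need = required_disk_mb_s(drives=4, per_drive_mb_s=75)
    print(f"disk must deliver about {need} MB/sec")
    # 220 MB/sec is a made-up measured rate for illustration.
    print("drives keep streaming:", can_stream(220, 4, 75))
```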
>
>As for the impact of a drive failure - I also prefer RAID 5. Depending on
>the OS platform there is more work when you have to recover file systems.
>
>Joerg Pohlmann
>250-585-3711
>
>-----Original Message-----
>From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
>Ochs, Duane
>Sent: Monday, August 09, 2010 09:37
>To: ADSM-L AT VM.MARIST DOT EDU
>Subject: Re: [ADSM-L] Raid 1 vs Raid 5
>
>I use raid-5 for all diskpools.
>
>Although I don't agree with no raid, in some instances it is less of an
>issue than others.
>
>A few of my pools use caching for some of our more popular servers that get
>restores, as well as for our daily Exchange and db backups.
>
>Can't think of a single instance where calling a group back and saying we
>need you to resend a couple servers because a disk died on the backup
>server. I'm not saying that it is a huge issue, but from the mindset of the
>end users and upper management, the idea that we, the retention team, have
>not protected ourselves from a disk failure in order to save a TB or so of
>space would be very difficult to swallow.
>
>-----Original Message-----
>From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
>Kelly Lipp
>Sent: Monday, August 09, 2010 11:24 AM
>To: ADSM-L AT VM.MARIST DOT EDU
>Subject: Re: Raid 1 vs Raid 5
>
>I'll amplify what Skylar said: if your goal for this disk pool is short term
>storage then I probably wouldn't use any RAID protection as the data will be
>backed up to tape and then migrated to tape again.  And as Skylar said,
>worst case, the client will send it again if it somehow escapes.
>
>Conserve space: don't RAID...
>
>Kelly J. Lipp
>O: 719-531-5574 C: 719-238-5239
>kellyjlipp AT yahoo DOT com
>
>-----Original Message-----
>From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
>Skylar Thompson
>Sent: Monday, August 09, 2010 9:33 AM
>To: ADSM-L AT VM.MARIST DOT EDU
>Subject: Re: [ADSM-L] Raid 1 vs Raid 5
>
>Do you have tape in your primary storage hierarchy? If so, remember that
>even if part of your disk pool fails, you only lose access to the data that
>are on the failed volumes. You can then regenerate that data by either
>running another backup from the nodes that had backed up to that volume (if
>the backup to the copy pool hasn't happened yet) or from the copy pool. New
>backups can continue against the disk pool volumes that are still available,
>or can be cut through directly to tape if the entire pool is unavailable.
>
>On 08/09/10 08:23, Dana Holland wrote:
>> Does anyone have opinions about setting up storage pools as Raid 1 as
>> opposed to Raid 5? We have a very limited amount of disk space at the
>> moment and don't know when we'll get approval to buy more. At the time
>> we first started planning to implement TSM, we purchased what we
>> thought would be plenty of storage. But, that was 4 years ago - and
>> our usage has grown. Now, if I choose Raid 1, I barely have enough to
>> create a primary and copy storage pool for one of our servers. And
>> that isn't allowing for any growth at all. And I'm not sure how much
>> additional space incremental backups would take. I know Raid 5 would
>> give me more storage space, but I've also read that it's harder to
>> recover from if there's a disk failure (read this on a TSM site
>> somewhere). So, I'm wondering what some of you are using?
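
The space trade-off in the question can be made concrete with the
standard usable-capacity formulas: mirroring keeps half the disks, while
single-parity striping gives up one disk's worth per array. The disk
count and size below are hypothetical, and the sketch ignores hot spares
and formatting overhead.

```python
def raid1_usable_gb(disks, disk_gb):
    """Usable space when the disks are arranged as mirrored pairs."""
    if disks < 2 or disks % 2:
        raise ValueError("RAID 1 pairs need an even number of disks (>= 2)")
    return disks // 2 * disk_gb


def raid5_usable_gb(disks, disk_gb):
    """Usable space for one RAID 5 array: one disk's worth goes to parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks - 1) * disk_gb


if __name__ == "__main__":
    # Hypothetical shelf: 8 disks of 300 GB each.
    print("RAID 1:", raid1_usable_gb(8, 300), "GB usable")
    print("RAID 5:", raid5_usable_gb(8, 300), "GB usable")
```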
>
>--
>-- Skylar Thompson (skylar2 AT u.washington DOT edu)
>-- Genome Sciences Department, System Administrator
>-- Foege Building S048, (206)-685-7354
>-- University of Washington School of Medicine
>
