ADSM-L

Re: TSM Database Disk Layout Recommendations

From: Roger Deschner <rogerd AT UIC DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 8 Oct 2002 12:19:06 -0500
Thanks, Paul, for putting this into a bit of perspective. RAID-5 is,
indeed, not always bad, and the disk tuning parameters can be critical.
I'm going to try something like your vmtune parameters when we move to a
larger machine in a couple of weeks, which will be an RS/6000 6H1
configuration fairly close to yours.
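For anyone following along, Paul's numbers below map onto the old vmtune interface roughly like this. This is a sketch only: flag spellings vary by AIX level, the read-ahead value is counted in 4 KB pages, and the minfree/maxfree values shown are illustrative defaults, not Paul's.

```shell
# Sketch only -- verify against your AIX level before running.
# vmtune ships in bos.adt.samples; -p/-P are minperm%/maxperm% (percentages),
# -R is maxpgahead in 4KB pages (64 pages = 256KB of read-ahead).
/usr/samples/kernel/vmtune -p 10 -P 40 -R 64

# The "sufficient free pages" rule is usually quoted as
#   maxfree >= minfree + maxpgahead
# e.g. with the default minfree of 120:
/usr/samples/kernel/vmtune -f 120 -F 184

# The 256MB buffer pool is a TSM server option, set in dsmserv.opt
# (value is in KB):
#   BUFPOOLSIZE 262144
```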

I have been saying to avoid RAID-5 like the plague - for the Database
only. If you think about how the Database, Log, and Disk storage pools
work and are accessed, they are very different from one another.

THE DATABASE is accessed randomly, even during backup, and its normal
state of affairs is to be somewhat fragmented. RAID-5 will perform badly
in this environment. It's an even mix of reads and writes, scattered
randomly across the entire database. Even DBBackup, which is a heavily
read operation, is not sequential at all, due to normal fragmentation.
And, DBBackup is rarely the only thing running when it runs - the system
always has to do something else at the same time, which pulls the disk
arm away from where it was reading. Mirrored JBOD disks are the only way
to go, and as many of them as possible. You want lots of disk arms, to
improve your multiprogramming level and ultimately the throughput (as
opposed to performance!) of your entire system.
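Back-of-the-envelope arithmetic shows why the database mix hurts on RAID-5. This is an illustrative model, not measured TSM numbers: a small random RAID-5 write does read-modify-write (read old data, read old parity, write both back), while a mirrored write is just two writes.

```python
# Rough I/O-amplification model for small random requests (illustrative
# sketch, not TSM measurements). A RAID-5 small write costs 4 physical I/Os
# (read old data + read old parity + write data + write parity); a mirrored
# write costs 2; a read costs 1 on either layout.

def physical_ios(reads, writes, layout):
    """Physical disk operations needed for a given logical request mix."""
    write_cost = {"raid5": 4, "mirror": 2}[layout]
    return reads * 1 + writes * write_cost

# An even read/write mix, as on the TSM database (1000 reads, 1000 writes):
raid5 = physical_ios(1000, 1000, "raid5")    # 1000 + 4000 = 5000
mirror = physical_ios(1000, 1000, "mirror")  # 1000 + 2000 = 3000
print(raid5, mirror)                         # RAID-5 does ~1.67x the disk work
```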

Beware, when measuring database performance, that moving the database
about can improve performance simply because the move itself reduces
fragmentation. This is a false improvement, because the fragmentation
will be back over time. If you move it by unload/reload, this is
certainly the case. Even if you move it by DELETE DBVOL, you are
reducing a
different kind of fragmentation - fragmentation of whole blocks across
the disk landscape. Reduction of fragmentation can lead you to believe
FALSELY that you have achieved a better disk configuration, when all
you've really done is defragmentation. You need to wait for
refragmentation to occur naturally to see if it really helped.
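For reference, a move via DELETE DBVOL looks roughly like this from the administrative client. This is a sketch: the paths, the 2048 MB size, and the credentials are all made up, and the server drains pages off the old volume as part of the delete.

```shell
# Sketch; all names, the 2048MB size, and the credentials are hypothetical.
# Format and define a replacement volume, then delete the old one -- the
# server copies database pages off the old volume before releasing it.
dsmfmt -m -db /tsmdb/dbvol02.dsm 2048
dsmadmc -id=admin -pa=secret "define dbvolume /tsmdb/dbvol02.dsm"
dsmadmc -id=admin -pa=secret "delete dbvolume /tsmdb/dbvol01.dsm"
```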

THE LOG is mostly accessed sequentially, and is mostly write. RAID-5? Go
for it! An almost ideal RAID-5 application. Even having "pinned tail"
problems shouldn't hurt performance much.

DISK STORAGE POOLS have big, long sequential writes during client
backups, followed by big, long sequential reads during migration.
Another case where RAID-5 should work fine. Plus, in this case, RAID-5
gives you protection against drive failure at very little effort on your
part.
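The arithmetic flips for this workload. When writes arrive as long sequential runs, the array can write full stripes and compute parity from the new data alone, so there is no read-modify-write at all. A sketch of the write amplification, under that assumption:

```python
# Why long sequential writes suit RAID-5 (illustrative sketch): a full-stripe
# write derives parity from the new data in hand, so no old data or parity
# needs to be read. Overhead is one parity block per (disks - 1) data blocks.

def write_amplification(data_blocks, disks, full_stripe):
    """Physical blocks written per logical block on RAID-5."""
    if full_stripe:
        # one parity block per stripe of (disks - 1) data blocks
        parity_blocks = data_blocks // (disks - 1)
        return (data_blocks + parity_blocks) / data_blocks
    return 4.0  # small-write read-modify-write path: 4 physical I/Os per write

# A migration-sized sequential write across an 8-disk (7 data + parity) array:
print(write_amplification(7000, 8, full_stripe=True))   # ~1.14x
print(write_amplification(7000, 8, full_stripe=False))  # 4.0x
```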

The jury is out on Raw Volumes versus LVM-managed spaces. This will
probably boil down to a functionality versus performance tradeoff, and
ultimately, no clear winner. One clear issue with AIX LVM is that it
introduces another whole level where fragmentation can occur, and it
hides it better. AIX LVM is almost more like a database than a
filesystem. I'm starting to be convinced by discussion here that Raw
Volumes are at least worth a try.
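If raw volumes do get a try here, the AIX side is roughly as follows. A sketch only: the LV name, volume group, and size in partitions are invented, and raw logical-volume support should be confirmed at your server level first.

```shell
# Sketch; names and the 64-partition size are hypothetical.
# Create a raw logical volume and point TSM at its character device.
mklv -y tsmdblv -t raw datavg 64
dsmadmc -id=admin -pa=secret "define dbvolume /dev/rtsmdblv"
```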

Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu


On Mon, 7 Oct 2002, Seay, Paul wrote:

>The reason is that everyone is saying RAID-5 is bad in general, rather than
>judging a particular implementation.  The Enterprise Storage Server
>(SHARK) is RAID-5 SSA under the covers.  It flies because it has controllers
>on the front end that essentially eliminate the RAID-5 effect and can
>actually blow away RAID-1 solutions under high sequential write
>applications.  The HDS 9900 series is the same.
>
>It is all a balance.  RAID-5 works great for some things, bad for others.
>If your RAID-5 solution does any kind of parity calculation in the array it
>will perform well on sequential write.  Why?  Because they generally change
>from RAID-5 to RAID-3 behavior, which is the fastest on write.
>
>Now, considering this.  What does high sequential write?
>
>        Generally, Storage Pools and the LOG.
>
>What does high sequential read?
>
>        Storage Pools and the DB during backup.
>
>So, to generally say RAID-5 is bad is totally incorrect.  It depends on your
>hardware.  The number of simultaneous write operations you can perform, the
>speed of your disk, etc.
>
>In the case of TSM it even depends on how your environment is setup.  Would
>I use software RAID-5, heck no, the CPU overhead is astronomical and the
>read back penalty on something like Windows will just kill you because it
>is a dumb RAID-5 implementation.
>
>I hope everyone will look at what they are saying and give specific complete
>configuration information in the future.
>
>We use the ESS.  We do striping in the AIX file system, not RAID-5.
>Protection is performed in the ESS.  We had some serious performance
>problems in relation to other ESS applications because we did not implement
>our striping correctly and our AIX system needed some serious tuning.
>
>If you are running default AIX vmtune parameters, you are probably
>experiencing bad performance, not because of the RAID-5 implementation, but
>because of the stress RAID-5 puts on the filesystem buffers in the
>non-computational space, causing astronomical paging on your system.  You
>change to raw and magically the problem goes away.  Why?  Because the
>filesystem usage drops dramatically and the paging stops.
>
>
>By changing to the recommendations folks kindly suggested over the past
>weeks, my database backup time went from about 3 hours down to 1 hour for
>an 85GB database.  My storage pools have dramatically improved as well and I
>have not corrected their striping yet.  How did I get the performance:
>
>        maxperm set to 40
>        minperm set to 10
>        max page read ahead set to 256K
>        bufferpool set to 256MB (memory on the machine is 2GB)
>        sufficient free pages to support the max read ahead (there are
>        rules about this number)
>
>Our machine is a P660-6H1,
>        (4) 450MHz processors,
>        2GB memory,
>        2 Fibre Channel cards for the disk, 4 for the tape (1 Gbit)
>        640GB of ESS disk
>        14 Magstar Drives in use so far, eventually 32.
>        2 Gbit Ethernet Cards.
>
>Yes, my environment may be unique, but at least I am telling you why what I
>have works well so that a generalization is not made that has no point of
>reference.
>
>Thanks Mark, you probably saved us about 55K to prevent us from buying a
>much larger TSM server.  We will probably change to (6) 750MHz processors and
>4GB of memory, add another Gbit card, and 2 more FC Cards to a new IO frame
>for the 6H1.  Your methodical approach was exactly what we needed to
>understand the issues and what to do.  Our machine purrs like a kitten now.
>
>
>
>Paul D. Seay, Jr.
>Technical Specialist
>Naptheon Inc.
>757-688-8180
>