ADSM-L

Re: backup performance with db and log on a SAN

2002-09-01 16:08:19
Subject: Re: backup performance with db and log on a SAN
From: Roger Deschner <rogerd AT UIC DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Sun, 1 Sep 2002 13:31:55 -0500
What a FASCINATING data point!

I think the problem is simply that it is RAID5. The WDSF/ADSM/TSM/ITSM
Database is accessed rather randomly, both during normal operations and
during database backup. RAID5 is optimized for sequential I/O; small
random writes are especially costly, because each one turns into a
read-modify-write of both data and parity. RAID5 is great for things like
conventional email systems that use huge mailbox files and read and
rewrite them all at once, but not for this particular database. A massive
cache is worse than useless here: not only are you reading from random
disk locations, but each time you do, your RAID box also does a bunch of
wasted I/O to refill the cache (read-ahead) from someplace else. Over and
over, for every I/O operation.
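To make the read-ahead argument concrete, here is a toy model. It is a sketch under assumptions that do not come from this thread: the controller cache is treated as a plain LRU cache that prefetches a fixed number of following pages on every miss, the Database is read page by page, and random accesses are uniform. The function name and all sizes are hypothetical. It counts how many pages the controller pulls from disk for a sequential scan versus the same number of random reads.

```python
import random
from collections import OrderedDict


def backend_page_reads(accesses, cache_pages, readahead):
    """Count pages read from disk by a hypothetical LRU controller cache
    that prefetches `readahead` following pages on every miss."""
    cache = OrderedDict()  # page number -> None, least recently used first
    fetched = 0
    for page in accesses:
        if page in cache:
            cache.move_to_end(page)  # cache hit: no disk I/O
            continue
        # Cache miss: read the requested page plus the read-ahead pages.
        for p in range(page, page + readahead + 1):
            if p in cache:
                cache.move_to_end(p)  # already cached, nothing to read
                continue
            fetched += 1
            cache[p] = None
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict least recently used page
    return fetched


DB_PAGES = 100_000     # hypothetical database size, in pages
CACHE_PAGES = 10_000   # hypothetical cache holding 10% of the database
READAHEAD = 7          # hypothetical prefetch depth per miss
N = 10_000             # page reads requested by the server

rng = random.Random(42)
sequential = list(range(N))
scattered = [rng.randrange(DB_PAGES) for _ in range(N)]

seq_reads = backend_page_reads(sequential, CACHE_PAGES, READAHEAD)
rnd_reads = backend_page_reads(scattered, CACHE_PAGES, READAHEAD)
print(f"sequential: {seq_reads} pages read from disk for {N} requests")
print(f"random:     {rnd_reads} pages read from disk for {N} requests")
```

With these made-up parameters the sequential scan reads exactly one disk page per request, because every prefetched page is used, while the random pattern reads several pages for every page the server asked for, and most of them are evicted unused. Many controllers throttle prefetch when they detect random access, so treat this as the worst case.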

On our system, I once tried limited RAID on the Database, done in
software with the AIX Logical Volume Manager, and Database Backups ran
25% slower. Striping hurts, too. So I went the other way and got bunches
of small, fast, plain old JBOD disks, and that really sped things up.
(Ask your used equipment dealer about a full drawer of IBM 7133-020
9.1 GB SSA disk drives; they are cheap and ideally suited to the TSM DB.)
Quite simply, more disk arms mean a higher multiprogramming level within
the server: more I/Os can be in progress at once, so performance is
better. Seek distances will always be long with a random access pattern,
so you want more arms all seeking those long distances at the same time.
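The "more arms" argument can be put in rough numbers. This sketch assumes each random I/O costs one average seek plus half a rotation (transfer time ignored), that requests are spread evenly over the disks, and that the server keeps enough I/Os in flight to keep every arm busy. The seek time, rotational speed, and I/O count are hypothetical, not measurements from either system.

```python
def arm_iops(avg_seek_ms: float, rpm: int) -> float:
    """Random I/Os per second one disk arm can service:
    average seek time plus half a rotation of latency per request."""
    half_rotation_ms = 0.5 * 60_000 / rpm
    return 1000.0 / (avg_seek_ms + half_rotation_ms)


def seconds_for_random_ios(n_ios: int, arms: int,
                           avg_seek_ms: float, rpm: int) -> float:
    """Time to service n_ios random I/Os with every arm working in parallel."""
    return n_ios / (arms * arm_iops(avg_seek_ms, rpm))


# Hypothetical drive: 5 ms average seek, 10,000 rpm -> 125 I/Os per second.
for arms in (1, 4, 16):
    t = seconds_for_random_ios(1_000_000, arms, avg_seek_ms=5.0, rpm=10_000)
    print(f"{arms:2d} arms: {t:7.0f} s for one million random I/Os")
```

Scaling is perfectly linear only in this idealized model, but it shows the point: random-I/O throughput is bounded by the number of arms, not by total capacity, which is why many small disks beat a few big ones for the Database.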

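OTOH, the Log should do fine on RAID5, since its access pattern is much
more sequential. If you keep the Log on RAID5 (or put it back there),
consider removing TSM Mirroring of it, since the RAID5 array already
protects it against a single disk failure.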
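Why sequential writes fare better on RAID5 can be seen by counting back-end disk I/Os. This is a simplified sketch, not a description of any particular controller: it assumes a partial-stripe write is done as read-modify-write (read old data and old parity, write new data and new parity), that sequential writes can be gathered into full stripes that need no reads, and that a mirror writes every chunk twice. The function names and the 4+P stripe width are hypothetical.

```python
def raid5_write_ios(chunks: int, data_disks: int) -> int:
    """Back-end I/Os to write `chunks` contiguous chunks, starting on a
    stripe boundary, to a RAID5 array with `data_disks` data disks + 1 parity."""
    full_stripes, leftover = divmod(chunks, data_disks)
    # Full stripe: write every data chunk plus the new parity; no reads.
    ios = full_stripes * (data_disks + 1)
    if leftover:
        # Partial stripe via read-modify-write: read the old data chunks and
        # old parity, then write the new data chunks and new parity.
        ios += 2 * leftover + 2
    return ios


def mirror_write_ios(chunks: int) -> int:
    """Back-end I/Os for a two-way mirror: each chunk is written twice."""
    return 2 * chunks


DATA_DISKS = 4  # hypothetical 4+P array

# 1,000 scattered single-chunk writes (Database-like) versus one
# 1,000-chunk sequential run (Log-like).
random_writes = sum(raid5_write_ios(1, DATA_DISKS) for _ in range(1000))
sequential_run = raid5_write_ios(1000, DATA_DISKS)
print("RAID5, random:    ", random_writes)           # 4000
print("RAID5, sequential:", sequential_run)          # 1250
print("mirror, either:   ", mirror_write_ios(1000))  # 2000
```

Under these assumptions the Log-like sequential run costs fewer back-end I/Os than even a plain mirror, while the scattered writes cost twice as many; TSM Mirroring on top of RAID5 would double either RAID5 figure again.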

Can you disable the cache, or at least make it very small? That might
help.

A very good use of your 2TB black box of storage: Disk Storage Pools.
The performance characteristics of RAID5 should suit online TSM disk
Storage Pools well. You could finally hold a full day's worth of backups
online in them, which is ideal for managing migration and copy pool
operations "by the book". This might even make client backups run
faster. RAID5 would protect this data from media failure, so you don't
need to worry about having only one copy of it for a while.
Another good use: set up a Reclamation Storage Pool in it, which will
free up a tape drive and generally speed up reclamation. Tape volumes are
getting huge these days, and this kind of massive storage, optimized for
sequential operations, could be used very beneficially for that.
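A rough way to size a disk Reclamation Storage Pool: a tape volume becomes eligible for reclamation once its reclaimable space reaches the pool's reclamation threshold, so the valid data still on it is at most the rest of the volume. The sketch assumes the disk pool must hold that valid data for a chosen number of volumes at once; the function name, the 100 GB cartridge size, and the 60% threshold are example values, not figures from this thread.

```python
def reclaim_pool_gb(volume_gb: float, threshold_pct: int, volumes: int = 1) -> float:
    """Upper bound on valid data (GB) moved off `volumes` tapes that are
    reclaimed once their reclaimable space reaches `threshold_pct` percent."""
    if not 0 <= threshold_pct <= 100:
        raise ValueError("threshold_pct must be between 0 and 100")
    # At most (100 - threshold) percent of each volume still holds valid data.
    return volume_gb * (100 - threshold_pct) * volumes / 100


# Example: 100 GB cartridges, reclaimed at 60% reclaimable space,
# two volumes in progress at once.
print(reclaim_pool_gb(100, 60, volumes=2))  # 80.0
```

As cartridges grow, this bound grows with them, which is where a large RAID5 box helps.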

So, to summarize, your investment in the SAN-attached Black Box O' Disk
Space is still good, for everything you probably planned to put in it,
EXCEPT for the TSM Database. That's only 36GB in your case, so leaving
it out of the Big Box removes only about 2% of its capacity. If the other
98% works well, the people who funded it should be happy.

P.S. I'm preparing a presentation on this exact topic for SHARE in
Dallas next spring, so I really appreciate interesting data points like
this. Thank you for sharing it.

Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu


On Sat, 31 Aug 2002, Eliza Lau wrote:

>I recently moved the 36G TSM database and 10G log from attached SCSI disk
>drives to a SAN. Backing up the db now takes more than twice as long as it
>used to (from 40 minutes to 90 minutes). The old attached disk drives are
>non-RAID and TSM mirrored. The SAN drives are RAID-5 and TSM mirrored. I
>know I have to pay a penalty for writing to RAID-5, but considering the
>massive cache of the SAN it should not be too bad. In fact, performance of
>client backups hasn't suffered.
>
>However, the day after the move, I noticed that backup db ran twice as
>long. It just doesn't make sense that reading from RAID-5 disks would
>cause a 100% performance hit. Our performance guys looked at the sar data
>and didn't find any bottlenecks: no excessive iowait, paging, etc. The
>obvious solution is to move the db and log back to where they were. But
>now management says: "We purchased this very expensive 2T IBM SAN and you
>are saying that you can't use it." Meanwhile, our Oracle people happily
>report a 10% increase in the performance of their applications.
>
>Has anyone put their db and log on a SAN, and what is your experience?
>I have called it in to Tivoli support but have yet to get a callback.
>Has anyone noticed that support is now very unresponsive?
>
>Server: AIX 4.3.3, TSM 4.2.1.15
>
>Thanks,
>Eliza Lau
>Virginia Tech Computing Center
>1700 Pratt Drive
>Blacksburg, VA 24060
>lau AT vt DOT edu
>