ADSM-L

Re: backup performance with db and log on a SAN

2002-09-02 08:01:57
Subject: Re: backup performance with db and log on a SAN
From: Eliza Lau <lau AT VTCAT.CC.VT DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Mon, 2 Sep 2002 06:37:46 -0400
Roger,

Thanks for the detailed analysis.  This is what I was planning to do: move
the db back to attached SCSI drives.  Re-configuring one drawer in the Shark
to non-RAID, as another person suggested, is out of the question since TSM
is using only a small portion of the Shark.  Please read the other messages
that I posted about our SAN configuration.

Eliza

>
> What a FASCINATING data point!
>
> I think the problem is simply that it is RAID5. The WDSF/ADSM/TSM/ITSM
> Database is accessed rather randomly during both normal operations, and
> during database backup. RAID5 is optimized for sequential I/O
> operations. It's great for things like conventional email systems that
> use huge mailbox files, and read and rewrite them all at once. But not
> for this particular database. Massive cache is worse than useless,
> because not only are you reading from random disk locations, but each
> time you do, your RAID box is doing a bunch of wasted I/O to refill the
> cache from someplace else as well. Over and over for each I/O operation.
>
> On our system, I once tried limited RAID on the Database, in software
> using the AIX Logical Volume Manager, and it ran 25% slower on Database
> Backups. Striping hurts, too. So I went the other way, and got bunches
> of small, fast plain old JBOD disks, and it really sped things up. (Ask
> your used equipment dealer about a full drawer of IBM 7133-020 9.1GB SSA
> disk drives - they are cheap and ideally suited to the TSM DB.) Quite
> simply, more disk arms mean a higher multiprogramming level within the
> server, and better performance. Seek distances will always be high with
> a random access pattern, so you want more arms all seeking those long
> distances at the same time.
>
> OTOH, the Log should do fine with RAID5, since it is much more
> sequential. Consider removing TSM Mirroring of the Log when you put it
> back into RAID5.
>
> Can you disable the cache, or at least make it very small? That might
> help.
>
> A very good use of your 2TB black box of storage: Disk Storage Pools.
> The performance aspects of RAID5 should be well suited to online TSM
> Storage Pools. You could finally hold a full day's worth of backups
> online in them, which is an ideal situation as far as managing migration
> and copy pool operations "by the book". This might even make client
> backups run faster. RAID5 would protect this data from media failure, so
> you don't need to worry about having only one copy of it for a while.
> Another good use: Set up a Reclamation Storage Pool in it, which will
> free up a tape drive and generally speed reclamation. Tape volumes are
> getting huge these days, so you could use this kind of massive storage,
> optimized for sequential operations, very beneficially for this.
>
> So, to summarize, your investment in the SAN-attached Black Box O' Disk
> Space is still good, for everything you probably planned to put in it,
> EXCEPT for the TSM Database. That's only 36GB in your case, so leaving
> it out of the Big Box is removing only 2% of it. If the other 98% works
> well, the people who funded it should be happy.
>
> P.S. I'm preparing a presentation for Share in Dallas next spring on
> this exact topic; I really appreciate interesting data points like this.
> Thank you for sharing it.
>
> Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu
>
>
> On Sat, 31 Aug 2002, Eliza Lau wrote:
>
> >I recently moved the 36G TSM database and 10G log from attached SCSI disk
> >drives to a SAN. Backing up the db now takes twice as long as it used to
> >(from 40 minutes to 90 minutes).  The old attached disk drives are
> >non-RAID and TSM mirrored.  The SAN drives are RAID-5 and TSM mirrored.
> >I know I have to pay a penalty for writing to RAID-5.  But considering
> >the massive cache of the SAN it should not be too bad.  In fact,
> >performance of client backups hasn't suffered.
> >
> >However, the day after the move, I noticed that backup db ran for twice
> >as long.  It just doesn't make sense that it would take a 100% performance
> >hit from reading from RAID-5 disks.  Our performance guys looked at the sar
> >data and didn't find any bottlenecks: no excessive iowait, paging, etc.
> >The solution is to move the db and log back to where they were.  But now
> >management says: "We purchased this very expensive 2T IBM SAN and you are
> >saying that you can't use it."  Meanwhile, our Oracle people happily report
> >that the performance of their applications has increased by 10%.
> >
> >Has anyone put their db and log on a SAN and what is your experience?
> >I have called it in to Tivoli support but have yet to get a callback.
> >Has anyone noticed that support is now very non-responsive?
> >
> >server: AIX 4.3.3, TSM 4.2.1.15
> >
> >Thanks,
> >Eliza Lau
> >Virginia Tech Computing Center
> >1700 Pratt Drive
> >Blacksburg, VA 24060
> >lau AT vt DOT edu
> >
>
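
For anyone who wants to put rough numbers on the figures quoted in this
thread, here is a minimal sketch of the arithmetic. It assumes decimal
units (1 GB = 1000 MB, "2T" taken as 2000 GB) and that the full 36 GB is
read during each database backup. Neither assumption is stated in the
posts, and the function names are only illustrative.

```python
# Back-of-envelope figures from the thread.
# Assumptions (not stated in the posts): decimal units, and the whole
# 36 GB database is read during each database backup.

DB_SIZE_GB = 36       # database size reported in the original post
SAN_SIZE_GB = 2000    # "2T" SAN, taken here as 2000 GB


def backup_rate_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average throughput of a backup that moves size_gb in `minutes`."""
    return size_gb * 1000 / (minutes * 60)


def share_percent(part_gb: float, total_gb: float) -> float:
    """Portion of total_gb taken up by part_gb, as a percentage."""
    return 100 * part_gb / total_gb


scsi_rate = backup_rate_mb_per_s(DB_SIZE_GB, 40)   # attached SCSI: 40 minutes
san_rate = backup_rate_mb_per_s(DB_SIZE_GB, 90)    # SAN RAID-5: 90 minutes
slowdown = 90 / 40
db_share = share_percent(DB_SIZE_GB, SAN_SIZE_GB)

print(f"SCSI backup rate: {scsi_rate:.1f} MB/s")
print(f"SAN backup rate:  {san_rate:.1f} MB/s")
print(f"Slowdown factor:  {slowdown:.2f}x")
print(f"DB share of SAN:  {db_share:.1f}%")
```

Under those assumptions the backup averaged about 15 MB/s on the attached
SCSI disks and about 6.7 MB/s on the SAN, a slowdown of 2.25x, and the
database takes up about 1.8% of the SAN, which matches the "only 2%" figure
in the quoted reply.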