ADSM-L

Re: backup performance with db and log on a SAN

2002-09-01 18:41:57
Subject: Re: backup performance with db and log on a SAN
From: "Seay, Paul" <seay_pd AT NAPTHEON DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Sun, 1 Sep 2002 17:00:50 -0400
Roger,
The problem here is that we have no idea what type of disk subsystem they
have.  Once we find that out, we will know more.

My TSM database is on a Shark 2105-F20 (it is RAID-5 under the covers).  My
database is 85GB and takes 1.3 hours to back up to Magstar drives.  I
consider that good for something with 4K blocks and a totally random access
pattern.  We stripe the database as well; maybe not a good thing to do, but we did it
that way.  We are going to try some other things soon to see how we can
improve performance.
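To put the timings in this thread on a common footing, here is a small Python sketch that converts them to average throughput and to an equivalent 4K page rate. It assumes GB means 1024 MB and that every read is one 4K database page; neither poster states this, so treat the numbers as rough.

```python
# Assumptions: GB = 1024 MB, and every transfer is one 4K database page.


def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average MB/s needed to move size_gb gigabytes in the given minutes."""
    return size_gb * 1024 / (minutes * 60)


def pages_per_s(mb_per_s: float, page_kb: int = 4) -> float:
    """Equivalent page rate if every transfer is one page of page_kb KB."""
    return mb_per_s * 1024 / page_kb


# (database size in GB, backup time in minutes), as reported in this thread
cases = {
    "Paul, 85GB on Shark RAID-5, 1.3 h": (85, 1.3 * 60),
    "Eliza, 36GB on attached JBOD, 40 min": (36, 40),
    "Eliza, 36GB on SAN RAID-5, 90 min": (36, 90),
}

for label, (size_gb, minutes) in cases.items():
    rate = throughput_mb_per_s(size_gb, minutes)
    print(f"{label}: {rate:.1f} MB/s, about {pages_per_s(rate):.0f} 4K pages/s")
```

Under these assumptions Paul's backup averages roughly 18.6 MB/s, while Eliza's drops from about 15.4 MB/s on the attached disks to about 6.8 MB/s on the SAN.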

Paul D. Seay, Jr.
Technical Specialist
Naptheon Inc.
757-688-8180


-----Original Message-----
From: Roger Deschner [mailto:rogerd AT UIC DOT EDU]
Sent: Sunday, September 01, 2002 2:32 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: backup performance with db and log on a SAN


What a FASCINATING data point!

I think the problem is simply that it is RAID5. The WDSF/ADSM/TSM/ITSM
Database is accessed rather randomly during both normal operations, and
during database backup. RAID5 is optimized for sequential I/O operations.
It's great for things like conventional email systems that use huge mailbox
files, and read and rewrite them all at once. But not for this particular
database. Massive cache is worse than useless, because not only are you
reading from random disk locations, but each time you do, your RAID box is
doing a bunch of wasted I/O to refill the cache from someplace else as well.
Over and over for each I/O operation.
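Roger's point about cache refills can be illustrated with a toy model. The Python sketch below simulates an LRU cache that, on every miss, also stages the next few blocks (read-ahead), and it compares a sequential access pattern with a random one. The cache size, read-ahead depth, and address range are made-up values. Real controllers, the Shark included, use adaptive prefetch policies that this sketch does not model.

```python
import random
from collections import OrderedDict


def simulate(accesses, cache_blocks, read_ahead):
    """Toy LRU cache with fixed read-ahead.

    On a miss, the requested block and the next `read_ahead` blocks are
    staged from disk (blocks already cached are not re-read). Returns
    (hit_ratio, physical_blocks_read_per_request).
    """
    assert cache_blocks > read_ahead, "cache must hold a full read-ahead window"
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    physical_reads = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)
            continue
        for staged in range(block, block + read_ahead + 1):
            if staged in cache:
                cache.move_to_end(staged)
            else:
                cache[staged] = None
                physical_reads += 1
        while len(cache) > cache_blocks:
            cache.popitem(last=False)  # evict the least recently used block
    n = len(accesses)
    return hits / n, physical_reads / n


N = 80_000
sequential = list(range(N))
rng = random.Random(42)  # fixed seed so runs are repeatable
scattered = [rng.randrange(1_000_000) for _ in range(N)]

for name, pattern in (("sequential", sequential), ("random", scattered)):
    hit_ratio, cost = simulate(pattern, cache_blocks=10_000, read_ahead=7)
    print(f"{name}: hit ratio {hit_ratio:.3f}, disk blocks read per request {cost:.2f}")
```

In this model the sequential run hits the cache 7 times out of 8 and reads each block from disk exactly once. The random run rarely hits and reads close to eight blocks from disk for every block actually requested.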

On our system, I once tried limited RAID on the Database, in software using
the AIX Logical Volume Manager, and it ran 25% slower on Database Backups.
Striping hurts, too. So I went the other way and got bunches of small, fast,
plain old JBOD disks, and it really sped things up. (Ask your used equipment
dealer about a full drawer of IBM 7133-020 9.1GB SSA disk drives; they are
cheap and ideally suited to the TSM DB.) Quite simply, more disk arms mean a
higher multiprogramming level within the server, and better performance.
Seek distances will always be high with a random access pattern, so you want
more arms all seeking those long distances at the same time.
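The "more arms" argument can be put into numbers with a back-of-the-envelope model. Each arm completes one random I/O per average seek plus half a rotation, and an array only delivers the sum across all arms when the server keeps enough requests outstanding. The sketch also applies the usual RAID write-penalty rule of thumb (2 back-end I/Os per host write for mirroring, 4 for a RAID-5 small write). The 5 ms seek and 10,000 RPM figures are assumptions for illustration, not specifications of the 7133 drives.

```python
def per_arm_iops(avg_seek_ms: float, rpm: int) -> float:
    """Random I/Os per second one arm sustains: an average seek plus half a revolution each."""
    half_rotation_ms = 0.5 * 60_000 / rpm
    return 1000 / (avg_seek_ms + half_rotation_ms)


def array_iops(arms, avg_seek_ms, rpm, write_fraction=0.0, write_penalty=1):
    """Host-visible random IOPS, assuming enough outstanding requests to keep every arm busy.

    write_penalty is back-end I/Os per host write: 1 for a plain disk, 2 for a
    mirrored pair, 4 for a RAID-5 small write (read data, read parity, write both).
    """
    backend = arms * per_arm_iops(avg_seek_ms, rpm)
    return backend / ((1 - write_fraction) + write_fraction * write_penalty)


# Assumed drive characteristics, for illustration only.
SEEK_MS, RPM = 5.0, 10_000

for arms in (4, 8, 16, 32):
    reads = array_iops(arms, SEEK_MS, RPM)
    mixed = array_iops(arms, SEEK_MS, RPM, write_fraction=0.3, write_penalty=4)
    print(f"{arms:2d} arms: {reads:6.0f} IOPS read-only, {mixed:6.0f} IOPS at 30% RAID-5 writes")
```

The model scales linearly with the number of arms only because it assumes the server always has enough I/Os in flight, which is exactly the higher multiprogramming level Roger describes.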

OTOH, the Log should do fine with RAID5, since it is much more sequential.
Consider removing TSM Mirroring of the Log if you keep it on RAID5.

Can you disable the cache, or at least make it very small? That might help.

A very good use of your 2TB black box of storage: Disk Storage Pools. The
performance aspects of RAID5 should be well suited to online TSM Storage
Pools. You could finally hold a full day's worth of backups online in them,
which is an ideal situation as far as managing migration and copy pool
operations "by the book". This might even make client backups run faster.
RAID5 would protect this data from media failure, so you don't need to worry
about having only one copy of it for a while. Another good use: Set up a
Reclamation Storage Pool in it, which will free up a tape drive and
generally speed reclamation. Tape volumes are getting huge these days, so
you could use this kind of massive storage, optimized for sequential
operations, very beneficially for this.
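Sizing a disk storage pool and a reclamation pool comes down to simple arithmetic. This sketch assumes that migration starts once pool occupancy crosses the pool's high-migration threshold (the HIGHMIG storage-pool parameter). It also assumes that a tape volume becomes eligible for reclamation once its reclaimable space reaches the RECLAIM percentage. The nightly intake and tape capacity in the example are hypothetical.

```python
def disk_pool_gb(daily_backup_gb: float, highmig_pct: int = 90) -> float:
    """Pool size that absorbs one day's intake before occupancy reaches highmig_pct."""
    return daily_backup_gb / (highmig_pct / 100)


def reclaim_pool_gb(tape_capacity_gb: float, reclaim_pct: int, volumes_at_once: int = 1) -> float:
    """Upper bound on valid data copied when reclaiming volumes at the threshold.

    A volume qualifies once reclaim_pct percent of it is reclaimable, so at most
    (100 - reclaim_pct) percent of it still holds valid data that must be moved.
    """
    return volumes_at_once * tape_capacity_gb * (100 - reclaim_pct) / 100


# Hypothetical site: 300GB of client backups per night, 100GB tape volumes.
print(f"disk pool: {disk_pool_gb(300, highmig_pct=90):.0f} GB")
print(f"reclamation pool: {reclaim_pool_gb(100, reclaim_pct=60):.0f} GB per volume")
```

With these made-up figures, the disk pool needs about 333GB, and reclaiming one volume at RECLAIM=60 stages at most 40GB.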

So, to summarize, your investment in the SAN-attached Black Box O' Disk
Space is still good, for everything you probably planned to put in it,
EXCEPT for the TSM Database. That's only 36GB in your case, so leaving it
out of the Big Box is removing only 2% of it. If the other 98% works well,
the people who funded it should be happy.

P.S. I'm preparing a presentation for Share in Dallas next spring on this
exact topic; I really appreciate interesting data points like this. Thank
you for sharing it.

Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu


On Sat, 31 Aug 2002, Eliza Lau wrote:

>I recently moved the 36G TSM database and 10G log from attached SCSI
>disk drives to a SAN. Backing the db now takes twice as long as it used
>to (from 40 minutes to 90 minutes).  The old attached disk drives are
>non-RAID and TSM mirrored.  The SAN drives are RAID-5 and TSM mirrored.
>I know I have to pay a penalty for writing to RAID-5.  But considering
>the massive cache of the SAN it should not be too bad.  In fact,
>performance of client backups hasn't suffered.
>
>However, the day after the move, I noticed that backup db ran for twice
>as long.  It just doesn't make sense that it would take a 100% performance
>hit from reading from RAID-5 disks.  Our performance guys looked at the
>sar data and didn't find any bottlenecks, no excessive iowait, paging,
>etc. The solution is to move the db and log back to where they were.
>But now management says: "We purchased this very expensive 2TB IBM SAN
>and you are saying that you can't use it." Meanwhile, our Oracle people
>happily report that they are seeing the performance of their
>applications enjoy a 10% increase.
>
>Has anyone put their db and log on a SAN and what is your experience? I
>have called it in to Tivoli support but have yet to get a callback. Has
>anyone noticed that support is now very unresponsive?
>
>Server: AIX 4.3.3, TSM 4.2.1.15
>
>Thanks,
>Eliza Lau
>Virginia Tech Computing Center
>1700 Pratt Drive
>Blacksburg, VA 24060
>lau AT vt DOT edu
>