ADSM-L

Re: backup performance with db and log on a SAN

2002-09-01 23:04:00
Subject: Re: backup performance with db and log on a SAN
From: Eliza Lau <lau AT VTCAT.CC.VT DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Sun, 1 Sep 2002 22:57:18 -0400
Paul,

We have a Shark 2105-F20, RAID-5 with 36G 10K RPM drives.  There are 8 LSSes,
each striped across 8 disks.  Each LSS is sliced into 21G partitions.  The TSM
db is on a volume group built with two of these slices.  We have two 16-port
switches.
Tape library is a 3494 with six 3590E FC drives.  All six drives are connected
to the switches, three each.  The tsmserver host has 4 HBAs.  Two are
connected to each switch, with one zoned for tape traffic and the other
zoned for disk traffic.  The DB backup runs daily after all client backup
windows, the copypool backup, and migration of all disk pools to tape.  It is a
fairly quiet time of the day.  The dbpoolsize is set to selftune, and the
db cache hit rate is over 99%.

I would be interested to know how you can back up your 85G database in 1.3 hours.  IBM
helped set up the Shark and the switches.  I assumed that the guy knew what
he was doing.
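To put the three backup times quoted in this thread on a common scale, here is a small Python sketch that converts them to average throughput. It assumes 1 GB = 1024 MB and that the full stated database size is what gets copied during the backup, which may not hold if only part of the database is in use; treat the results as rough comparisons, not measurements.

```python
def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average throughput in MB/s, assuming 1 GB = 1024 MB and that the
    whole stated database size is copied during the backup."""
    return size_gb * 1024 / (minutes * 60)


# (label, database size in GB, backup time in minutes), as quoted in this thread
cases = [
    ("Eliza, local SCSI JBOD", 36, 40),
    ("Eliza, Shark RAID-5", 36, 90),
    ("Paul, Shark RAID-5", 85, 1.3 * 60),
]

for label, size_gb, minutes in cases:
    print(f"{label:24s} {throughput_mb_per_s(size_gb, minutes):5.1f} MB/s")
```

Under these assumptions Paul's backup averages roughly 18.6 MB/s, against about 15.4 MB/s for Eliza's old JBOD setup and about 6.8 MB/s on her Shark, which is why his figure stands out.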

Eliza Lau
Virginia Tech Computing Center


>
> Roger,
> The problem here is we have no idea what type of disk subsystem they
> have.  Once we find that out we will know.
>
> My TSM database is on a Shark 2105-F20 (it is RAID-5 under the covers).  My
> database is 85GB and takes 1.3 hours to back up to Magstar drives.  I
> consider that good for something that uses 4K blocks and is accessed totally
> at random.  We stripe the database as well, maybe not a good thing to do, but
> we did it that way.  We are going to try some other things soon to see how we
> can improve performance.
>
> Paul D. Seay, Jr.
> Technical Specialist
> Naptheon Inc.
> 757-688-8180
>
>
> -----Original Message-----
> From: Roger Deschner [mailto:rogerd AT UIC DOT EDU]
> Sent: Sunday, September 01, 2002 2:32 PM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: Re: backup performance with db and log on a SAN
>
>
> What a FASCINATING data point!
>
> I think the problem is simply that it is RAID5. The WDSF/ADSM/TSM/ITSM
> Database is accessed rather randomly during both normal operations, and
> during database backup. RAID5 is optimized for sequential I/O operations.
> It's great for things like conventional email systems that use huge mailbox
> files, and read and rewrite them all at once. But not for this particular
> database. Massive cache is worse than useless, because not only are you
> reading from random disk locations, but each time you do, your RAID box is
> doing a bunch of wasted I/O to refill the cache from someplace else as well.
> Over and over for each I/O operation.
>
> On our system, I once tried limited RAID on the Database, in software using
> the AIX Logical Volume Manager, and it ran 25% slower on Database Backups.
> Striping hurts, too. So I went the other way, and got bunches of small, fast
> plain old JBOD disks, and it really sped things up. (Ask your used equipment
> dealer about a full drawer of IBM 7133-020 9.1GB SAA disk drives - they are
> cheap and ideally suited to the TSM DB.) Quite simply, more disk arms mean a
> higher multiprogramming level within the server, and better performance.
> Seek distances will always be high with a random access pattern, so you want
> more arms all seeking those long distances at the same time.
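Roger's point about disk arms can be illustrated with a toy model, sketched here in Python. It assumes each outstanding 4 KB request lands on a uniformly random disk, so the expected number of busy arms is arms * (1 - (1 - 1/arms) ** outstanding), and that every busy arm completes a fixed number of random I/Os per second. The rate of 100 I/Os per second per arm is a hypothetical illustrative figure, not a number from this thread, and the model ignores seek distance, caching, and RAID parity entirely.

```python
def expected_busy_arms(arms: int, outstanding: int) -> float:
    """Expected number of distinct disks hit by `outstanding` requests,
    assuming each request goes to a uniformly random disk (toy model)."""
    return arms * (1 - (1 - 1 / arms) ** outstanding)


def random_read_mb_per_s(arms: int, outstanding: int,
                         iops_per_arm: float = 100.0,  # hypothetical rate
                         page_kb: float = 4.0) -> float:
    """Aggregate random-read throughput if every busy arm delivers
    `iops_per_arm` requests of `page_kb` KB each per second."""
    busy = expected_busy_arms(arms, outstanding)
    return busy * iops_per_arm * page_kb / 1024


# With 8 requests in flight, more independent arms keep more of them busy.
for arms in (1, 2, 4, 8, 16):
    print(arms, round(expected_busy_arms(arms, 8), 2),
          round(random_read_mb_per_s(arms, 8), 2))
```

The model also shows the limit of the argument: the number of busy arms can never exceed the number of requests the server keeps in flight, so extra arms only help if the server actually issues that many concurrent I/Os.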
>
> OTOH, the Log should do fine with RAID5, since it is much more sequential.
> Consider removing TSM Mirroring of the Log when you put it back into RAID5.
>
> Can you disable the cache, or at least make it very small? That might help.
>
> A very good use of your 2TB black box of storage: Disk Storage Pools. The
> performance aspects of RAID5 should be well suited to online TSM Storage
> Pools. You could finally hold a full day's worth of backups online in them,
> which is an ideal situation as far as managing migration and copy pool
> operations "by the book". This might even make client backups run faster.
> RAID5 would protect this data from media failure, so you don't need to worry
> about having only one copy of it for a while. Another good use: Set up a
> Reclamation Storage Pool in it, which will free up a tape drive and
> generally speed reclamation. Tape volumes are getting huge these days, so
> you could use this kind of massive storage, optimized for sequential
> operations, very beneficially for this.
>
> So, to summarize, your investment in the SAN-attached Black Box O' Disk
> Space is still good, for everything you probably planned to put in it,
> EXCEPT for the TSM Database. That's only 36GB in your case, so leaving it
> out of the Big Box is removing only 2% of it. If the other 98% works well,
> the people who funded it should be happy.
>
> P.S. I'm preparing a presentation for Share in Dallas next spring on this
> exact topic; I really appreciate interesting data points like this. Thank
> you for sharing it.
>
> Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu
>
>
> On Sat, 31 Aug 2002, Eliza Lau wrote:
>
> >I recently moved the 36G TSM database and 10G log from attached SCSI
> >disk drives to a SAN. Backing up the db now takes twice as long as it used
> >to (from 40 minutes to 90 minutes).  The old attached disk drives are
> >non-RAID and TSM mirrored.  The SAN drives are RAID-5 and TSM mirrored.
> >I know I have to pay a penalty for writing to RAID-5.  But considering
> >the massive cache of the SAN, it should not be too bad.  In fact,
> >performance of client backups hasn't suffered.
> >
> >However, the day after the move, I noticed that backup db ran for twice
> >as long.  It just doesn't make sense that it would take a 100% performance
> >hit from reading from RAID-5 disks.  Our performance guys looked at the
> >sar data and didn't find any bottlenecks, no excessive iowait, paging,
> >etc. The solution is to move the db and log back to where they were.
> >But now management says: "We purchased this very expensive 2T IBM SAN
> >and you are saying that you can't use it." Meanwhile, our Oracle people
> >happily report that they are seeing the performance of their
> >applications enjoy a 10% increase.
> >
> >Has anyone put their db and log on a SAN and what is your experience? I
> >have called it in to Tivoli support but have yet to get a callback. Has
> >anyone noticed that support is now very non-responsive?
> >
> >Server: AIX 4.3.3, TSM 4.2.1.15
> >
> >Thanks,
> >Eliza Lau
> >Virginia Tech Computing Center
> >1700 Pratt Drive
> >Blacksburg, VA 24060
> >lau AT vt DOT edu
> >
>