Subject: Re: backup performance with db and log on a SAN
From: Zlatko Krastev/ACIT <acit AT ATTGLOBAL DOT NET>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 4 Sep 2002 12:55:56 +0300
The ESS is a complex piece of equipment and, like any such device, can be
configured in zillions of ways - some good, some bad and some of them
perfect. With proper configuration of the *whole* system you can get
impressive results. Check the settings of AIX, the FC HBAs, the zoning
and above all the ESS LUN layout. It would really help if you know the
internal structure of the ESS, and StorWatch Expert can also help a lot
(Daniel, StorWatch Specialist is for configuration only). What you are
getting is insufficient for the hardware you have, as others have already
pointed out - Paul is backing up his TSM DB much faster, and only a 10%
improvement for Oracle is simply not enough either.
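A quick back-of-envelope check with the numbers from Eliza's post below
(assuming most of the 36 GB DB is actually read during the backup) shows
how low the rate really is:

   36 GB in 40 min  ->  ~36864 MB / 2400 s  =  ~15 MB/s  (old SCSI disks)
   36 GB in 90 min  ->  ~36864 MB / 5400 s  =   ~7 MB/s  (ESS LUNs)

Even the old 15 MB/s is nothing to brag about for this class of hardware.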
Some hints:
- are your DB & log using two LUNs over same LSS? TSM is attempting to
parallelize access but it ends back over same RAID-5 array inside ESS.
Spread the load over more arrays. If you can, redefine 4 DBVs in different
LUNs in *different* LSSes. Fill the rest of LUNs with file pool, disk pool
or anything less used.
- what is using the other 159 GB (205 - 36 for DB - 10 for log)? If Oracle
is using the same LSS heavily your DB read speed might suffer.
- where the Oracle is running? If on the same system as TSM server (I
doubt) the system might be overloaded easily. If not - only two ports for
2 TB Shark can be the bottleneck.
- use topas or monitor tools and watch the reading speed from the disks
during TSM DB backup. Do not count the throughput twice - you will see
vpath and hdisks (18 MB/s on hdiskA + 19 MB/s on hdiskB + 37 MB/s on
vpathAB is *not* 74 but only 37 MB/s).
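Just as an illustration - a minimal sketch, not Eliza's actual layout: the
file names, LSS numbers and sizes below are made up, and you need enough
spare DB capacity online while the old volumes are emptied. On a TSM 4.2
server on AIX the respread could look roughly like this:

   # format one ~9 GB DB volume on a LUN taken from each of four LSSes
   dsmfmt -m -db /tsm/lss10/db01.dsm 9216
   dsmfmt -m -db /tsm/lss11/db02.dsm 9216
   dsmfmt -m -db /tsm/lss12/db03.dsm 9216
   dsmfmt -m -db /tsm/lss13/db04.dsm 9216

   # then from the administrative command line
   define dbvolume /tsm/lss10/db01.dsm
   define dbvolume /tsm/lss11/db02.dsm
   define dbvolume /tsm/lss12/db03.dsm
   define dbvolume /tsm/lss13/db04.dsm
   delete dbvolume /tsm/old/db01.dsm    (TSM moves the used pages off first)
   query dbvolume

The TSM mirror copies (define dbcopy) should be respread the same way, and
the log volumes deserve their own LSS as well.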

And some comments:
--> What kind of a machine are you running? <--
Daniel is completely right. Each FC HBA or Gigabit Ethernet adapter can
load the PCI bus it is connected to all by itself. If two of them are put
in slots on the same bus, that bus is overloaded and performance will
suffer. Count also the SCSI RAID adapter(s) - 3 or 4 SCSI buses at 80 MB/s
each. Check how many adapters you have and how many I/O drawers. While you
can put many adapters in a single drawer, it may come at the price of
performance. I've heard from some IBMers that adapter placement can be
planned either for "Best operation" (performance) or for "Connectivity"
(price).
While I was writing this, an additional post came in:
--> We have a P-Series 660 7026-6H1 with 12 PCI slots. <--
So you have a single drawer (actually with 14 slots, but 2 are occupied by
the IPL disks), which means a total of 500 MB/s for the whole drawer and 4
(four) PCI buses. And I counted 4 FC adapters + one SCSI RAID + an unknown
number of Gigabit and Fast Ethernet adapters + 4 unused SCSI adapters. So
your I/O drawer is almost full, and at least one PCI bus is overloaded
with two FC adapters or an FC plus the RAID adapter.
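Adding up only the adapters mentioned (a rough estimate - the real spread
across the 4 buses may of course differ):

   4 x FC HBA           ~4 x 100 MB/s = 400 MB/s
   1 x SCSI RAID        ~240-320 MB/s
   Gigabit Ethernet     ~125 MB/s
   --------------------------------------------
   potential load       well above the ~500 MB/s the drawer can move

so even with perfect placement some of the busy adapters end up sharing a
PCI bus.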

--> The tsmserver host has 4 HBAs.  Two are connected to each switch, with one
zoned for tape traffic and the other zoned for disk traffic. <--
--> ... especially if you are using the same HBA for both a SAN Data Gateway
(IBM 3583) or other similar solution and the SAN disk <--
I have seen an IBM document titled IBM LTO Tape Library Sharing, and it
says that sharing the same FC HBA or SAN Data Gateway for both disk and
tape access is not supported. But the same document also contains a clear
statement:
"A number of customers in the past six months have had tape and disk on
the same FC HBA, had problems, separated the tape and disk on separate FC
HBAs, and the problem went away. The exception to the rule is ESS / 3590 /
2109 / pSeries with FC 6227. This configuration has been tested and is
supported."
Many customers may have had problems, but not all - for example, we've
implemented a nicely working TSM setup for a customer where a MgSysSAN
node reads from the ESS and writes to a 3583 at a good 32-34 MB/s.
So either that document is wrong/obsolete/whatever, or for the
configuration Eliza has it ought to be possible to share HBAs for disk and
tape *without* any performance penalty. My personal experience so far also
shows this can be done, even with LTO. Fibre-attached 3590 drives should
work not only because I think so but also because IBM wrote it.

--> IBM helped set up the Shark and the switches.  I assumed that the guy knew
what he was doing. <--
Do not take this for granted! Unfortunately, as in any big company, there
are people at many different levels of knowledge. The same goes for the
work they do - some good, some bad and some of it perfect. If every IBMer
were perfect, their competition would no longer be on the market.
In light of the facts (not enough PCI slots for TSM server performance,
not enough ESS ports, etc.) I would not say those guys knew *exactly* what
they were doing.

Sorry for the very long post. I thought there were too many factors
influencing the throughput to leave them uncommented.


Zlatko Krastev
IT Consultant





Please respond to "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
Sent by:        "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
To:     ADSM-L AT VM.MARIST DOT EDU
cc:

Subject:        backup performance with db and log on a SAN

I recently moved the 36G TSM database and 10G log from attached SCSI disk
drives to a SAN. Backing up the db now takes twice as long as it used to
(from 40 minutes to 90 minutes). The old attached disk drives are non-RAID
and TSM mirrored. The SAN drives are RAID-5 and TSM mirrored. I know I
have to pay a penalty for writing to RAID-5, but considering the massive
cache of the SAN it should not be too bad. In fact, performance of client
backups hasn't suffered.

However, the day after the move, I noticed that backup db ran for twice
as long. It just doesn't make sense that it would take a 100% performance
hit from reading from RAID-5 disks. Our performance guys looked at the sar
data and didn't find any bottlenecks: no excessive iowait, paging, etc.
The solution is to move the db and log back to where they were. But now
management says: "We purchased this very expensive 2T IBM SAN and you are
saying that you can't use it." Meanwhile, our Oracle people happily report
that their applications are enjoying a 10% performance increase.

Has anyone put their db and log on a SAN, and what is your experience?
I have called it in to Tivoli support but have yet to get a callback.
Has anyone noticed that support is now very non-responsive?

Server: AIX 4.3.3, TSM 4.2.1.15

Thanks,
Eliza Lau
Virginia Tech Computing Center
1700 Pratt Drive
Blacksburg, VA 24060
lau AT vt DOT edu