Here are some more details, as promised. Thanks for your patience.
Please spread the good news!
Cyndie Behrens (IBM San Jose)
======================================================================
On Monday June 23, 1997, IBM announced a new record for Oracle backup
and restore performance using ADSTAR Distributed Storage Manager
(ADSM) for AIX. Please see the press release on the ADSM web site:
http://www.storage.ibm.com/adsm. The information in the press release
and in the following Questions and Answers was accurate as of the
June 1997 benchmark completion date.
Here are some Questions and Answers to help you better understand:
o The benchmark results
o The configurations used for the benchmarks
o The factors that affected the performance results
o Why these results are record-breaking in the industry
o When more information will be available publicly to customers and
internally to IBMers
A performance whitepaper is planned by the end of July 1997 and will be
published on the ADSM web site. Additional information can be
obtained from IBM.
1) Why was this VLDB benchmark performed?
Customers are relying more and more on databases for their critical
data. Storage management solutions that address a variety of
fast backup and recovery scenarios are mandatory. IBM wanted to
demonstrate how the combination of key IBM hardware and software
solutions, along with backup and recovery solutions provided by
Oracle Corporation, solve VLDB management issues today.
2) What organizations were involved in this benchmark?
The organizations involved with this benchmark included:
IBM's RISC System/6000 Division, IBM's Storage Systems Division
(with ADSM, 7133 serial disk drives, Magstar 3590 tape drives,
and a 3494 Tape Library), IBM's Teraplex Integration Center,
and the groups at Oracle Corporation who develop and support
Oracle Parallel Server (OPS) and Oracle Enterprise Backup
Utility (EBU).
3) I understand that the benchmarks were conducted at the IBM RS/6000
Teraplex Integration Center. What is this center?
IBM is one of the first companies to provide large-scale integration
testing and verification facilities focused on data warehouse,
data mart, and data mining environments. IBM's Teraplex
Integration Centers have been designed to integrate, optimize, and
stress-test very large business intelligence systems and
applications. These centers address the market's increasing
reliance on very large databases for business critical operations.
IBM's RS/6000 and S/390 Teraplex Integration Centers are located
in Poughkeepsie, NY, and the AS/400 Teraplex Integration Center
is located in Rochester, Minnesota.
4) What were the record-breaking performance results?
We believe several aspects of the results were record-breaking:
o This was one of the largest databases used for Oracle backup and
restore measurements, ranging from 62 GB to 744 GB, depending on
the test. The results achieved, for example 736 GB backed up in
less than 1 1/2 hours and restored in less than 2 hours, were real
measurements, not extrapolated or theoretical numbers, as have been
used on occasion by our competition.
o The rates were wall clock rates, that is, total elapsed time for
the operation, not a maximum data transfer rate. The wall clock
rates were the real time it took for the operation, including
the time for mounting the tapes.
o The restore rates were very comparable to the backup rates; they
were all within 15%. And some restores were actually faster
than the backups!
5) What is the difference between extrapolated numbers and real
measurements?
If, for example, a test backed up a 100 GB database in 30 minutes,
the extrapolated rate would be 200 GB per hour. A 200 GB database
was never actually backed up in one hour; instead, an assumption
was made that is not necessarily true: that if a 100 GB database
was backed up in 30 minutes, you can simply multiply by two to get
an hourly backup rate.
Extrapolated results, in theory, can only provide the best case
estimation because they assume linear results can be achieved
as the size of the environment grows, but they ignore the reality
of the resource costs that accompany managing a larger environment.
Only by running the full length test can you determine the true
results. That's why IBM did it! These IBM published benchmarks
are real measurements. We actually took a 736 GB database and
measured how long it took to back it up and how long it took to
restore it. We know that the interactions between the various
hardware and software components all worked at a very fast rate,
even with such a large database.
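The arithmetic behind extrapolation can be sketched in a few lines of Python (illustrative only; the function name is our own, not part of any benchmark tooling):

```python
def extrapolated_rate_gb_per_hour(size_gb, minutes):
    # Naive linear extrapolation: scale a short run up to one hour,
    # assuming throughput stays constant as the workload grows.
    return size_gb * (60.0 / minutes)

# A 100 GB backup that took 30 minutes extrapolates to 200 GB/hour,
# even though a 200 GB backup was never actually run.
print(extrapolated_rate_gb_per_hour(100, 30))  # prints 200.0
```

The extrapolation assumes the rate holds as the database grows, which the resource costs of a larger environment may not allow.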
6) What is the difference between total elapsed time (wall clock time)
and maximum data transfer rates?
The results IBM published show the actual duration of the
backup and restore. If we started the backup at 10:00 am and it
finished at 11:30 am, then the wall clock rate is 1 1/2 hours.
This wall clock rate includes all activities required for the
backup or restore operation to complete successfully, including
all processing time and tape mount time.
Maximum data transfer rates, and similar terms such as "burst"
or "peak" rates, describe a rate achieved at one snapshot in
time. They are not a measure of how long an operation takes to
complete from start to finish.
Think of a marathon runner who runs over 26 miles. He/she may run
some miles in four minutes (peak rate) but what counts is how long
it takes from start to finish (total elapsed time). You would not
let the runner run 13 miles and then just multiply by two to get
his/her score!
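The wall clock calculation is simple elapsed time. A minimal Python sketch, using the illustrative 10:00 am to 11:30 am example above (the date is arbitrary):

```python
from datetime import datetime

def wall_clock_hours(start, finish):
    # Total elapsed time from start to finish, which in a real run
    # includes all processing time and tape mount time.
    return (finish - start).total_seconds() / 3600.0

start = datetime(1997, 6, 23, 10, 0)    # backup started at 10:00 am
finish = datetime(1997, 6, 23, 11, 30)  # backup finished at 11:30 am
print(wall_clock_hours(start, finish))  # prints 1.5
```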
7) What levels of software were used in these benchmarks?
These benchmarks were run with:
o AIX 4.1.5
o Parallel System Support Programs (PSSP) 2.2
o Oracle Parallel Server (OPS) 7.3.2.3
o Oracle Enterprise Backup Utility/Parallel Version (EBU/PV)
Version 2.0.12.4.1
o ADSM V2 AIX client 2.1.6
o ADSM V2 AIX server 2.1.5.12 and 2.1.5.13
o ADSMConnect Agent for Oracle on AIX
8) What hardware was used in these benchmarks?
The hardware used in these benchmarks included:
o An IBM RS/6000 Scalable POWERparallel System (RS/6000 SP)
o IBM 7133 Serial disk drives
o IBM Magstar 3590 tape drives
o A 3494 Tape Library
9) What communication protocols were used in these benchmarks?
Tests were completed with both shared memory and TCP/IP. All
communications occurred over the SP switch. In addition, the
EBU client performed all of its data read and write operations
using virtual shared disk (VSD) read/write protocols over the
SP switch.
10) What configuration was used in the benchmarks?
OPS backup and restore was tested in a variety of configurations
which included up to 16 nodes of an SP, and up to 16 3590 tape
drives. Eight of the 3590s were housed in a 3494 Tape Library.
The other eight were stand-alone 3590s with Automated Cartridge
Facilities (ACFs). All tape handling was automated. All
evaluations were conducted within the framework of the SP. A
variety of ADSM server and client node configurations were evaluated.
11) How many tape drives were used in total and per SP node?
Up to 16 tape drives were used, with one to four tape drives
per ADSM server node.
12) What processors were used in the SP?
Different SP node configurations were used for each evaluation
including:
o Eight 67 MHz Power2 thin nodes
o Sixteen 120 MHz P2SC thin nodes
o Sixteen 8-way 112 MHz PowerPC 604 high nodes
13) Did you use any special software setup or tuning parameters?
We used the latest ADSM tuning parameters, including large buffers,
the SP switch and its settings, a specific physical database
layout on which EBU read and wrote data, different file sizes, and
tape compression, all of which helped drive the 3590 tape drives
at a very high rate.
14) These measurements were for OPS. My customer doesn't have OPS, but
instead has non-OPS Oracle7 databases which are not running on
an SP. What results can I expect?
An accurate estimate of potential performance would be that
which is achievable by the ADSM client running in the same
environment processing like-sized files. Keep in mind, however,
that with EBU you can run multiple backup and restore streams,
each of which would be an ADSM client session.
15) You mention that some of the measurements used up to 16 ADSM
servers but I've seen results published by other vendors that
indicate only one server was used in their environments.
None of the published reports we have seen to date
mention the number of servers used. In any case, EBU/PV has
special support to manage multiple ADSM servers transparently.
EBU/PV can send data to multiple ADSM servers simultaneously,
and restore the data from the appropriate server.
16) When will the EBU/PV function be available for environments
other than OPS on the SP2?
EBU/PV is available today for OPS in the SP2 environment.
The EBU/PV function is planned to be incorporated into the base
EBU code with EBU 2.2, which Oracle targets for an August
availability. This would make EBU/PV functions, such as the
transparent management of multiple ADSM servers, available in
non-OPS environments as well as in OPS environments with other
hardware configurations (for example, clusters of RISC System/6000s).
17) I understand that EBU not only provides multiple parallel data
streams for backup and restore, but also multiplexes data from
multiple disks to each of the data streams, if you choose to
configure it to do so. How much of a factor was multiplexing
in your ability to drive the 3590s at such a fast average data
transfer rate of 9 MB/second?
In theory, the greatest benefit from multiplexing is when
slow multiple client disks can be read from simultaneously, and
then combined into a single data stream which can then get written
sequentially to a fast tape device on the ADSM server. That is,
multiplexing is supposed to aid in "speeding up" the slower device.
While we did get some benefit from multiplexing, the effective disk
read request rates did not scale linearly as more disks were
accessed in parallel; because the disks were not a performance
bottleneck in our environment, this was not a limiting factor. In
fact, throughput actually suffered when an inefficient multiplexing
strategy was used.
The biggest factors in achieving the high data transfer rates to the
3590s were using:
o ADSM's large buffer support
o The SP switch and setting its parameters appropriately
o A physical database layout on which EBU read/wrote the data
o Appropriate file sizes
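As a back-of-envelope check (our own arithmetic, assuming all 16 drives were active for the full run and 1 GB = 1024 MB), the quoted figures are consistent with roughly 9 MB/second sustained per drive:

```python
def avg_mb_per_sec_per_drive(total_gb, hours, drives):
    # Average sustained rate per drive over the whole wall-clock run,
    # assuming the work is spread evenly across all drives.
    return (total_gb * 1024.0) / (hours * 3600.0) / drives

# 736 GB in about 1.5 hours across 16 drives is roughly 8.7 MB/s
# sustained per 3590 drive, close to the quoted 9 MB/second average.
print(round(avg_mb_per_sec_per_drive(736, 1.5, 16), 1))  # prints 8.7
```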
18) Did you use ADSM compression?
ADSM compression is done on the ADSM client and is of most value
when you have a slower network and you want to reduce the amount
of data you send across the network. We did not use ADSM
compression because we had a fast network. We did use the 3590
tape compression; tape compression is done after the data is
sent through the network.
19) What was the average CPU utilization for the benchmarks?
The CPU utilization varied depending on the configurations of the
tests, but was as low as 20%.
20) What happened when you added more tape drives to the ADSM server
nodes?
The solution's excellent scaling characteristics allowed
flexibility in meeting customer needs. We measured linear
scalability when a second drive was added, and achieved additional
throughput by adding more tape drives. With three or four tape
drives we continued to see valuable throughput gains.
21) What if I am running Oracle on a different platform, such as
Sun or HP?
The architecture of the hardware and operating system is a key
factor in your performance results. We are looking into making
measurements on other platforms.
22) Backup rates are important, but what I really care about are restore
rates. How did your restore rates compare to your backup rates?
Our restore rates were highly comparable to our backup rates;
in fact the restore rates were consistently within 15% of our
backup rates. Some restores were even faster than the backups.
We are unaware of any competitive results that are even close to
this level of performance!
23) Did you need to use 3590s to achieve these performance results?
A key performance factor is the type of tape drives you use,
especially when you back up directly to tape. The 3590s
were key to the results we achieved. Using 3590 tape compression
also improved performance, and fewer tapes were needed to store
the backup data. It is the balance of all products working
together that made our results possible.
24) What can we expect from other VLDB environments, such as the new
IBM DB2 Universal Database Server (UDB) or Oracle in an SAP R/3
environment?
We are currently performing benchmarks with DB2 UDB and
BACKINT/ADSM and plan to publish the results when they are
complete.
25) What should we expect from Oracle 8 in terms of backup and recovery
performance?
Oracle provides a new and enhanced backup and recovery facility,
Recovery Manager (RMAN) for Oracle 8 databases. EBU will
continue to be the facility to use for Oracle 7 databases. The
interface from RMAN to ADSM is expected to be identical to the
interface from EBU to ADSM. There are some RMAN enhancements that
may improve throughput, such as its new true incremental
support. With true incremental support, both backup and recovery
may show performance improvements.
26) Were the test databases fully populated?
No, our databases were about 80% full with representative data
because we wanted to test a typical Oracle customer environment.
27) Any other advice on how to configure a real customer environment?
Make sure you are using the most recent levels of the software and
device drivers.
28) Were these benchmarks made using ADSM V2 or V3?
Do you expect differences with V3?
These benchmarks were made using ADSM V2. General V3 performance
testing is in progress. Specific V3 testing with Oracle is
under consideration.
29) Did using PTF12 or PTF13 for the ADSM V2 server make
a difference?
In our environment, we saw no performance difference between
PTF12 and PTF13, but this does not mean that this will be the
case for all environments.
30) Were all the measurements done straight to tape, or were the
7133 serial disk drives used as an ADSM disk pool first and
then later migrated to tape?
All measurements were straight to tape.
31) What can you say about the linear scalability when adding
additional ADSM server nodes?
Performance scaled nearly linearly when we added additional
ADSM server nodes. In general, adding ADSM server nodes
improves performance and distributes the CPU load across
multiple servers.