[ADSM-L] TSM V6 Big DB2 Database Disk Layout

We're in the process of installing TSM 6.2 on a new system, to replace
our current TSM 5.5 system which has a 250gb database. We're going to
use the export-import method of migration, to eliminate the possibly
long downtime of an upgrade-in-place. It was time to update the old
hardware anyway. I'm counting on the database being about 350gb on TSM6
after the migration is complete. I'm at the point of allocating space
for the DB2 database, which I want to do very carefully for such a big
TSM system. A backup/restore to change the layout would involve
unacceptable downtime.

Up until now, the conventional wisdom for the TSM6 DB2 Database went
something like, "Make 4 containers, or perhaps 8. Put each on a separate
LUN." It also says that you should allocate space now, at the start, for
how large you think the database might eventually get. That was about
all that was said, other than the obvious things like using your fastest
disks, not sharing the disks/LUNs/Unix Filesystems with anything else,
and other general disk performance motherhood and apple pie. This had me
slicing and dicing big RAID arrays in my SAN box into lots of little
LUNs. Bad idea.

Then along came this recent presentation "DB2 for TSM Administrators"
http://www.ibm.com/support/docview.wss?uid=swg27020613&myns=swgtiv&mynp=OCSSGSG7&mync=E
which has a new recommendation to use just 1 huge container, in a big
RAID5 or RAID10 array. The thing that was made clear by this
presentation is that if you're using a SAN box which divides up space in
such a way that a single physical disk might contain parts of more than
one LUN, you're hurting performance. This is common with SAN systems
such as IBM DS that let you define multiple LUNs per RAID array. With
DB2 striping across containers, you could wind up striping across parts
of the same physical drive, which would cause very bad thrashing. I was
glad that this presentation pointed this out, by simply recommending one
container per RAID array, rather than one container per LUN. This makes
sense.
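
To make the failure mode concrete, here is a rough sketch in Python of the
check I have in mind: flag any RAID array that ends up hosting more than one
container. The container paths, LUN names, and array names are made up for
illustration; they are not from the presentation or from any real
configuration.

```python
from collections import defaultdict

# Hypothetical layout: container path -> (LUN name, RAID array name).
# Two LUNs carved from the same array share the same physical disks.
layout = {
    "/tsmdb/c1": ("lun1", "arrayA"),
    "/tsmdb/c2": ("lun2", "arrayA"),
    "/tsmdb/c3": ("lun3", "arrayB"),
}

def arrays_with_multiple_containers(layout):
    """Return {array: [containers]} for arrays hosting more than one container."""
    by_array = defaultdict(list)
    for container, (_lun, array) in layout.items():
        by_array[array].append(container)
    return {a: sorted(c) for a, c in by_array.items() if len(c) > 1}

# Any array listed here would have DB2 striping across the same spindles.
print(arrays_with_multiple_containers(layout))
```

With one container per RAID array the function returns an empty dict, which
is the layout the presentation recommends.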

But I've got a big problem with all this. There's very little
flexibility here. You have to employ too much clairvoyance, and allocate
as much space initially for your database as it will ever need. Sorry,
my crystal ball fails me here. Containers should be roughly the same
size, and when you add one there may be hot spots and other forms of I/O
imbalance. Growth can be disruptive. When some Very Important Faculty
Member walks in and demands to back up 10 million files, but the database
is already full, I cannot take the downtime to back up and restore the
entire database just to rebalance I/O. I need to figure out how to grow
it incrementally and non-disruptively - like with TSM5.

So I'm thinking of going in a completely different direction. Lots of
little containers. You can have as many as 128. The "TSM V6.1 Technical
Guide" redbook says that DB2 uses striping across containers. So I'm
thinking of using pairs of 36gb 15Krpm disks to make 36gb RAID1 (simple
mirroring) arrays for small containers, and let DB2 do the striping.
I'll only use as many as I need at the time. Either the SAN or AIX LVM
can do the mirroring. Then if it grows, I can add more of these smaller
containers as needed, one at a time. When I add containers, there will
still be hot spots and an I/O imbalance, but it should be less severe
and of shorter duration, because the new container will be relatively
small. And it will eliminate the consideration for matching stripe sizes
between the RAID box and DB2. This way I can add disks as needed, when
they're actually needed, much like the TSM5 B-tree database. With 36gb
containers there's a theoretical limit of 4608gb here, but I'll need to
subdivide into separate instances for other reasons long before
approaching that size.
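
For what it's worth, the arithmetic behind those numbers can be sketched in a
few lines of Python. The 128-container limit, the 36gb container size, and
the 350gb estimate are the figures from this post; the function name is just
for illustration, and the disk count assumes one RAID1 pair per container.

```python
import math

MAX_CONTAINERS = 128    # container limit quoted above
CONTAINER_GB = 36       # one RAID1 pair of 36gb disks per container
EXPECTED_DB_GB = 350    # my estimate for the database after migration

def containers_needed(db_gb, container_gb=CONTAINER_GB):
    """Smallest number of equal-size containers that holds db_gb."""
    return math.ceil(db_gb / container_gb)

theoretical_limit_gb = MAX_CONTAINERS * CONTAINER_GB
initial_containers = containers_needed(EXPECTED_DB_GB)
initial_disks = initial_containers * 2  # two physical disks per RAID1 pair

print(theoretical_limit_gb)  # 4608
print(initial_containers)    # 10
print(initial_disks)         # 20
```

So the initial layout would be about 10 containers on 20 disks, with each
later growth step adding one more 36gb pair.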

It says that adding a container causes some kind of a reorg which can
consume resources. How bad is the performance impact of that reorg? If
the newly added container is relatively small, will that reduce the
resources needed for this reorg? Even if it's a large impact, I could
still do it on a weekend.

Is this concept of using lots of little RAID1 containers for the TSM6
DB2 database a good idea or not?

Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu
==== "Research is what I'm doing when I don't know what I'm doing." ====
========================= -- Wernher von Braun =========================