Topas output during Expiration

Hi all,

I was wondering if anyone would be willing to share "topas" output during their TSM expire inventory cycle. I would like to see what a decent IO footprint is supposed to look like. Or, if you prefer, just tell me how many disks (or LUNs) you use for the database and what the IO stats are.

I'm trying to rearrange my database into a disk scheme that makes more sense.

Right now the total IO rate for three disks in RAID 0 plus TSM mirroring is under 90 most days, throughput is in the 200 KB/s range, and the processor sits at 50-80% wait.

......so I'd like to see what a good working database looks like. :)
 
Expiration does not create a high IO load while searching for files to actually expire and delete. When searching, it's mostly a single stream of random 4 KB reads, so no matter how many disks you have, you're only going to get as many IOs per second as your overall latency allows. E.g. 10 ms random read latency = max 100 IOs per second for a single IO stream, and you may get a bit more or less than that depending on what it's really doing.
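
A quick back-of-the-envelope check (just a throwaway shell calculation, assuming one outstanding IO at a time, which is roughly what the expiration search does):

# max IOs/sec for a single IO stream = 1000 ms / average read latency in ms
latency_ms=10
echo "scale=1; 1000 / $latency_ms" | bc     # -> 100.0 IOs/sec at 10 ms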

As long as your CPU is reasonably modern, it doesn't really come into it - as you can see on your system at 80% wait, that is time the CPU spends doing no processing at all.

Only when expiration has found a chunk of files to delete does it hit the DB disks with bigger bursts of many more reads and writes.

Your stats sound pretty normal to me. Assuming this is AIX, a high IO %wait doesn't necessarily mean you have a problem; it just means the system is waiting on IOs to complete while the CPU has nothing else to process. The number of CPUs you have also directly affects the %wait value.
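
If you want to watch it while expiration runs, plain vmstat shows the us/sy/id/wa split over time (nothing TSM-specific here, just standard AIX vmstat):

# 5-second samples, 12 of them (about a minute); the "wa" column is IO wait
vmstat 5 12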

To speed up expiration in an environment where it examines far more files than it deletes, you need to concentrate on your random read IO latency. So higher-speed disks (e.g. 15K RPM) are the best thing you can do for this, along with more TSM/AIX/storage array cache for more cache hits. If your database is pretty big, though, it's next to impossible to get a great cache hit rate during expiration.

Under AIX, running "filemon -o /tmp/filemon.out ; sleep 30 ; trcstop" will give you read/write latency statistics - run this during expiration for 30 seconds or so and you will get a wealth of information. It's a fantastic tool. You should expect to see your database filesystems 90-100% utilised (not space utilisation - it means there is an IO outstanding to the FS almost all the time). Read response times of around 5-6 ms are good, whereas >=10 ms indicates slower disks. With battery-backed write-behind caching (which I highly recommend for TSM) your write IO latencies should be about 0.1 ms. Without write-behind caching it will be more like 5-10 ms - a big difference.
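
As a copy-and-paste sequence (same command as above - /tmp/filemon.out is only an example path):

# Start the trace, let it run ~30 seconds during expiration, then stop it;
# filemon writes its report to the -o file when trcstop is issued.
filemon -o /tmp/filemon.out ; sleep 30 ; trcstop

# The "Most Active Logical Volumes" summary and the "Detailed Logical
# Volume Stats" section are the parts worth reading.
more /tmp/filemon.out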

A database reorg will also usually help because it defragments the expiring.objects table, but you have to keep doing reorgs regularly to maintain the benefit. If you don't have a problem, it's not worth it.
 
I will try "filemon"

Yes, this is in fact AIX 5.3. I gave "filemon" a quick shot this morning during the database backup. It looks very useful. I'm anxious to try it with expiration this afternoon. I still feel like I have a bit of "thrashing" going on.

I take it there should only be one (maybe two) .dsm database volumes per disk (or SAN array) for best performance? Right now all 12 are on one 3-disk array. I have a feeling that "filemon" will tell me if this is hurting me.

Thanks for your help! I'll let you know how it turns out!
 
Here's a sample

Doesn't look too good......this is half-way through expiration.


Most Active Logical Volumes
------------------------------------------------------------------------
util   #rblk    #wblk   KB/s  volume           description
------------------------------------------------------------------------
0.18  266144   518072   92.4  /dev/TsmDbv1M    /TSMDB/TsmDbv1M
0.17  196536   594672   93.2  /dev/TsmDbv6M    /TSMDB/TsmDbv6M
0.16  256848   518072   91.3  /dev/TsmDbv1     /TSMDB/TsmDbv1
0.15  212136   594672   95.1  /dev/TsmDbv6     /TSMDB/TsmDbv6
0.15  164304   499480   78.2  /dev/TsmDbv4M    /TSMDB/TsmDbv4M
0.14  224872   437208   78.0  /dev/TsmDbv5M    /TSMDB/TsmDbv5M
0.14  240696   437200   79.9  /dev/TsmDbv5     /TSMDB/TsmDbv5
0.14  177136   499472   79.7  /dev/TsmDbv4     /TSMDB/TsmDbv4
0.09   93600   330384   50.0  /dev/TsmDbv3M    /TSMDB/TsmDbv3M
0.09  103576   330384   51.1  /dev/TsmDbv3     /TSMDB/TsmDbv3
0.07   94736   207992   35.7  /dev/TsmDbv2M    /TSMDB/TsmDbv2M
0.06   82120   207992   34.2  /dev/TsmDbv2     /TSMDB/TsmDbv2
0.03       0  1387224  163.5  /dev/TSMLOGV     /TSMLOG/TsmLogV
0.03       0  1387224  163.5  /dev/TSMLOGVM    /TSMLOG/TsmLogVM
0.01   51952       16    6.1  /dev/hd6         paging
0.01       0    16824    2.0  /dev/TsmDbv8M    /TSMDB/TsmDbv8M
0.01       0    16824    2.0  /dev/TsmDbv9M    /TSMDB/TsmDbv9M
0.01       0    16824    2.0  /dev/TsmDbv9     /TSMDB/TsmDbv9
0.01       0    16824    2.0  /dev/TsmDbv11M   /TSMDB/TsmDbv11M
0.01       0    16824    2.0  /dev/TsmDbv11    /TSMDB/TsmDbv11
0.01       0    16824    2.0  /dev/TsmDbv10    /TSMDB/TsmDbv10
0.01       0    16824    2.0  /dev/TsmDbv7M    /TSMDB/TsmDbv7M
0.01       0    16824    2.0  /dev/TsmDbv7     /TSMDB/TsmDbv7
0.00       0    16824    2.0  /dev/TsmDbv10M   /TSMDB/TsmDbv10M
0.00       0    16824    2.0  /dev/TsmDbv12    /TSMDB/TsmDbv12
0.00       0    16824    2.0  /dev/TsmDbv12M   /TSMDB/TsmDbv12M
0.00       0    13064    1.5  /dev/loglv12     jfs2log
0.00       0    16824    2.0  /dev/TsmDbv8     /TSMDB/TsmDbv8
0.00     408    42496    5.1  /dev/hd2         /usr
0.00       8     3208    0.4  /dev/hd4         /
0.00       0     2768    0.3  /dev/hd8         jfslog
0.00       0     2232    0.3  /dev/loglv11     jfs2log
0.00     133     1184    0.2  /dev/hd9var      /var Frag_Sz.=512



Am I heading in the right direction by breaking up this small RAID 0 array and putting one filesystem with one .dsm database file on each disk? Twelve filesystems with 12 .dsm database files on one 3-disk RAID 0 array (AIX striping) seems like it's going nowhere.

The same setup exists on another SCSI controller and holds the TSM mirrored volumes.

They are 36 GB U320 10K internal hard drives. When expiration runs, "topas" tells me the hard disks are 20-30% busy. Seems kinda slow.
 
Like I said before, for expiration it's all about latency. While it is allegedly better to have 4-6 DB volumes and so on, I doubt changing that will help your expiration much.

Having each disk 30% busy doesn't mean anything bad in your case - you have multiple disks, and at any one point in time expiration will usually only be accessing one or two of them. Adding more disks will not increase your expiration performance significantly.

Take a 30-second sample with filemon, then scroll down to the "Detailed Logical Volume Stats" section and paste the output for the top couple of LVs. We can then interpret those for you. You're particularly interested in the read times (msec), which tell you how fast the disks are responding.
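
If the report is long, something like this pulls just the latency lines out of it (a generic awk one-liner against the report text, not a filemon option - adjust the path to wherever you wrote the report):

# Print the read/write time lines for each logical volume in the report
awk '/^VOLUME:/ {vol=$2} /read times|write times/ {print vol, $0}' /tmp/filemon.out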

Sample during my expiration on slow-ish disk:

VOLUME: /dev/dblv description: /tsm/server1/db
reads: 1583 (0 errs)
read sizes (blks): avg 9.9 min 8 max 184 sdev 8.8
read times (msec): avg 10.571 min 0.501 max 112.762 sdev 7.996
read sequences: 1538
read seq. lengths: avg 10.2 min 8 max 256 sdev 10.9
writes: 2841 (0 errs)
write sizes (blks): avg 8.1 min 8 max 24 sdev 0.9
write times (msec): avg 1.316 min 0.673 max 77.434 sdev 1.550
write sequences: 2839
write seq. lengths: avg 8.1 min 8 max 24 sdev 1.0
seeks: 4377 (98.9%)
seek dist (blks): init 89072200,
avg 26911194.0 min 8 max 176375952 sdev 24046843.6
time to next req(msec): avg 4.504 min 0.000 max 113.148 sdev 6.604
throughput: 971.3 KB/sec
utilization: 0.96
 
sorry it took so long

Here's a sample:

VOLUME: /dev/TsmDbv1M description: /TSMDB/TsmDbv1M
reads: 1243 (0 errs)
read sizes (blks): avg 8.0 min 8 max 8 sdev 0.0
read times (msec): avg 6.631 min 0.257 max 28.050 sdev 5.008
read sequences: 1243
read seq. lengths: avg 8.0 min 8 max 8 sdev 0.0
writes: 2509 (0 errs)
write sizes (blks): avg 8.1 min 8 max 16 sdev 0.7
write times (msec): avg 7.179 min 0.731 max 28.134 sdev 3.654
write sequences: 2500
write seq. lengths: avg 8.1 min 8 max 24 sdev 0.9
seeks: 3743 (99.8%)
seek dist (blks): init 8297888,
avg 1669903.2 min 8 max 16646616 sdev 2860954.9
time to next req(msec): avg 25.331 min 0.001 max 782.958 sdev 39.681
throughput: 158.8 KB/sec
utilization: 0.28

VOLUME: /dev/TsmDbv1 description: /TSMDB/TsmDbv1
reads: 1181 (0 errs)
read sizes (blks): avg 8.0 min 8 max 8 sdev 0.0
read times (msec): avg 5.164 min 0.262 max 23.604 sdev 3.628
read sequences: 1177
read seq. lengths: avg 8.0 min 8 max 16 sdev 0.5
writes: 2509 (0 errs)
write sizes (blks): avg 8.1 min 8 max 16 sdev 0.7
write times (msec): avg 6.417 min 0.705 max 30.780 sdev 3.056
write sequences: 2500
write seq. lengths: avg 8.1 min 8 max 24 sdev 0.9
seeks: 3677 (99.6%)
seek dist (blks): init 8297856,
avg 1682342.5 min 8 max 16646632 sdev 2889079.6
time to next req(msec): avg 25.755 min 0.001 max 795.417 sdev 41.680
throughput: 156.2 KB/sec
utilization: 0.23


This is about 500000 pages into expiration.
 
Looks typical, although it seems that when you took this sample, expiration had found some files to delete, as there are a lot of writes happening compared to the number of reads. Is it usually writing more than it's reading? Hmm, going back to your previous output, it looks like it is...

Can you post the actlog entry from when expiration last finished? It tells you how many files it examined and how many it expired.

Your read times of 5-6 ms are good - obviously the DB is not on SATA :)

However, your write times of about 6-7 ms show that you are not using a write-behind cache. This would improve DB performance a lot during periods when you are writing, by bringing the write time down to more like 0.1 ms. Can you use write-behind cache? (Only do this if it is protected by a battery backup.) As there seems to be more writing happening in your environment, you might also improve expiration performance by adding another couple of disks, particularly if it's a RAID 5 array.
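
If you're not sure what your adapters actually have, you can poke at the device attributes - the adapter name below is only an example (use whatever lsdev shows on your box), and the attribute names vary by adapter type, so treat this as a starting point rather than a definitive check:

# List the SCSI adapters, then dump the attributes of one and look for
# anything cache-related.
lsdev -Cc adapter | grep -i scsi
lsattr -El sisscsia0        # example device name - substitute your own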
 
Here ya go. The filemon output may be a bad sample, as migration had also started. I can get another sample if you like. Anyway, here's the actlog:

tsm: ADSM>q actlog begind=-1 search="expire inventory"

Date/Time            Message
-------------------- ----------------------------------------------------------
07/11/07 15:39:28    ANR2017I Administrator OPER issued command: EXPIRE
                      INVENTORY skipdirs=yes (SESSION: 124276)
07/11/07 15:39:28    ANR0984I Process 7606 for EXPIRE INVENTORY started in the
                      BACKGROUND at 15:39:28. (SESSION: 124276, PROCESS: 7606)
07/11/07 22:29:25    ANR0987I Process 7606 for EXPIRE INVENTORY running in the
                      BACKGROUND processed 966473 items with a completion state
                      of SUCCESS at 22:29:25. (SESSION: 124276, PROCESS: 7606)


As you can see from the start and end times, this is the reason I want to explore reconstructing the way the DB volumes sit on disk. I've attached a diagram of my "before and planned after" views. I created it because I don't think I have accurately described my layout to people.

I can add two more disks per mirrored side, as you can see from the diagram, but probably not controllers with cache. Each side of that IO drawer has its own SCSI controller.

We could put the DB volumes on the SAN, but it would be SATA with a small cache, and we (my department) have decided to keep the TSM database independent of the SAN for recovery reasons.
 