ADSM-L

Re: AIX TSM performance improvement (was Re: OS390 TSM Performance questions.)

2003-02-17 08:24:03
Subject: Re: AIX TSM performance improvement (was Re: OS390 TSM Performance questions.)
From: PAC Brion Arnaud <Arnaud.Brion AT PANALPINA DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Mon, 17 Feb 2003 14:23:11 +0100
Zlatko,

You said :
>> P.S. I am charging my customers for such advice but hopefully I can
>> get a beer (or Swiss chocolate) for this one :-)

I promise to send you that chocolate (which one do you prefer: dark,
white, milk, or bitter? Just ask, we have lots of varieties) as
soon as my system is fine again!
But for that you will need to give me a snail-mail address to ship
it to ...

I sincerely appreciate your help, and would be really glad if someone
like you could be our TSM consultant! Unfortunately none of the ones I
met here in Switzerland had half of your skills, and nobody warned me
about my messy, faulty (call it what you will) disk configuration!

Best regards.

Arnaud

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
| Arnaud Brion, Panalpina Management Ltd., IT Group     |
| Viaduktstrasse 42, P.O. Box, 4002 Basel - Switzerland |
| Phone: +41 61 226 19 78 / Fax: +41 61 226 17 01       | 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=



-----Original Message-----
From: Zlatko Krastev/ACIT [mailto:acit AT ATTGLOBAL DOT NET] 
Sent: Monday, 17 February, 2003 2:52
To: ADSM-L AT VM.MARIST DOT EDU
Subject: AIX TSM performance improvement (was Re: OS390 TSM Performance
questions.)


Arnaud,

--> 6h1 machine, 2Gb memory ...
--> ... I increased bufpoolsize from 151552 to 524288 (where my
--> performance problems began)!

512k pages of 4 kB each equal 2 GB, which is all of your RAM. So you
left no space for AIX, file buffers, TSM code, etc. Consequences:
excessive paging and performance degradation.

--> Vmtune settings : -p10 -P40

This means file buffers will occupy 10-40% of real memory. Count it as
50% once the AIX kernel, TCP/IP buffers and TSM code are included. The
result: TSM data structures (mainly the DB bufferpool and log bufferpool)
should not exceed 1 GB. LOWER the bufpoolsize to 262144 !!!
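
The arithmetic behind this advice can be checked with a short sketch. It follows the reading used in this message, that bufpoolsize counts 4 kB pages (an assumption worth confirming against the server documentation for your TSM level), and it takes the 50% reserve from the estimate above. The function names are illustrative only.

```python
PAGE_KB = 4  # assumption taken from the message: one bufferpool page = 4 kB


def bufpool_gb(pages, page_kb=PAGE_KB):
    """Size in GB (1 GB = 2**20 kB) of a bufferpool of `pages` pages."""
    return pages * page_kb / 2**20


def max_bufpool_pages(ram_gb, reserved_fraction, page_kb=PAGE_KB):
    """Largest page count that fits after reserving a fraction of RAM."""
    usable_kb = ram_gb * 2**20 * (1 - reserved_fraction)
    return int(usable_kb // page_kb)


print(bufpool_gb(524288))         # 2.0      -> the whole 2 GB machine
print(bufpool_gb(151552))         # 0.578125 -> the earlier setting
print(max_bufpool_pages(2, 0.5))  # 262144   -> the value recommended above
```

Changing the reserved fraction shows how much headroom a smaller file-buffer setting would free up on the same machine.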

--> (Copy 1)          Status  (Copy 2)
--> /tsmdb/db01.dsm   Sync'd  /tsmdb_m/db01.d
+
--> LV NAME TYPE    LPs PPs PVs     LV STATE        MOUNT
--> lvtsmdb jfs     128 256 8       open/syncd      /tsmdb
--> lvtsmdb_m       jfs     128 128 4       open/syncd      /tsmdb_m
+
--> /tsmdb/db01.dsm ... /tsmdb/db08.dsm

Rodney already pointed out that you have excessive mirroring. Actually it
is not 4-way but 2+1 way (the secondary copies are not AIX-mirrored).
However, the result is the same: both AIX and TSM mirroring are applied
sequentially, adding up all the consistency delays (you did not show the
"mirrorread db", "mirrorread log", "mirrorwrite db" and "mirrorwrite
log" options from dsmserv.opt, so I am assuming the defaults are used).

The raw LVs vs. filesystem dbvols discussion has always been short: the
performance benefit on AIX is small (on Solaris it is much higher).
However, it is worth trying raw LVs and further reducing file buffering
with "vmtune -p 5 -P 10" (as Rodney already suggested). This may also
allow you to raise bufpoolsize to 65-75% of the RAM.

The question of how many dbvols per HDD has come up several times on
this list. I personally am in the group of believers that a single dbvol
per HDD is better. The argument is simple: TSM's attempt to
"parallelize" the load over many dbvols results in disk head thrashing.
An example for your case:
-       8 dbvols within a single filesystem;
-       the filesystem is on 4x2 disks (32 PPs each);
-       suppose TSM has to write 16 pages;
1. dbvol1 occupies PPs 1 through 4 on each disk, dbvol2 is on PPs
   5-8, etc.
2. The TSM server's attempt to "parallelize" will write page 1 to
   dbvol1, 2 to dbvol2, ..., 8 to dbvol8, 9 again to dbvol1, etc.
3. The result will be a write of pages 1&9 on PP 1 (for dbvol1), pages
   2&10 on PP 5 (for dbvol2), ..., and pages 8&16 on PP 29 (for dbvol8).
   ALL OF THOSE on HDD1 !!!
What happened to the parallelism :-((( And why should the disk have to
move its heads back'n'forth?
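
The example can be replayed with a small simulation. This is only a sketch of the simplified model described above: pages are dealt round-robin to the dbvols, and dbvol k is assumed to start at PP (k-1)*4 + 1 of the same disk. The function name and parameters are illustrative and are not TSM internals.

```python
def head_positions(num_pages=16, num_dbvols=8, pps_per_dbvol=4):
    """Map the first PP of each dbvol to the pages written there.

    Assumes the simplified model of the example: round-robin page
    placement, with each dbvol occupying a contiguous run of PPs.
    """
    writes = {}
    for page in range(1, num_pages + 1):
        dbvol = (page - 1) % num_dbvols + 1          # round-robin target
        first_pp = (dbvol - 1) * pps_per_dbvol + 1   # where that dbvol starts
        writes.setdefault(first_pp, []).append(page)
    return writes


for pp, pages in head_positions().items():
    print(f"PP {pp:2d}: pages {pages}")
```

Under these assumptions the 16 pages land on eight different regions of one disk (PP 1, 5, ..., 29), so the heads have to travel across the platter instead of writing one contiguous run.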

The medicine (all of it can be done under load and without restarting AIX
and/or TSM; however, some performance impact should be expected):
1. eliminate AIX mirroring (use the rmlvcopy command)
2. rearrange the tsmvg and free 4 disks (migratepv and reducevg)
3. create a new VG from those 4 disks using a smaller PP size (more PPs
   per PV)
4. create *separate* jfs logs on each disk (mklv -y <lognameN> -t jfslog
   <vgname> 1 <pvN>)
5. initialize each log (logform /dev/<lognameN>)
6. create the filesystem LVs (mklv -y <dbvolN> <vgname> XYZ <pvN>)
7. create each filesystem with its *own* log (crfs -v jfs -d /dev/<dbvolN>
   -m /tsm/dbN -A yes -a logname=/dev/<lognameN>). Note the "-a logname"
   option of the AIX crfs/chfs commands.
8. define a *single* big volume on each filesystem
9. if the volumes created in step 8 can be made the same size as the
   existing ones, define the new volumes as a third copy and delete the
   first copy. If not, dbvol delete/migrate ought to be used.
10. delete the rest of tsmvg and mirror using only one method (use
    mklvcopy for AIX mirroring; use extendvg, repeat steps 4-8 and "def
    dbc" for TSM mirroring).
11. repeat steps 1-10 for the TSM log
12. get rid of tsmvg_m and tsmvg_log_m. Use the disks for the diskpool or
    add them as a third copy within *the same* mirroring scheme (both AIX
    and TSM use LVM which allows three copies).
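
Steps 4-7 repeat once per disk, so it can help to generate the command lines and review them before typing anything. The sketch below only prints the commands quoted in those steps and executes nothing. The VG name, disk names, LV names and LP count are hypothetical placeholders; substitute your own.

```python
def per_disk_commands(vg, disks, lps):
    """Return the AIX command lines for steps 4-7, one group per disk.

    Nothing is executed; the strings are meant to be reviewed first.
    vg, disks and lps are placeholders supplied by the caller.
    """
    cmds = []
    for n, pv in enumerate(disks, start=1):
        log = f"tsmjfslog{n}"  # hypothetical jfslog LV name
        lv = f"tsmdblv{n}"     # hypothetical filesystem LV name
        cmds += [
            f"mklv -y {log} -t jfslog {vg} 1 {pv}",  # step 4: own log per disk
            f"logform /dev/{log}",                   # step 5: initialize the log
            f"mklv -y {lv} {vg} {lps} {pv}",         # step 6: filesystem LV
            f"crfs -v jfs -d /dev/{lv} -m /tsm/db{n} -A yes "
            f"-a logname=/dev/{log}",                # step 7: fs with its own log
        ]
    return cmds


for line in per_disk_commands("tsmdbvg", ["hdisk4", "hdisk5"], 32):
    print(line)
```

Keeping one log and one filesystem per physical disk is the point of the procedure, which is why each group of commands names a single PV.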

--> ... and won't be back before Monday

Happy Monday! You've got something to play with for the whole day :-)

Zlatko Krastev
IT Consultant

P.S. I am charging my customers for such advice but hopefully I can get
a beer (or Swiss chocolate) for this one :-)







PAC Brion Arnaud <Arnaud.Brion AT PANALPINA DOT COM>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
14.02.2003 16:08
Please respond to "ADSM: Dist Stor Manager"


        To:     ADSM-L AT VM.MARIST DOT EDU
        cc:
        Subject:        Re: OS390 TSM Performance questions.


Hi Rodney,

The big picture :
System : aix 4.3.3.0 on a 6h1 machine, 2Gb memory, 2 cpu
Vmtune settings : -p10 -P40 -R 256 -F376 -W256 -s1
TSM version : 4.2.3.1
Bufpoolsize : 524288
Iostat:
tty:      tin         tout   avg-cpu:  % user    % sys     % idle    % iowait
          0.0         25.9              55.0     13.0       10.9       21.1

Vmstat :
kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 3  3 288460 112615   0   0   0 1220 2288   0 2034 3080 369 55 13 11 21

>q dbvol

Volume Name       Copy    Volume Name        Copy    Volume Name  Copy
(Copy 1)          Status  (Copy 2)           Status  (Copy 3)     Status
----------------  ------  -----------------  ------  -----------  ---------
/tsmdb/db01.dsm   Sync'd  /tsmdb_m/db01.dsm  Sync'd               Undefined
/tsmdb/db02.dsm   Sync'd  /tsmdb_m/db02.dsm  Sync'd               Undefined
/tsmdb/db03.dsm   Sync'd  /tsmdb_m/db03.dsm  Sync'd               Undefined
/tsmdb/db04.dsm   Sync'd  /tsmdb_m/db04.dsm  Sync'd               Undefined
/tsmdb/db05.dsm   Sync'd  /tsmdb_m/db05.dsm  Sync'd               Undefined
/tsmdb/db06.dsm   Sync'd  /tsmdb_m/db06.dsm  Sync'd               Undefined
/tsmdb/db07.dsm   Sync'd  /tsmdb_m/db07.dsm  Sync'd               Undefined
/tsmdb/db08.dsm   Sync'd  /tsmdb_m/db08.dsm  Sync'd               Undefined

>q logvol

Volume Name         Copy       Volume Name          Copy       Volume Name  Copy
(Copy 1)            Status     (Copy 2)             Status     (Copy 3)     Status
------------------  ---------  -------------------  ---------  -----------  ---------
/tsmlog2/log02.dsm  Sync'd     /tsmlog_m/log02.dsm  Sync'd                  Undefined
/tsmlog2/log01.dsm  Sync'd     /tsmlog_m/log01.dsm  Sync'd                  Undefined
/tsmlog2/log03.dsm  Sync'd     /tsmlog_m/log03.dsm  Sync'd                  Undefined
/tsmlog2/log04.dsm  Sync'd     /tsmlog_m/log04.dsm  Sync'd                  Undefined
/tsmlog2/log05.dsm  Sync'd                          Undefined               Undefined
/tsmlog2/log06.dsm  Sync'd                          Undefined               Undefined

Some info about disk layout :

tsmvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
loglv00             jfslog     1     2     2    open/syncd    N/A
lvtsmdb             jfs        128   256   8    open/syncd    /tsmdb
lvtsmdb1            jfs        128   256   8    open/syncd    /tsmdb1
tsmvg_log_m:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
lvtsmlog_m          jfs        96    96    3    open/syncd    /tsmlog_m
loglv05             jfslog     1     1     1    open/syncd    N/A
tsmvg_m:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
lvtsmdb_m           jfs        128   128   4    open/syncd    /tsmdb_m
loglv01             jfslog     1     2     2    open/syncd    N/A
tsmvg_log:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
lvtsmlog2           jfs        96    192   6    open/syncd    /tsmlog2
loglv04             jfslog     1     2     2    open/syncd    N/A

All volumes for TSM db and logs are striped.
What I'm experiencing is very high CPU usage (minimum 50%, maximum 99%),
and paging as soon as backups or expire inventory are started. Also, Cache
Hit Pct is low (98.69) although I increased bufpoolsize from 151552 to
524288 (where my performance problems began)! Expire inventory needs
roughly 20 hours to examine approximately 7 million objects and delete
8-9% of them ... If you have a clue about what is happening here, I'd be
glad to hear it ;-) (Take your time, I'll leave for the weekend in a couple
of hours, and won't be back before Monday). Thanks in advance.

Arnaud

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
| Arnaud Brion, Panalpina Management Ltd., IT Group     |
| Viaduktstrasse 42, P.O. Box, 4002 Basel - Switzerland |
| Phone: +41 61 226 19 78 / Fax: +41 61 226 17 01       |
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=



-----Original Message-----
From: Rodney clark [mailto:Rodney.Clark AT INGBANK DOT COM]
Sent: Friday, 14 February, 2003 13:33
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: OS390 TSM Performance questions.


Post us some details: iostat, vmstat, how much memory, disks, etc. The
big quick win on AIX is vmtune -p5 -P10, but I guess you already know
that.


-----Original Message-----
From: PAC Brion Arnaud [mailto:Arnaud.Brion AT PANALPINA DOT COM]
Sent: Friday 14 February 2003 09:44
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: OS390 TSM Performance questions.


Hi all,

I followed your discussion with much interest, as I'm suffering from a
huge performance problem too. Unfortunately I'm not on OS390 but on
AIX 4.3.3: could someone tell me whether there is a trick like this
one that should be considered on this OS? Another
thing that annoys me: running "show memu SHORT" on my server (TSM
4.2.3.1) returns: ANR2000E Unknown command - SHOW MEMU
Could it be that this command is only available in the OS390 version of
TSM? Thanks in advance.

Arnaud =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
| Arnaud Brion, Panalpina Management Ltd., IT Group     |
| Viaduktstrasse 42, P.O. Box, 4002 Basel - Switzerland |
| Phone: +41 61 226 19 78 / Fax: +41 61 226 17 01       |
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=