ADSM-L

Re: Troubleshooting performance issues

2002-11-09 14:13:59
Subject: Re: Troubleshooting performance issues
From: DFrance <DFrance-TSM AT ATT DOT NET>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Sat, 9 Nov 2002 13:10:46 -0600
I like everything you two said. Simply turn on the "perform" client trace
option, specify an output file, and post the results (it's a small file,
fewer than 50 lines); then we can really discuss the issue. (BTW, "perform"
includes instr_client_detail along with time stamps and the client options
in force.)

Simply add the following two lines to the dsm.opt file (for a couple of
backup cycles on a couple of the problem nodes):
    tracefile    perf-trc.out
    traceflag    perform
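A minimal sketch of switching the trace on for a few cycles and off again, assuming you edit dsm.opt as plain text. The function names and the sample node and server names are hypothetical and not part of TSM. The only TSM-specific pieces are the two option lines quoted above. Matching on the first word ignores case, since TSM option names are not case-sensitive.

```python
# Hypothetical helpers (not part of TSM) that add or remove the "perform"
# trace options in the text of a dsm.opt file.

TRACE_OPTIONS = ("tracefile", "traceflag")  # prefix match also covers "traceflags"


def is_trace_line(line: str) -> bool:
    """True if the line sets a trace option (option names are case-insensitive)."""
    tokens = line.split()
    return bool(tokens) and tokens[0].lower().startswith(TRACE_OPTIONS)


def disable_perform_trace(opt_text: str) -> str:
    """Return the options text with every tracefile/traceflag(s) line removed."""
    kept = [line for line in opt_text.splitlines() if not is_trace_line(line)]
    return "\n".join(kept) + "\n"


def enable_perform_trace(opt_text: str, tracefile: str = "perf-trc.out") -> str:
    """Return the options text with exactly one pair of trace lines appended."""
    base = disable_perform_trace(opt_text)
    return base + f"tracefile    {tracefile}\ntraceflag    perform\n"


# Example with a hypothetical two-line dsm.opt:
original = "NODENAME HY-OSRV-2000\nTCPSERVERADDRESS tsmserver\n"
traced = enable_perform_trace(original)
print(traced)
```

The helper strips any existing trace lines before appending new ones. Running it twice therefore leaves a single pair, and the "disable" step restores the original file once the test cycles are done.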

I cannot find the old "Problem Determination Guide", which has a lot of
detail on interpreting this client trace. Still, the line-item categories
are clear enough to distinguish time spent in the client, the network, and
the server.

Learn more about tuning from the book at the following link. It is the
long-awaited and longed-for version of the original classic, updated to
4.2. This is WAY cool (TomH, thanks as well for the off-list copy you
sent!)

http://www.tivoli.com/support/public/Prodman/public_manuals/td/TSMC/sm42tune/en_US/HTML/TSMV4.2TuningGuide.html

Also, at this location:

http://www.tivoli.com/products/solutions/storage/docs/tsm-tuning.html


Don France
Technical Architect -- Tivoli Certified Consultant
Tivoli Storage Manager, WinNT/2K, AIX/Unix, OS/390
San Jose, Ca
(408) 257-3037
mailto:don_france AT ayett DOT net (change aye to a for replies)

Professional Association of Contract Employees
(P.A.C.E. -- www.pacepros.com)



-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU]On Behalf Of
Gianluca Mariani1
Sent: Wednesday, November 06, 2002 4:36 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: Troubleshooting performance issues


Also try:


BufPoolSize 81920 --> set to 10% of RAM
UseLargeBuffers Yes ---> set to NO
MoveBatchSize 256 ---> set to 1000
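The "10% of RAM" rule for BufPoolSize can be checked with a quick calculation. BUFPOOLSIZE is given in kilobytes. The RAM sizes below are hypothetical, because the original post does not say how much memory the server has. The function names are illustrative only.

```python
# Rule of thumb from this reply: BUFPOOLSIZE of about 10% of physical RAM.
# BUFPOOLSIZE is in kilobytes; the RAM sizes used here are hypothetical.

def bufpoolsize_kb(ram_mb: int, percent: int = 10) -> int:
    """Suggested BUFPOOLSIZE in KB for a server with ram_mb megabytes of RAM."""
    return ram_mb * 1024 * percent // 100


def ram_mb_implied(bufpool_kb: int, percent: int = 10) -> int:
    """RAM (MB) for which bufpool_kb would be `percent` of memory."""
    return bufpool_kb * 100 // percent // 1024


print(bufpoolsize_kb(1024))   # a hypothetical 1 GB server
print(ram_mb_implied(81920))  # the setting from the posted server options
```

By this arithmetic, the posted 81920 KB already equals 10% of an 800 MB machine. The rule only calls for a change if the server's actual RAM differs noticeably from that.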

For the rest, Zlatko said it all, especially about the tapes. Maybe you can
run an instr_client_detail trace to see where the bottleneck might be.
What about client options?

Cordiali saluti
Gianluca Mariani
Tivoli TSM Global Response Team, Roma
Via Sciangai 53, Roma
 phones : +39(0)659664598
                   +393351270554 (mobile)
gianluca_mariani AT it.ibm DOT com
----------------------------------------------------------------------------------------------------

"The people of Krikkit are, well, you know, they're just a bunch of real
sweet guys, you know, who just happen to want to kill everybody. Hell, I
feel the same way some mornings..."



Zlatko Krastev/ACIT <acit@ATTGLOBAL.NET>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM DOT MARIST.EDU>
06/11/2002 11.20
Please respond to "ADSM: Dist Stor Manager"


        To:     ADSM-L AT VM.MARIST DOT EDU
        cc:
        bcc:
        Subject:        Re: Troubleshooting performance issues

---> TCPWindowsize 64512

TSM Reference Manual:
"TCPWINDOWSIZE
Specifies, in kilobytes, the amount of receive data that can be buffered
at one time on a TCP/IP connection.
...
3. A window size larger than the buffer space on the network adapter might
degrade throughput due to resending packets that were lost on the adapter.
"
Try "tcpwindowsize 256" instead
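The option is specified in kilobytes, so a unit conversion shows the gap between the two values. The posted 64512 requests a 63 MB window, about 250 times the suggested 256 KB (0.25 MB). One possible explanation is that the value was meant as a byte count, but the post does not confirm this.

```python
# TCPWINDOWSIZE is specified in kilobytes (see the manual text quoted above).
# Convert the posted value and the suggested one to megabytes for comparison.

def kb_to_mb(kb: int) -> float:
    return kb / 1024


for label, kb in (("posted", 64512), ("suggested", 256)):
    print(f"{label}: TCPWindowsize {kb} = {kb_to_mb(kb):g} MB")
```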

---> 3 IDE drives

Is the DB on a separate drive? How many volumes?!? What kind of drive is
it, and what is the DB cache hit ratio?
IDE drives perform terribly if you issue more than one operation against
them. Commands are queued in the device driver or controller, but the
drive executes them one by one. If you have a master and a slave drive on
the same IDE channel, one drive has to wait until the other finishes its
operation and frees the IDE bus.
Do yourself a favour: buy one or two SCSI drives. It is a server, after
all.
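A toy model of the shared-channel effect makes the point concrete. It assumes every request has a fixed service time and ignores caching and seek patterns. The service times and the DB/log channel layout are hypothetical, since the post does not say which drives share a channel. With master and slave on one channel, their work adds up; on separate channels it overlaps.

```python
# Toy model (hypothetical service times, not measurements): one IDE channel
# carries one command at a time, so requests to the master and slave drives
# cannot overlap.

def same_channel_ms(master_ops_ms: list[float], slave_ops_ms: list[float]) -> float:
    # Shared channel: every request waits its turn, so the times add up.
    return sum(master_ops_ms) + sum(slave_ops_ms)


def separate_channels_ms(master_ops_ms: list[float], slave_ops_ms: list[float]) -> float:
    # One drive per channel: the two queues drain in parallel,
    # so the busier drive sets the total time.
    return max(sum(master_ops_ms), sum(slave_ops_ms))


db_ops = [10.0] * 100   # hypothetical: 100 DB requests at 10 ms each
log_ops = [10.0] * 100  # hypothetical: 100 log requests at 10 ms each
print(same_channel_ms(db_ops, log_ops), separate_channels_ms(db_ops, log_ops))
```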

---> 3 drives.  OS - DB - LOG

Your network is 100 Mb/s (~10 MB/s). The drives in the 3583 are LTO,
right (15 MB/s)? You wrote "3 drives" (and a small disk pool or none at
all), so are you attempting to back up directly to tape? If so, your
network is unable to feed the beast quickly enough, and the LTO drive has
to stop, rewind back, and restart. Under such circumstances you are usually
not only wearing out the drive but also getting a tape-write rate of
1-3 MB/s (or less).
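The arithmetic behind "unable to feed the beast" is short. This sketch uses only the figures from the message: about 10 MB/s usable on 100 Mb/s Ethernet and about 15 MB/s native for the LTO drives. It ignores the drive's buffer and any hardware compression, and the function name is illustrative.

```python
# Rough streaming check using the figures from the message: about 10 MB/s
# usable on 100 Mb/s Ethernet, and about 15 MB/s native for the LTO drives.
# It ignores the drive's buffer and any hardware compression.

def feed_ratio(network_mb_s: float, drive_mb_s: float) -> float:
    """Fraction of the drive's native rate that the network can supply."""
    return network_mb_s / drive_mb_s


ratio = feed_ratio(10.0, 15.0)
print(f"network supplies {ratio:.0%} of the drive's native rate")
if ratio < 1:
    # The drive empties its buffer faster than the network refills it, so it
    # ends up stopping, repositioning, and restarting instead of streaming.
    print("drive cannot stream from this link alone")
```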

---> Network data transfer rate:          570.67 KB/sec

It might be worth checking the network (preferably after addressing the
remarks above). The TSM server is set to 100 Mb/s, full duplex, no
autonegotiation, but what about the client nodes and the switch(es)?
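To see how far the quoted 570.67 KB/s is from the wire speed, convert it to bits. This sketch assumes TSM's KB is 1024 bytes and uses the usual 10^6 bits per Ethernet Mb. The rate comes to under 5% of a 100 Mb/s link. That is consistent with looking at client and switch settings rather than raw bandwidth.

```python
def link_utilization(rate_kb_s: float, link_mbit_s: float = 100.0) -> float:
    """Fraction of a link's nominal bit rate used by a rate in KB/s.

    Assumes TSM's KB is 1024 bytes; Ethernet's Mb/s is 10**6 bits/s.
    """
    bits_per_s = rate_kb_s * 1024 * 8
    return bits_per_s / (link_mbit_s * 1_000_000)


u = link_utilization(570.67)
print(f"570.67 KB/s uses about {u:.1%} of a 100 Mb/s link")
```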

Zlatko Krastev
IT Consultant






Etienne Brodeur <ebrodeur AT SERTI DOT COM>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
05.11.2002 16:44
Please respond to "ADSM: Dist Stor Manager"


        To:     ADSM-L AT VM.MARIST DOT EDU
        cc:
        Subject:        Troubleshooting performance issues


I have some major performance issues during archive operations. I also
have bad performance during backups, but the backup window is large, so
they complete without any problems. I can't seem to put my finger on the
bottleneck and was wondering whether someone could recommend a document
for tracking down performance issues.

Here is the scenario:

TSM 4.2.2.10 server on W2K (SP2)
3 IDE drives.  OS - DB - LOG
100 Mb/s, full duplex (not auto)
LTO 3583 SCSI attached to the server
Server Options:
CommTimeOut 60
IdleTimeOut 30
BufPoolSize 81920
LogPoolSize 512
TxnGroupMax 256
MoveBatchSize 256
MoveSizeThresh 500
UseLargeBuffers Yes
NOBUFPREfetch No
AuditStorage Yes
SELFTUNEBUFpool Yes
SELFTUNETXNsize Yes
TCPWindowsize 64512
TCPNoDelay Yes

There are 10 clients running 4.2.2.x (mostly 4.2.2.0). I have not set any
performance options in the DSM.OPT files. The servers have less than 10 GB
of data each. I have 5 of them archiving to tape at the same time, and they
are all running very slowly:

05-11-2002 01:29:29   ANE4952I (Session: 5848, Node: HY-ORAPED-2000) Total number of objects inspected: 102,322
05-11-2002 01:29:29   ANE4953I (Session: 5848, Node: HY-ORAPED-2000) Total number of objects archived: 86,989
05-11-2002 01:29:29   ANE4961I (Session: 5848, Node: HY-ORAPED-2000) Total number of bytes transferred: 6.46 Go
05-11-2002 01:29:29   ANE4963I (Session: 5848, Node: HY-ORAPED-2000) Data transfer time: 11,875.56 sec
05-11-2002 01:29:29   ANE4966I (Session: 5848, Node: HY-ORAPED-2000) Network data transfer rate: 570.67 KB/sec
05-11-2002 01:29:29   ANE4967I (Session: 5848, Node: HY-ORAPED-2000) Aggregate data transfer rate: 511.71 KB/sec
05-11-2002 01:29:29   ANE4968I (Session: 5848, Node: HY-ORAPED-2000) Objects compressed by: 0%
05-11-2002 01:29:29   ANE4964I (Session: 5848, Node: HY-ORAPED-2000) Elapsed processing time: 03:40:43

This was the fast one. I also have this W2K client:

05-11-2002 07:25:34   ANE4952I (Session: 5906, Node: HY-OSRV-2000) Total number of objects inspected: 8,424
05-11-2002 07:25:34   ANE4953I (Session: 5906, Node: HY-OSRV-2000) Total number of objects archived: 8,364
05-11-2002 07:25:34   ANE4961I (Session: 5906, Node: HY-OSRV-2000) Total number of bytes transferred: 2.76 Go
05-11-2002 07:25:34   ANE4963I (Session: 5906, Node: HY-OSRV-2000) Data transfer time: 2,871.55 sec
05-11-2002 07:25:34   ANE4966I (Session: 5906, Node: HY-OSRV-2000) Network data transfer rate: 1,008.95 KB/sec
05-11-2002 07:25:34   ANE4967I (Session: 5906, Node: HY-OSRV-2000) Aggregate data transfer rate: 97.15 KB/sec
05-11-2002 07:25:34   ANE4968I (Session: 5906, Node: HY-OSRV-2000) Objects compressed by: 0%
05-11-2002 07:25:34   ANE4964I (Session: 5906, Node: HY-OSRV-2000) Elapsed processing time: 08:17:01
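The two summaries can be cross-checked by recomputing the rates and comparing data transfer time with elapsed time. This sketch assumes "Go" (the French-locale abbreviation for gigabytes) means 1024^3 bytes and KB means 1024 bytes. Under those assumptions the recomputed rates agree with the logged ones to within 1%. HY-ORAPED-2000 spent roughly 90% of its elapsed time in data transfer. HY-OSRV-2000 spent only about 10%, so most of its eight hours went somewhere other than moving data. The log alone does not say where; the perform trace discussed earlier in the thread is meant to break that down.

```python
# Recompute the rates in the two ANE summaries and the share of elapsed time
# spent in data transfer. Assumes "Go" = 1024**3 bytes and KB = 1024 bytes.

def hms_to_seconds(hms: str) -> int:
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def summarize(gb: float, transfer_s: float, elapsed_hms: str) -> dict:
    kb = gb * 1024 * 1024
    elapsed_s = hms_to_seconds(elapsed_hms)
    return {
        "network_kb_s": kb / transfer_s,       # compare with ANE4966I
        "aggregate_kb_s": kb / elapsed_s,      # compare with ANE4967I
        "transfer_share": transfer_s / elapsed_s,
    }


sessions = {
    "HY-ORAPED-2000": summarize(6.46, 11_875.56, "03:40:43"),
    "HY-OSRV-2000": summarize(2.76, 2_871.55, "08:17:01"),
}
for node, s in sessions.items():
    print(f"{node}: network {s['network_kb_s']:.0f} KB/s, "
          f"aggregate {s['aggregate_kb_s']:.0f} KB/s, "
          f"data transfer {s['transfer_share']:.0%} of elapsed time")
```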

Please help me!  I don't know where to look anymore.

Thanks,

Etienne Brodeur
