Re: Help with TSM server being hammered by clients

Can you provide a little more information on the layout of the disk
subsystem for the TSM DB, recovery log and storage pools? I recently closed
another customer PMR that was very similar to this. After redoing the
layout for optimal performance for the DB and LOG, the problem went away.
For example:

        1. What is the disk subsystem?
        2. How many DBVOLS, LOGVOLS, and STGPOOL VOLS?
        3. Filesystem format - JFS/RLV/???
        4. Is it RAID? What type?
        5. Are the TSM volumes sharing space with other data?
        6. Are the TSM volumes separated by type?
        7. How many physical spindles do you have for the TSM volumes?
        8. What is your cache hit %?
        9. Are you using disk caching? (On both the client and server)

At 01:51 PM 11/4/2003 -0600, you wrote:

        TSM Server 4.2.4.1 (Solaris)
        TSM Client 4.2.3 (Solaris)

        TSM DB 83GB (40% util)
        TSM Log 4.6GB

        I am having a serious problem with 4 Solaris clients hammering
the server during their backups.  Each client has a lot of files held on
TSM, about 4-5 million per node and growing, though not much data, only a
couple of hundred GB per node.  The server is very responsive when
these clients are not backing up, and all other backups run without lagging
the server.  I have about 120 nodes backing up to this server daily.
        What could be causing this performance problem?  Here is a show
logpin from last night's backup:

tsm: I02SV1000>show logpin
Dirty page Lsn=4675033.188.3116, Last DB backup Lsn=4677956.167.3489,
Transaction table Lsn=4677883.231.3853, Running DB backup Lsn=0.0.0,
Log truncation Lsn=4675033.188.3116
Lsn=4675033.188.3116, Owner=DB, Length=128
Type=Update, Flags=C2, Action=ExtDelete, Page=6110475, Tsn=0:180594521,
PrevLsn=4675033.180.2739,
UndoNextLsn=0.0.0, UpdtLsn=4675033.176.827 ===> ObjName=AF.Bitfiles,
Index=12, RootAddr=29,
PartKeyLen=1, NonPartKeyLen=7, DataLen=20
The recovery log is pinned by a dirty page in the data base buffer pool.
Check the buffer pool statistics. If the associated transaction is still
active then more information will be displayed about that transaction.
Database buffer pool global variables:
CkptId=25232, NumClean=269056, MinClean=393192, NumTempClean=393216,
MinTempClean=196596,
BufPoolSize=393216, BufDescCount=432537, BufDescMaxFree=432537,
DpTableSize=393216, DpCount=124149, DpDirty=124149, DpCkptId=21890,
DpCursor=92805,
NumEmergency=0 CumEmergency=0, MaxEmergency=0.
BuffersXlatched=0, xLatchesStopped=False, FullFlushWaiting=False.
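To put the counters above in proportion, here is a minimal Python sketch that pulls the Name=number pairs out of the buffer pool lines and reports DpDirty as a share of BufPoolSize. It assumes that DpDirty counts dirty pages and that BufPoolSize is expressed in pages; the output itself does not state either, and the function names are made up for illustration.

```python
import re

# Buffer pool counters copied from the show logpin output above.
BUFFER_POOL_LINES = """\
CkptId=25232, NumClean=269056, MinClean=393192, NumTempClean=393216,
MinTempClean=196596,
BufPoolSize=393216, BufDescCount=432537, BufDescMaxFree=432537,
DpTableSize=393216, DpCount=124149, DpDirty=124149, DpCkptId=21890,
DpCursor=92805,
"""


def parse_counters(text):
    """Return every Name=integer pair found in the text as a dict."""
    return {name: int(value) for name, value in re.findall(r"(\w+)=(\d+)\b", text)}


def dirty_fraction(counters):
    """Share of the buffer pool that is dirty (assumes both values are page counts)."""
    return counters["DpDirty"] / counters["BufPoolSize"]


counters = parse_counters(BUFFER_POOL_LINES)
print(f"dirty pages: {dirty_fraction(counters):.1%} of the buffer pool")
```

Under those assumptions, roughly a third of the buffer pool is dirty at the moment the output was captured.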

Is the large number of DpDirty pages bad?  I think so, but I don't know
the technical details behind this value.  The log is at 0% util when
backups start during the evening, and by midnight last night the log was
up to 80% and climbing rapidly.  Once I cancel these 4 clients from
backing up, the log stops filling so rapidly.  Does anyone else have
problems with clients that have large numbers of small files?  How do you
handle backing them up?  These nodes seem to take 8-10 hours apiece,
which seems very slow.
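
For a rough sense of scale, the figures above can be turned into a per-node file rate. This is only a back-of-the-envelope sketch in Python: it assumes each backup run examines every file the node holds on TSM exactly once, which may not match what the client actually scans, and the helper name is hypothetical.

```python
def files_per_second(files, hours):
    """Average number of files handled per second over a backup window."""
    return files / (hours * 3600)


# Bounds taken from the message: 4-5 million files, 8-10 hours per node.
low = files_per_second(4_000_000, 10)
high = files_per_second(5_000_000, 8)
print(f"{low:.0f} to {high:.0f} files per second per node")
```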

Thanks in advance for any assistance that you can provide!

Michael French
Savvis Communications
IDS01 Santa Clara, CA
(408)450-7812 -- desk
(408)239-9913 -- mobile


Dave Canan
TSM Performance
IBM Advanced Technical Support
ddcanan AT us.ibm DOT com