ADSM-L

Re: Throughput Magic Calculation

2004-09-08 02:47:02
Subject: Re: Throughput Magic Calculation
From: Roger Deschner <rogerd AT UIC DOT EDU>
Date: Wed, 8 Sep 2004 01:47:02 -0500
Mark, you didn't answer my question, but you did provide the key hint,
so thanks anyway. The hint is to consider the session that represents
the control thread separately from the session that represents the data
thread.

"Date/Time First Data Sent" _is_ SQL variable start_time in the sessions
table. The key is that the control session/thread starts up much
earlier, downloads the list of files from the server, hangs on while the
client program compares lists and decides what to back up today, and
then spawns the data thread when the client is actually ready to send
data. That is why start_time for a data thread is accurate, both for
calculating throughput and for statistical analysis. Q SESSION F=D does
not display start_time for control threads - only data threads - because
it would be misleading in reference to a control thread. So, if you are
using statistical analysis of data from SELECT commands or from
accounting data to calculate throughput, you'll never get meaningful
results if you include control thread sessions. Don't forget to subtract
media wait; the TSM server does so in its own throughput calculations.

     |---control-----------|
                |--data----|
     A----------B----------C
     time --->

Between A and B, only the control thread exists. The start_time for the
control thread session is A. At B, the data thread starts up and the
data thread's start_time is B. Between B and C data is actually being
sent from client to server, so that's the only period where the concept
of throughput is at all interesting.
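
As a rough illustration of the arithmetic described above, here is a minimal Python sketch. It is not how the TSM server does it internally; the Session record and its field names are hypothetical stand-ins for whatever you extract from SELECT output or accounting data. It assumes a control thread can be recognized by a blank "first data sent" time, and it measures throughput only over the B-to-C interval, minus media wait.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Session:
    # Hypothetical record: these field names are illustrative only and are
    # not actual TSM column names.
    session_id: int
    first_data_sent: Optional[datetime]  # blank (None) for a control thread
    bytes_received: int
    media_wait: timedelta


def throughput_bytes_per_sec(session: Session, now: datetime) -> Optional[float]:
    """Bytes per second over the data interval, excluding media wait.

    Returns None for control threads and for sessions with no usable
    elapsed time, so that they stay out of any statistics.
    """
    if session.first_data_sent is None:
        return None  # control thread: throughput is not meaningful
    active = (now - session.first_data_sent) - session.media_wait
    seconds = active.total_seconds()
    if seconds <= 0:
        return None
    return session.bytes_received / seconds
```

Returning None for a control thread, rather than zero, keeps those sessions from dragging down an average.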

And the TSM Server's built-in slow-throughput-session-killer is only
looking at data threads, or else it would inadvertently knock off all
those idle control threads.

So now all this enlightenment raises more questions: How does it tell
control thread sessions from data thread sessions? Is it just looking at
the oldest one as the control thread? How does it keep them properly
matched up, in case a client node starts multiple sessions?

For now, however, I'm starting to sour on the idea of using the server
throughput threshold setting. Even when it's calculated accurately, it's
very inconsistent. Network traffic, system load on the clients, days of
the week, and phases of the moon all seem to have an effect on how fast
a client can back up to a TSM server. It's looking like a really blunt
instrument, when all I'm really trying to accomplish is to avoid having
somebody pin and fill up the log, on a holiday, while I am barbecuing.
Better to have a little OS script triggered by cron every 20 minutes
that does a Q LOG, and if it's over 75% full do a SHOW LOGPIN and cancel
the offender.
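
A minimal Python sketch of that cron-driven check might look like the following. Everything here is an assumption rather than a tested tool: run_admin is a hypothetical callable you would supply to send one command to the server through the administrative client and return its output as text, and the sketch asks for the log utilization with a SELECT (assuming a pct_utilized column in the log table) instead of parsing Q LOG output. Cancelling the offender is left to the caller, since the format of SHOW LOGPIN output is not assumed here.

```python
from typing import Callable, Optional


def parse_pct_utilized(output: str) -> float:
    """Return the first line of the output that parses as a number."""
    for line in output.splitlines():
        try:
            return float(line.strip())
        except ValueError:
            continue  # skip headers, blank lines, and messages
    raise ValueError("no numeric value found in output")


def check_log(run_admin: Callable[[str], str],
              threshold: float = 75.0) -> Optional[str]:
    """Return SHOW LOGPIN output if the log is over threshold, else None.

    run_admin is a hypothetical helper supplied by the caller; the SELECT
    below assumes the log table exposes a pct_utilized column.
    """
    pct = parse_pct_utilized(run_admin("SELECT pct_utilized FROM log"))
    if pct <= threshold:
        return None
    return run_admin("SHOW LOGPIN")
```

Run from cron every 20 minutes, a wrapper would call check_log and, when it returns text, either mail it to the administrator or pick out the pinning session and cancel it.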

Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu


On Tue, 7 Sep 2004, Stapleton, Mark wrote:

>From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On
>Behalf Of Roger Deschner
>>If I look at SQL variables, I see in the sessions table a
>>START_TIME timestamp, which is when that session started.
>>
>>However, if I do QUERY SESSION F=D I get a different number,
>>"Date/Time First Data Sent:", which in some cases is blank,
>>and in other cases is a later time than the SQL start_time.
>>How is this calculated? Obviously, this is how the throughput
>>calculations are performed as it decides who is moving too
>>slowly and should be cancelled. But where does this
>>information come from (i.e. which SQL variables) and how is
>>this calculation done?
>
>("Date/Time First Data Sent" will be blank for the control thread; it
>will have an entry for the session representing the data thread.)
>
>Remember that START_TIME in the session table marks the timetick at
>which the client establishes contact with the server. The "Date/Time
>First Data Sent" will come later, particularly much later if the client
>is a large one. This is because the TSM client does a scan of all files
>and directories prior to sending data to the TSM server.
>
>To minimize the difference between the two events, consider the use of
>the TSM Journaling Service (if the client is a windows client).
>Calculating slow throughput based on calculation will give skewed
>results on large TSM clients; I've seen large TSM clients that really
>fast-ball the data to the server--once the directory/file scan is done,
>which sometimes takes 30 minutes.
>
>--
>Mark Stapleton (stapleton AT berbee DOT com)
>Berbee Information Networks
>Office 262.521.5627
>