ADSM-L

Re: Recovery log utilization does not drop after DB backup (+ pinned logtail)

2001-06-05 06:44:50
Subject: Re: Recovery log utilization does not drop after DB backup (+ pinned logtail)
From: Sheelagh Treweek <sheelagh.treweek AT COMPUTING-SERVICES.OXFORD.AC DOT UK>
Date: Tue, 5 Jun 2001 11:46:38 +0100
Paul,

Thank you for the excellent write-up of your understanding of how
the recovery log works and what steps you take to avoid a pinned
logtail.

What I think you are saying is that, for the log (in roll-forward
mode) to fill, the head must have circled round and caught up with
the tail because a transaction is still uncompleted?  Some of the
examples you cite are very clear and plausible: a large DB backup,
a backup over a slow dial-up line, perhaps with retries.  I'm less
clear about the inventory expiration impact - are you saying that
because it generates a much higher level of activity/updates it
is excluding (or slowing) another activity; or are you saying that
a very long expire inventory process can itself get into the
situation of a pinned logtail?

We have not seen a pinned logtail when running inventory expiration;
it has always happened during the overnight, major intake of data.
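
To check my own reading of the head-catching-the-tail picture, here is
a toy model in Python.  It is only a sketch of my understanding, not
of how the TSM server is actually implemented: the class, its method
names, the capacity and the session name are all made up, and I am
assuming that a DB backup releases log space only up to the oldest
record that an open transaction still needs.

```python
class ToyRecoveryLog:
    """Toy circular log; positions are ever-increasing record numbers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.head = 0            # next record number to be written
        self.backup_point = 0    # head position at the last DB backup
        self.open_txns = {}      # transaction name -> head position when it began

    def begin(self, name):
        self.open_txns[name] = self.head

    def commit(self, name):
        del self.open_txns[name]

    def db_backup(self):
        # Assumption: a DB backup releases everything written before it,
        # unless an open transaction still needs older records.
        self.backup_point = self.head

    def tail(self):
        # The tail is pinned at the oldest record anyone still needs.
        return min([self.backup_point, *self.open_txns.values()])

    def used(self):
        return self.head - self.tail()

    def write(self, records):
        if self.used() + records > self.capacity:
            raise RuntimeError("log full: head has caught up with the tail")
        self.head += records


log = ToyRecoveryLog(capacity=100)
log.begin("slow-dialup-backup")   # hypothetical long-running session
log.write(60)                     # overnight intake from other clients
log.db_backup()
used_after_backup = log.used()    # still 60: the open transaction pins the tail
log.commit("slow-dialup-backup")
used_after_commit = log.used()    # 0: the tail jumps forward to the backup point
```

In this model the utilization stays at 60 after the DB backup and only
drops once the long-running transaction commits, which is the symptom
in the subject line.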


>What do we do about this?
>

I know (from conversations we have had in the past) that this problem
has impacted your site much more than here and that the services you
provide are broadly on similar scale and scope to here at Oxford.

The principal difference I observe is that we have split our services
onto separate TSM server instances as we have scaled up, whereas I
believe Cornell have retained a single server?

[I know you talked last year of splitting the service up.]

We also have data going to disk pools first so rarely have sessions
waiting for tape mount - and presumably a restore/retrieve/recall
does not take a transaction slot in the log (although there may be
a date update after the data has been given back) ?

Maybe what this has given us is less potential conflict of sessions/
activities/processes and maybe a little less vulnerability in this
particular aspect.

The lengths you detail that you have had to go to in order to avoid
this problem are quite extraordinary and unacceptably complex.

[As was having to split our service because of other scaling issues.]

I can see that a larger recovery log would alleviate the situation but
this may be a problem that could potentially impact more sites in the
future as they move to server-free backups ... when the TSM server
becomes much more of a database engine?

It also seems to me that a client session that 'dies' but doesn't
disconnect from the server is a potential threat?  A server option
(like) : CONSIDERDEADIFINACTIVEFOR {60mins}, rather than the
throughput{data|time}threshold options (which weren't invented for
this reason I believe and which I have had little experience with),
might alleviate a cause.  The same might be said of a rogue process.
Although these situations are relatively rare, they do happen.
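
The check I have in mind for such an option is nothing more than the
following Python sketch.  To be clear, CONSIDERDEADIFINACTIVEFOR is my
invention and not an existing server option, and the function name,
session numbers and timestamps below are made up purely for
illustration.

```python
from datetime import datetime, timedelta


def find_dead_sessions(last_activity, now, inactive_for=timedelta(minutes=60)):
    """Return ids of sessions with no activity for longer than inactive_for."""
    return sorted(
        sid for sid, seen in last_activity.items() if now - seen > inactive_for
    )


now = datetime(2001, 6, 5, 6, 0)
last_activity = {           # made-up session ids and last-activity times
    101: datetime(2001, 6, 5, 5, 45),
    102: datetime(2001, 6, 5, 3, 10),
    103: datetime(2001, 6, 5, 4, 59),
}
dead = find_dead_sessions(last_activity, now)   # sessions 102 and 103
```

The server would then cancel the sessions it finds, so that any
transaction they hold open stops pinning the log tail.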

>Hope this helps.  As I said before, the above explanation is my
>understanding of how TSM works, which may differ from how it actually
>works!   ;-)
>
>..Paul

Good enough for me - thanks for taking the time.

Regards, Sheelagh
--
Sheelagh Treweek
Oxford University Computing Services
Email: sheelagh.treweek AT oucs.ox.ac DOT uk
Phone: +44 (0)1865 273205 Fax:-273275
+---------------------------------------------------------------------+
|  http://tsm-symposium.oucs.ox.ac.uk/   OXFORD 20/21 September 2001  |
|  REQUIREMENTS http://tsm-symposium.oucs.ox.ac.uk/requirements.html  |
+---------------------------------------------------------------------+