ADSM-L

Re: Our log fills up and server crashes

2002-04-03 13:37:49
Subject: Re: Our log fills up and server crashes
From: Tab Trepagnier <Tab.Trepagnier AT LAITRAM DOT COM>
Date: Wed, 3 Apr 2002 12:38:28 -0600
Richard,

Several years ago we had the same issue.  The fact that your (large) log
fills up tells me that unless you have a *HUGE* TSM system, you might be
having the same issue we had.

In our case, we had an Irix server that served as the hub of a network of
Irix CAD workstations.  Each workstation had a link from its X desktop to
the X desktop of the server.  The server had a link on its X desktop back
to the X desktop of every workstation.

When we launched archives on any workstation or on the CAD server, it would
iteratively follow those links back and forth creating ADSM entries like:
/file
/link1/file
/link2/link1/file
/link1/link2/link1/file
/link2/link1/link2/link1/file
and so forth.
The client did that for every file on every client until the maximum path
length was reached, and then it started over with the next file!
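
To illustrate the growth pattern, here is a minimal Python sketch that
reproduces the path sequence shown above for a single file. It is only a
model of the behavior, not the ADSM client's actual logic; the function
name looped_paths and the 1024-character path limit are assumptions made
for the example.

```python
def looped_paths(filename, links, max_path_len):
    """Yield the entries a link-following walk would record for one file.

    Each pass prepends the next link name in the cycle, mimicking a client
    that keeps following two desktops that link to each other.
    """
    path = "/" + filename
    i = 0
    while len(path) <= max_path_len:
        yield path
        # Follow the next link in the cycle (link1, link2, link1, ...).
        path = "/" + links[i % len(links)] + path
        i += 1


if __name__ == "__main__":
    # ASSUMPTION: 1024 is an illustrative path limit, not a documented value.
    entries = list(looped_paths("file", ["link1", "link2"], 1024))
    for entry in entries[:5]:
        print(entry)
    print(len(entries), "entries generated for one real file")
```

Under these assumptions a single real file turns into a large number of
database entries, which is why the total climbs so quickly once it is
multiplied across every file on every client.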

Because these were fast machines on 100 Mb/s links, the phenomenon would
generate hundreds of millions of "files" on every archive and crash the
server before it had time to load a tape and save a DB copy.  At the time
we were running with a log size of just under 5 GB.

With the 3.1.0.7 client, the ARCHSYMLinkasfile option was introduced, and
that fixed the problem, but by then our DB was cluttered with those bogus
entries, and still is to some extent.

Since then we have added 50% more clients and shrunk the log to 4.4 GB, but
because that behavior has not recurred, we haven't had a "log crash" in
over two years.

Hope this helps.

Tab Trepagnier
TSM Administrator
Laitram Corporation

Richard Foster <Richard.Foster AT HYDRO DOT COM>@VM.MARIST.EDU> on 04/03/2002
09:27:19 AM

Please respond to "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>

Sent by:  "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>


To:   ADSM-L AT VM.MARIST DOT EDU
cc:
Subject:  Our log fills up and server crashes


Anyone seen this before?

Our TSM server has crashed 3 nights running with a full log. We started
with Logmode=Rollforward and 3960 MB log. We tried increasing to 6012 MB,
which delayed the crash by about 45 mins.
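
As a rough back-of-envelope check, the figures above imply a log fill rate.
The sketch below assumes the log fills at a constant rate and that the
added space was fully consumed during the extra 45 minutes; neither is
confirmed, and the function name is made up for the example.

```python
def log_fill_rate_mb_per_min(old_size_mb, new_size_mb, extra_minutes):
    """Estimate the fill rate from how long the added log space lasted."""
    return (new_size_mb - old_size_mb) / extra_minutes


# Figures from the message: 3960 MB raised to 6012 MB bought about 45 minutes.
rate = log_fill_rate_mb_per_min(3960, 6012, 45)

if __name__ == "__main__":
    print(f"approximate fill rate: {rate:.1f} MB/min")
```

A rate of that order would suggest that enlarging the log only postpones
the crash, so finding whatever is generating the log records matters more
than the log size.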

We have the DB backup trigger set at 60%. But even when the backup ran, we got msg
"ANR4556W Attention: the database backup operation did not free sufficient
recovery log space to lower utilization below the database backup trigger.
The recovery log size may need to be increased." I guess this indicates a
transaction which is out of control - but I can't identify which TXN it is.

We have now switched to Logmode=normal, but I'm not sure it'll help.

Setup: TSM server 4.2.1.11 running on AIX 4.3.3 plus ML9 and patches.
Clients on NetWare, Windows, Unix (various flavours), plus TDP for R/3.

Any comments, suggestions, etc received gratefully.

Richard Foster