ADSM-L

Re: export nodes causes TSM server crash

2006-02-28 07:25:29
Subject: Re: export nodes causes TSM server crash
From: Kurt Beyers <Kurt.Beyers AT DOLMEN DOT BE>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 28 Feb 2006 13:25:38 +0100
John,
 
Thanks for the script that checks whether any exports are still running before starting the
next batch job of 15 exports. It is running now, so I'll see whether it makes life
easier for TSM.
 
I managed to get another crash of the TSM server when the export of the first
15 nodes was started. Nothing else was running on the TSM server except the
Tivoli Operational Reporting tool, which monitors TSM activity every hour. I've
stopped the reporting service as well to see whether it has anything to do with
the crash.
 
No news from IBM support so far; I'll update the thread when I know more.
 
regards,
Kurt
 

________________________________

From: ADSM: Dist Stor Manager on behalf of John Monahan
Sent: Mon 2/27/2006 22:36
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] export nodes causes TSM server crash



I agree that the TSM server shouldn't ever crash, but just because it
shouldn't crash doesn't necessarily mean you should try to run 75 or 100
or 1000 exports concurrently either.  Until a fix is produced, I would
just limit your concurrent exports to what you know works without
committing a self-imposed denial of service attack on your TSM server.
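
One way to apply "limit your concurrent exports to what you know works" is a sliding window: whenever the number of running exports drops below a cap, start just enough new ones to refill it. The minimal Python sketch below assumes a polling job can count the active EXPORT NODE processes; the function name nodes_to_start and the default cap of 15 (the group size used in this thread) are illustrative, not part of TSM.

```python
from typing import List


def nodes_to_start(pending: List[str], running: int, limit: int = 15) -> List[str]:
    """Return the next nodes to export without exceeding `limit` concurrent exports.

    `pending` is the ordered list of node names still to be exported and
    `running` is the number of EXPORT NODE processes currently active
    (hypothetical inputs, e.g. gathered from the PROCESSES table).
    """
    # Never negative: if more exports are running than the cap allows, start none.
    free_slots = max(0, limit - running)
    return pending[:free_slots]


if __name__ == "__main__":
    # 14 exports active with a cap of 15: only one more may start.
    print(nodes_to_start(["NODE_A", "NODE_B", "NODE_C"], running=14))
```

Unlike fixed batches, this never has more than `limit` exports in flight, even when one batch runs longer than expected.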

Here is what I would do with your scripts that have the exports separated
into groups of 15 nodes each:
1.  Kick off the first one as is.
2.  Modify all the other scripts to first check whether any export processes
are still running and, if there are, have those scripts reschedule
themselves, e.g.:

/* Is any EXPORT NODE process still running? */
select * from processes where upper(process)='EXPORT NODE'
if (rc_ok) goto reschedule
<run next set of export node commands here>
exit
/* Exports still running: try this script again in 30 minutes */
reschedule:
del sched <thisschedname> type=a
def sched <thisschedname> type=a cmd="run <thisscriptname>" active=yes startt=NOW+0:30 perunits=onetime
exit
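
To make the control flow of that script explicit (query PROCESSES, branch on RC_OK, otherwise re-define a one-time schedule 30 minutes out), here is a minimal Python model of the same decision. It is a sketch only: export_gate and GateDecision are hypothetical names, and the input is assumed to be the PROCESS column values from the PROCESSES table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass
class GateDecision:
    run_now: bool               # True: start the next batch of exports now
    retry_at: Optional[datetime]  # when to try again if the batch must wait


def export_gate(running_processes: Iterable[str], now: datetime,
                retry_after: timedelta = timedelta(minutes=30)) -> GateDecision:
    """Run the next batch only if no EXPORT NODE process is active;
    otherwise ask to be retried after `retry_after`."""
    # Same test as upper(process)='EXPORT NODE' in the TSM script.
    busy = any(p.strip().upper() == "EXPORT NODE" for p in running_processes)
    if busy:
        return GateDecision(run_now=False, retry_at=now + retry_after)
    return GateDecision(run_now=True, retry_at=None)
```

The design choice is all-or-nothing: a new batch waits until every earlier export has finished, which avoids the overlap between batches at the cost of some idle time.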


______________________________
John Monahan
Consultant Infrastructure Solutions Group
Computech Resources, Inc.
Office: 952-833-0930 ext 109
Cell: 952-221-6938
http://www.computechresources.com




Kurt Beyers <Kurt.Beyers AT DOLMEN DOT BE>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
02/27/2006 02:57 PM
Please respond to: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
cc:
Subject: Re: export nodes causes TSM server crash

John,

I tested an export of just 15 nodes earlier on, and that batch already
included the larger nodes. The TSM server was slow at the time (high CPU
usage and heavy disk I/O, which is of course normal), but it worked
fine.

Exporting all of the nodes at the same time crashes the TSM server
immediately. I did not intend to run all the exports at once, but I had not
realized that the PARALLEL/SERIAL commands would not work here, because the
exports are started in the background.


So I changed the script to work in groups of 15 nodes. Exporting in groups
of 15 caused another crash when the last group was started. A few of the
earlier exports were still running at that point; the nodes in the last
group were fairly small.

A support call was logged, of course. The question is what causes the TSM
server crash. Apart from the PK_EXCEPTION and PK_THREAD messages in the
application log, nothing else was found.

For now I just have to wait for news from the labs. I will contact them
again tomorrow.

regards,
Kurt

________________________________

From: ADSM: Dist Stor Manager on behalf of John Monahan
Sent: Mon 2/27/2006 20:15
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] export nodes causes TSM server crash



Let me see if I understand you correctly.  The export works fine when only
15 nodes are running, but after 2 hours, when the second set of 15 nodes
kicks in (while some from the first group of 15 are still running), that
is when your server crashes?  Or does your server crash with only 15 nodes
running an export?






Kurt Beyers <Kurt.Beyers AT DOLMEN DOT BE>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
02/27/2006 05:35 AM
Please respond to: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
cc:
Subject: export nodes causes TSM server crash

Hello everybody,

I've got a TSM server 5.3.2.2 running on Windows 2003 Enterprise Edition
SP1 (7 GB RAM, 3.2 GHz Xeon CPU) that has about 100 TSM clients defined.

Each month, the active backup data of every TSM node is exported to disk
(a DS4100 with 250 GB SATA disks). The disk storage pool that contains the
backups is on the same DS4100.

This past weekend I scheduled the export of the TSM nodes with a few
scripts.

I first tried a single script that ran the exports in blocks of 15 nodes
using the PARALLEL and SERIAL commands. However, because each export is
started in the background, all 75 exports were started immediately, and
the TSM server crashed. After restarting the TSM server, I found no errors
in the activity log, apart from a note that no more than 16 commands can be
started in one PARALLEL statement. The last normal export message in the
log is followed directly by the messages from the server restart.

I then split up the export myself: one script started the export of 15
nodes and defined 4 administrative schedules, each of which triggered the
export of 15 more nodes, at 2-hour intervals. The TSM server crashed once
more.
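
The arithmetic of this layout: 75 nodes in groups of 15 gives one batch started directly plus 4 scheduled batches, the last starting 8 hours after the first. Fixed timing does not guarantee that the previous batch has finished, which is the kind of overlap reported in the follow-up messages. The minimal Python sketch below only computes such a plan; plan_batches and the node names are hypothetical, and nothing here talks to a TSM server.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def plan_batches(nodes: List[str], batch_size: int, start: datetime,
                 interval: timedelta) -> List[Tuple[datetime, List[str]]]:
    """Split `nodes` into consecutive batches of `batch_size`, each one
    scheduled `interval` after the previous batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    plan = []
    for i in range(0, len(nodes), batch_size):
        # Batch k (0-based) starts k intervals after the first batch.
        when = start + (i // batch_size) * interval
        plan.append((when, nodes[i:i + batch_size]))
    return plan


if __name__ == "__main__":
    nodes = [f"NODE{i:03d}" for i in range(75)]
    for when, batch in plan_batches(nodes, 15, datetime(2006, 2, 25, 20, 0),
                                    timedelta(hours=2)):
        print(when, len(batch))
```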

Is this a known feature when the export of a lot of nodes is started? Am I
overlooking some parameters here? Can the export be started in a better way
using TSM scripting?

Running an EXPORT SERVER instead of an EXPORT NODE for each TSM node is not
an option, because importing a single node would then take too long.

thanks in advance,

Kurt
