ADSM-L

Re: [ADSM-L] TSM 6.1 and the ever expanding DB

2009-10-06 20:14:49
Subject: Re: [ADSM-L] TSM 6.1 and the ever expanding DB
From: Colin Dawson <colind AT US.IBM DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 6 Oct 2009 17:12:49 -0700
Howdy,

    I do not know whether PMRs are open with IBM service for the items
being discussed.  I encourage folks to open PMRs for issues that are
encountered.  We are working through reported problems and getting them
diagnosed and resolved.

     There is a fix for IC62978 that will be coming out with server
patch level 6.1.2.1.  I mention it here because in some cases it may be
contributing to needing more log space than was originally expected or
planned for.  The issue is that online table reorganization is
opportunistic, meaning that it really gets going and does its work when
the server is idle or not running a heavy workload.  The problem arises
when there are long-running transaction(s), such as a TDP or BA client
backing up a large object (100s of GB or TB in size) using a single
transaction.  This long-running single transaction mixed with table
reorganization doing its thing can result in a lot of log use for the
active and archive logs.  Depending upon the duration of things and the
log settings in place, we may get to an out-of-log-space situation that
brings down the server.  The fix for this APAR causes table reorganization
to work and play better with other transactions and should provide relief
for some of the ever-increasing log size issues being encountered.
Certainly, if log consumption issues continue after 6.1.2.1 is available,
that is definitely something we'd want to see and pursue via problem
records.
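
As a rough illustration of why the combination matters: in general, log
space written after the oldest open transaction began cannot be released
until that transaction ends, so the pinned space grows with the
transaction's duration and the combined log write rate.  The sketch below
is a simplified model only.  The function names and every rate and size in
it are hypothetical, not measured TSM or DB2 figures.

```python
def pinned_log_gb(txn_hours: float, log_rate_gb_per_hour: float) -> float:
    """Log space (GB) held while the oldest transaction stays open.

    Simplified assumption: log is written at a constant rate for the
    whole lifetime of the transaction.
    """
    if txn_hours < 0 or log_rate_gb_per_hour < 0:
        raise ValueError("duration and rate must be non-negative")
    return txn_hours * log_rate_gb_per_hour


def fits_active_log(txn_hours: float,
                    backup_rate_gb_per_hour: float,
                    reorg_rate_gb_per_hour: float,
                    active_log_gb: float) -> tuple[bool, float]:
    """Return (fits, needed_gb) for a single long-running transaction."""
    # Backup and reorganization both write log records, so their rates add.
    needed = pinned_log_gb(
        txn_hours, backup_rate_gb_per_hour + reorg_rate_gb_per_hour)
    return needed <= active_log_gb, needed


# Hypothetical numbers: a 6-hour single-transaction backup, 20GB active log.
without_reorg = fits_active_log(6, 2.0, 0.0, 20)
with_reorg = fits_active_log(6, 2.0, 3.0, 20)
print(without_reorg, with_reorg)
```

With these made-up inputs the backup alone fits in the active log, while
the same backup overlapping with reorganization does not.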

    The other fix of note coming out in 6.1.2.1 is IC63162.  This is a
timing-related crash that has been reported and seen by a number of
customers.  So, if you are experiencing an intermittent crash, this is
definitely something that should be reported, especially if it continues to
be seen after 6.1.2.1 is available and applied in your environment.

      Relating to both APARs mentioned above, additional information will
be published via our flash process in the next few days...

     With regard to database space and needing significantly more than was
needed for V5.x, I have not seen or diagnosed much in that area at this
point.  From the discussion here it seems that we are seeing this on a
couple of different fronts: one from Geoff Gill (who references having a
problem record opened for this) and the other from Stefan Folkerts
(no reference to a problem record).  If folks have problem records for
these, please email the PMR # and we can get the development/L3 team
engaged, as we would like to look into this and see what is going on.  From
a high-level view of things, we are using table-level compression, which
should keep the database space in check.  The preliminary investigation in
this case will then be to see what kind of efficiency we are seeing with
that, and whether it is being given a chance to actually do what is needed.
The other thing that comes to mind is that TSM V6 is using a much
larger page size for the database.  Most of the tables are using 8KB
database pages, while our largest tables are using 32KB.  Part of the
initial "bubble" of increased database space may result from the way
the database clustering is being done along with these larger allocation
units/pages being in the mix.  If that is coming into play here, it may be
that the occupancy per page is relatively low, and that future record
insertions will use this space so that things plateau or stabilize.
Anyway, this is just speculation, and there are certainly things we'll
need to look at to get a handle on this.
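
To make the page-occupancy idea concrete, here is a minimal sketch of how
allocated space scales when pages are only partly filled.  It is an
illustration under stated assumptions, not how the server actually
accounts for space: the function name, the 200-byte row size, the row
count, and the 25% occupancy are all hypothetical, and per-page header
overhead and compression are ignored.

```python
import math


def table_size_bytes(rows: int, row_bytes: int,
                     page_bytes: int, occupancy: float) -> int:
    """Allocated bytes for a table whose pages are filled to `occupancy`."""
    if not 0 < occupancy <= 1:
        raise ValueError("occupancy must be in (0, 1]")
    # Rows that fit in the used fraction of a page (at least one per page).
    rows_per_page = max(1, int((page_bytes * occupancy) // row_bytes))
    pages = math.ceil(rows / rows_per_page)
    # Every allocated page costs its full size, however empty it is.
    return pages * page_bytes


PAGE_32K = 32 * 1024

sparse = table_size_bytes(1_000_000, 200, PAGE_32K, 0.25)
packed = table_size_bytes(1_000_000, 200, PAGE_32K, 1.0)
print(sparse, packed, round(sparse / packed, 2))
```

In this toy case the sparsely filled table takes roughly four times the
space of the fully packed one for the same rows, and later insertions
could fill the existing pages without allocating new ones.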

    Thanks in advance for taking the time to read through my response and
thoughts on things...
-----------------------------------------------------
Colin Dawson
TSM Server Development
colind AT us.ibm DOT com



  From:       Stefan Folkerts <stefan.folkerts AT ITAA DOT NL>
  To:         ADSM-L AT VM.MARIST DOT EDU
  Date:       10/04/2009 11:26 PM
  Subject:    Re: [ADSM-L] TSM 6.1 and the ever expanding DB
  Sent by:    "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>






I have been a member of the "TSM 6 in production club" since 6.1.2.0 and
I am not happy with this release.

I am also seeing strange DB errors right from the start (clean install
and export/import from a 5.5 server). Looking at the IBM help for these
errors, I read fairly cryptic steps about changing DB2 settings. Step one
was "Open the DB2 console"... what the heck!?
I don't want to open the DB2 console. IBM told me I would not need DB2
knowledge and that the DB2 database use would be transparent for
me. Well, it's not: from the first hours after install until now, it
is not.

DB size has increased about 5 times (using dedupe) and don't even get me
started on the log size.

The good things: with dedupe I get a 22% reduction in the amount of data
stored on disk, which is nice.
Performance is good. It sucks that they changed some internal tables, so
tools such as TSMmanager need to catch up on the new layout; TSMmanager
reports on size stuff don't work at the moment.

More than once the system has just stopped responding, with no lead on
where it comes from. I never had this before on V5, and it's not the
hardware, since I run TSM V6 inside a VMware VM and all other VMs are fine.

At this point in time I would not recommend using TSM 6.1.2.0 in a
production environment.

-----Original message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On behalf of Zoltan
Forray/AC/VCU
Sent: Friday, 2 October 2009 15:15
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] TSM 6.1 and the ever expanding DB

Join the club.  I am beginning to wonder if anyone is successfully using
V6.1, trouble-free.

Monday I decided to put my 6.1.2 server into production and am wondering
if this was a really bad decision.

I have had to bounce it 5 times due to it simply hanging/going
non-responsive, even though the only "activity" has been exporting a large
node from another server.

The primary active log has been expanded 3 times (from 20GB to 60GB)
even though I run 3 full DB backups daily.

I had to reserve 300GB for the archive log space.

The DB has grown to 65GB for 4 nodes, even though the original server with
250 nodes is only 80GB used.

The diagnostic information for DB/log errors is fairly useless.  The book
says to go to DB2 to get it to explain the SQL????? errors, even though in
other places the book says to not mess with DB2 ("pay no attention to the
man behind the curtain......").  I am having to become way more
knowledgeable in DB2 than I ever wanted to be  ("Damn it, Jim.....I am the
backup/TSM administrator - not a DBA!" - apologies to DeForest Kelley)

Just got my 5th SQL error this week ("10/2/2009 8:49:46 AM ANR0162W
Supplemental database diagnostic information:  -1:22003:-413 ([IBM][CLI
Driver][DB2/LINUXX8664] SQL0413N  Overflow occurred during numeric data
type conversion.  SQLSTATE=22003")

I have to run 3 full DB backups every day (along with the now added
3 BACKUP VOLHIST runs) just to try to keep ahead of what I consider normal,
daily activity. I never had to do this on V5.x, where daily DB incrementals
used to be more than enough. Heaven help me if I get this server up to the
size of my biggest V5 server, which has a 150GB DB: I could never back up
the DB fast enough to keep it from crashing.

---------------

How about an informal poll.

How many folks are running V6.1.2 servers in production?

How big (occupancy?  DB size?  Number of active nodes?)

What platform?



From: "Gill, Geoffrey L." <GEOFFREY.L.GILL AT SAIC DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: 10/01/2009 08:12 PM
Subject: [ADSM-L] TSM 6.1 and the ever expanding DB
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>



I'm finding that what I know about how the DB works in 5.5 doesn't
really carry over to how it works in 6.1. On a Linux box I brought up to
migrate clients to a 6.1 server, I created a 20GB log and a 100GB DB. About
150 nodes 'will be' moved to this instance, but currently about 20 are
backing up. My 5.5 server, on AIX 5.3, has a 125GB DB about 50% used and an
11GB log, and it backs up 500+ clients per day with no issues.



Last night's backup on the new box is telling me there is no more space
in the database, so backups are failing. After backing up systems for 30
days? I find that way out of whack with how 5.5 works, and it seems to be
telling me I need more than 10 times the space to keep 6.1 up. I can't
believe 20 computers have eaten up 100GB of DB space in such a short
period of time.
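
For a rough sense of scale, the figures quoted in this message can be
turned into DB space per node. This is only back-of-the-envelope
arithmetic: it treats "500+" as 500 and "about 20" as 20, assumes the
100GB DB is completely full, and ignores that nodes differ in size and
retention. The function name is hypothetical.

```python
def gb_per_node(db_used_gb: float, nodes: int) -> float:
    """Average database space used per node, in GB."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return db_used_gb / nodes


# 6.1 server: 100GB DB reported full, about 20 nodes backing up.
v6 = gb_per_node(100.0, 20)
# 5.5 server: 125GB DB about 50% used, 500+ clients per day.
v5 = gb_per_node(125.0 * 0.5, 500)

print(v6, v5, v6 / v5)
```

On these assumptions the 6.1 server is using about 5GB of DB per node
against about 0.125GB per node on 5.5, a factor of roughly 40.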



I have a case open with IBM to discuss this, but I'm wondering what others
who are using 6.1 are finding. Perhaps I'm missing something in my setup
that is causing the problem (I hope), because if not I don't want to even
think about how much disk I have to add to the current box so I can
upgrade it and make it run with the 400+ systems that will stay on it.



Anyone else seeing this or have an idea what I may have missed?



Geoff Gill
TSM/PeopleSoft Administrator

SAIC M/S-B1P

4224 Campus Pt. Ct.

San Diego, CA  92121
(858)826-4062 (office)

(858)412-9883 (blackberry)