Re: Solution to RE: Error starting TSM server after upgrade

   It was me that expressed an interest in the answer. However, if the
answer is to audit the DB, it's not going to work for me. I'm not sure
how long it will take to audit a 100GB DB, but we are a 24x7
manufacturing site, so downtime on the TSM server typically has to be
kept to less than a couple of hours or it starts to impact production
(e.g. the Oracle DB archlog space fills up because it can't push to TSM
and the Oracle server screeches to a halt, large archive areas are not
cleaned up by TSM and their processing stops, etc.).
 
    Ya, ya, putting the backup solution in a situation where it could
bring down production is a bad idea, but what can you do when you are
generating about 14TB of data a day to be backed up/archived? Nobody
wants to have to buy a new SAN/NAS/DAS every month to keep it all
on-line, so we make the TSM server clean up the data (archive & delete)
and keep it around.
 
    Just as a note, here is the command I use to see how much data has
flowed through the TSM server in the last 24 hours. I likely gleaned it
off this listserv, so I ~believe~ it's correct.

select cast((cast(sum(summary.bytes) as float) / 1024 / 1024 / 1024)
       as decimal(10,2)) as Gigabytes
  from summary
 where start_time > current_timestamp - 1 day
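
    For anyone who wants to sanity-check the number the query returns,
here is a minimal Python sketch of the same arithmetic (bytes divided by
1024 three times, kept to two decimals). The function name is made up for
illustration and is not part of TSM, and the rounding mode is an
assumption; the server's cast to decimal(10,2) may truncate instead.

```python
from decimal import Decimal, ROUND_HALF_UP

def bytes_to_gigabytes(total_bytes):
    """Mirror the query's arithmetic: bytes / 1024 / 1024 / 1024, kept to 2 decimals.

    Hypothetical helper for illustration only; rounding half-up is an assumption.
    """
    gigabytes = Decimal(total_bytes) / 1024 / 1024 / 1024
    return gigabytes.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# 14 TB (binary units) expressed in bytes, roughly the daily volume mentioned above
print(bytes_to_gigabytes(14 * 1024**4))  # 14336.00
```

So a day with about 14TB flowing through the server should show up as
roughly 14336 in the Gigabytes column.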
 
    Anyways, as I mentioned, that error is only seen on the startup of
the TSM server and doesn't seem to cause other errors, but if I ever get
a window to do an auditdb, I will keep this in mind.
 
Thanks,
Ben

________________________________

From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Shannon Bach
Sent: Thursday, August 18, 2005 9:01 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Solution to RE: Error starting TSM server after upgrade



     Yesterday I posted to the list about an error message that was
being generated when our TSM Server came back on-line after an upgrade
to 5.2.2.  Those error messages were causing other error messages during
the Expiration process and seemed to be slowing it down (my hope is that
getting rid of these errors will solve the slow-down problem of the
Expiration process...I won't be sure, however, until I get rid of the
messages :~)
     We opened an ETR with IBM yesterday afternoon and there was already
a response when I came in this morning.  Someone expressed interest in
the solution if we found one so I will post an edited version of IBM's
response.  Because our TSM Server is on an MVS/ZOS mainframe, some
things are done differently than other platforms but the gist of it is
the same. 
      
The error messages are the result of a corrupt entry in the TSM Server
database.  The ANR9999D's callchain indicates that the TSM server's
migration thread is trying to calculate space in the tapepool in order
to run a disk to tape migration.  In that process, the TSM server must
access the AF.Clusters table. It is in this table that there is an
orphaned entry causing the error messages to be logged in the activity
log.

To fix the problem, you will have to remove the orphaned entry.  The way
to do this is with an audit of the TSM server's database.  This is an
off-line process, during which the TSM server is down.

In short, run an

AUDITDB ARCHSTORAGE FIX=NO                 

Once the process is complete, restart the server normally.   

I'm scheduling time to do this today; I have a regularly scheduled
Expiration process that runs on Fridays, before each weekend.  I will
post the results to the list, including whether it really was affecting
the Expiration process time.

As always, 
Thank You 
Shannon

Madison Gas & Electric Co
Operations Analyst -Data Center Services
Information Management Systems
sbach AT mge DOT com