TDPOSync Question

GregE

I have an Oracle database that is not backing up properly. I allocate two channels and in the TSM node definition I have set MAXNUMMP=2. Before our Oracle cluster failed over to the secondary node, this was working just fine.

Now, every time the backup begins, it throws an error, then backs up on one drive, but I don't think it is backing up the database fully. The TDP log shows this error almost immediately, and several more times (5 times last night) throughout the backup:

RMAN-03009: failure of backup command on t1 channel at 07/13/2010 21:20:12
ORA-27192: skgfcls: sbtclose2 returned error - failed to close file
ORA-19511: Error received from media manager layer, error text:
ANS1301E (RC1) Server detected system error
continuing other job steps, job failed will not be re-run


These seem to match the 5 errors in tdpoerror.log (each naming a different backup file, of course):

07/13/10 22:43:11 ANS4994S TDP Oracle SUN ANU0599 TDP for Oracle: (19564): =>(DB1) ANU2602E The object /DB1_db//bk_66717_8_724284993 was not found on the TSM Server
07/13/10 22:59:44 ANS1301E Server detected system error
07/13/10 22:59:44 ANS1301E Server detected system error
07/13/10 22:59:44 ANS1301E Server detected system error
07/13/10 22:59:44 ANS1301E Server detected system error
07/13/10 22:59:44 ANS4994S TDP Oracle SUN ANU0599 TDP for Oracle: (19564): => (DB1) ANS1301E (RC1) Server detected system error


...and...

07/13/10 22:33:12 ANS4994S TDP Oracle SUN ANU0599 TDP for Oracle: (19544): =>(DB1) ANU2602E The object /orc8_E1PROD_db//bk_66716_11_724284739 was not found on the TSM Server
07/13/10 22:43:10 ANS1312E Server media mount not possible
07/13/10 22:43:10 ANS1312E Server media mount not possible
07/13/10 22:43:10 ANS1312E Server media mount not possible
07/13/10 22:43:10 ANS1312E Server media mount not possible
07/13/10 22:43:10 ANS4994S TDP Oracle SUN ANU0599 TDP for Oracle: (19564): => (DB1) ANS1312E (RC12) Server media mount not possible


Tape drives are available, so that's not an issue when it says "media mount not possible." I don't know what that error is actually telling me.
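
For anyone wanting to line these entries up against the actlog, here is a minimal Python sketch that tallies the leading message ID on each tdpoerror.log line. It assumes every entry starts with an "MM/DD/YY HH:MM:SS" timestamp followed by an ANS or ANU message ID, as in the excerpts above; the function name tally_messages and the embedded sample are only illustrative.

```python
import re
from collections import Counter

# Sample lines copied from the tdpoerror.log excerpts in this post.
SAMPLE = """\
07/13/10 22:43:11 ANS4994S TDP Oracle SUN ANU0599 TDP for Oracle: (19564): =>(DB1) ANU2602E The object /DB1_db//bk_66717_8_724284993 was not found on the TSM Server
07/13/10 22:59:44 ANS1301E Server detected system error
07/13/10 22:43:10 ANS1312E Server media mount not possible
"""

# Assumed layout: timestamp, one space, then the message ID (e.g. ANS1312E).
LINE_RE = re.compile(
    r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} ((?:ANS|ANU)\d{4}[A-Z])\b"
)


def tally_messages(text):
    """Count how often each leading message ID appears in the log text."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for message_id, count in sorted(tally_messages(SAMPLE).items()):
        print(message_id, count)
```

Pointing it at the full log (read the file and pass the text in) should make it easier to see whether the ANS1301E and ANS1312E entries come in the same groups of 5 as the actlog warnings.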

The Actlog shows this 5 times...

07/13/10 21:11:59 ANR0530W Transaction failed for session 665030 for node
DB1 (TDP Oracle SUN) - internal server error
detected. (SESSION: 665030)


When I run TDPOSYNC, I get a list of files. I've run TDPOSYNC several times on databases in the past but never had any output, so I've never had any action to take. I have 253 files in the pick list: several are recent, within the retention period for DB1, and about 95% are much older, dating from about 5 months ago and earlier. Is this a list of files that are safe to delete since RMAN knows nothing about them?

Also, another bit of info regarding the files mentioned (there are 5 of them, I just posted 2), such as:
DB1_db//bk_66717_8_724284993
None of those 5 files are listed in the TDPOSYNC pick list.
 
I found it: the "user_dump_dest" parameter in the Oracle init.ora file for that DB specifies it. There's not much in the trace, but it does show entries for the problem backups I've had for this DB.
Code:
Tracing started for:
-----------------------------------------------------------
   Application Client :   TDP Oracle SUN
              Version :   5.4.1.0
===========================================================
SBT-19544 07/13/2010 21:20:08 term2.cpp(394): sbtclose2(): Exit, dsmHandle = 1, rc = 1

===========================================================
Tracing started for:
-----------------------------------------------------------
   Application Client :   TDP Oracle SUN
              Version :   5.4.1.0
===========================================================
SBT-19564 07/13/2010 21:26:03 term2.cpp(394): sbtclose2(): Exit, dsmHandle = 1, rc = 12

SBT-19544 07/13/2010 22:33:10 term2.cpp(394): sbtclose2(): Exit, dsmHandle = 1, rc = 1

SBT-19564 07/13/2010 22:43:11 term2.cpp(394): sbtclose2(): Exit, dsmHandle = 1, rc = 12

SBT-19564 07/13/2010 22:59:45 term2.cpp(394): sbtclose2(): Exit, dsmHandle = 1, rc = 1
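
The rc values in the trace look like they line up with the RC1 and RC12 in the ANS1301E and ANS1312E messages. As a rough way to see which process hit which return code, here is a small Python sketch that groups the sbtclose2 exit lines by the number after "SBT-". Treating that number as a process ID and any nonzero rc as a failed close are assumptions on my part, and failed_closes is just an illustrative name.

```python
import re
from collections import defaultdict

# Trace lines copied from the TDP trace output above.
TRACE = """\
SBT-19544 07/13/2010 21:20:08 term2.cpp(394): sbtclose2(): Exit, dsmHandle = 1, rc = 1
SBT-19564 07/13/2010 21:26:03 term2.cpp(394): sbtclose2(): Exit, dsmHandle = 1, rc = 12
SBT-19544 07/13/2010 22:33:10 term2.cpp(394): sbtclose2(): Exit, dsmHandle = 1, rc = 1
SBT-19564 07/13/2010 22:43:11 term2.cpp(394): sbtclose2(): Exit, dsmHandle = 1, rc = 12
SBT-19564 07/13/2010 22:59:45 term2.cpp(394): sbtclose2(): Exit, dsmHandle = 1, rc = 1
"""

# Captures: number after "SBT-", the date and time, and the return code.
EXIT_RE = re.compile(
    r"^SBT-(\d+) (\S+ \S+) .*sbtclose2\(\): Exit, .*rc = (\d+)\s*$"
)


def failed_closes(text):
    """Group nonzero sbtclose2 return codes by the SBT-<number> prefix."""
    by_process = defaultdict(list)
    for line in text.splitlines():
        match = EXIT_RE.match(line.strip())
        if match and int(match.group(3)) != 0:  # assumption: nonzero means failure
            by_process[match.group(1)].append((match.group(2), int(match.group(3))))
    return dict(by_process)


if __name__ == "__main__":
    for process, events in failed_closes(TRACE).items():
        print(process, events)
```

On the lines above, 19544 only ever returns rc = 1, while 19564 returns rc = 12 twice and then rc = 1 on its last close.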
 
Can't say that rings a bell for me, Greg, but hopefully someone else will see it and respond. If worst comes to worst, it'll mean something to Tivoli and Oracle if you have to open tickets with them.
 
Thanks for checking. I think I have two separate things going under one forum post. Oops.

Am I correct in my thinking on TDPOSYNC? Since that is listing what RMAN knows nothing about, is it safe to delete them?

I have another database (might have more, just checked this one other) where TDPOSYNC shows me files that are all older than 4 months ago and as old as 10 months ago. This database has a 14 day retention on it, so I'm curious about what can be done with the file list from TDPOSYNC.
 
When the Oracle cluster failed onto the secondary - did the Oracle control file come across cleanly?
 
Hi there

Please post the output of tdpoconf showenv. Check that nothing is missing.

And don't run the command like this. Since this seems to be the first time, double-check your environment first and make sure you get the same information from TSM and RMAN/Oracle on both nodes of your cluster.

Is your second instance a standby database, or did you perform a full failover?

There are a lot of considerations to take into account that don't seem to have been (going by the post only).

Regards
 
Thank you. Our Solaris server clusters are set up such that when one faults, the other becomes the primary, BUT the Oracle databases are started manually by the Solaris admins.

tdpoconf showenv is exactly like other databases we have.
Code:
Data Protection for Oracle Information
 Version:              5
 Release:              4
 Level:                1
 Sublevel:             0
 Platform:             64bit TDP Oracle SUN

Tivoli Storage Manager Server Information
 Server Name:          DB1
 Server Address:       TSMSERVER1.COMPANY.COM
 Server Type:          Solaris SPARC 
 Server Port:          1502
 Communication Method: TCP/IP

Session Information
 Owner Name:           
 Node Name:            DB1
 Node Type:            TDP Oracle SUN
 DSMI_DIR:             /opt/tivoli/tsm/client/api/bin64
 DSMI_ORC_CONFIG:      /opt/tivoli/tsm/client/oracle/bin64/dsm_DB1.opt
 TDPO_OPTFILE:         tdpo_DB1.opt
 Password Directory:   /opt/tivoli/tsm/client/oracle/bin64
 Compression:          TRUE
 License Information:  License file exists and contains valid license data.

When the Oracle cluster failed onto the secondary - did the Oracle control file come across cleanly?

I'm not sure what you're asking. The Oracle db is on a SAN and the Solaris servers are Veritas clustered. When the primary server faults, Veritas fails Solaris over to the secondary, but our Solaris admins start the databases manually. The databases are all working ok, so I assume the control file is ok for all of them, even the one with the backup issue.
 
Hi Greg,

Was wondering if a different control file (or perhaps catalog) was active on the secondary node. RMAN uses this as its source of truth about the backup objects that exist (it's what tdposync is supposed to sync against). If you're getting odd results, there's a reasonable chance that it's snafu'd.

How about any config that isn't replicated between the hosts? TDP install versions, perhaps library links within Oracle product dirs?

Cheers,

Tony
 
No, there is only one place files are stored, that being on the SAN, regardless of which UNIX server is accessing it. So only one controlfile.

Still poking around, and have a PMR open with IBM.
 