Oracle RMAN Failed but TSM Scheduler completed successfully,return code 0.

jackyzhang

ADSM.ORG Member
Joined
Jun 13, 2005
Messages
47
Reaction score
1
Points
0
Location
BeiJing, China
Website
Visit site
********************************
RMAN failed:
********************************
# more 2012-05-20.log
[YOU HAVE NEW MAIL]

Recovery Manager: Release 10.2.0.4.0 - Production on Sun May 20 02:55:50 2012

Copyright (c) 1982, 2007, Oracle. All rights reserved.

connected to target database: BILLREP (DBID=3534414606)

RMAN> run{
2> allocate channel t1 type 'sbt_tape';
3> allocate channel t2 type 'sbt_tape';
4> send 'ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)';
5>
6> sql 'alter system switch logfile';
7> backup full filesperset 4
8> format 'oracle_%p_%u_%s_%t'
9> (database include current controlfile);
10> release channel t1;
11> release channel t2;
12> }
13> allocate channel FOR MAINTENANCE device type 'sbt_tape' parms 'ENV=(tdpo_optfile=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)';
14> delete noprompt obsolete recovery window of 15 days;
15> delete noprompt expired backup;
16> release channel;
17>
using target database control file instead of recovery catalog
allocated channel: t1
channel t1: sid=505 instance=billrep2 devtype=SBT_TAPE
channel t1: Data Protection for Oracle: version 5.4.1.0

allocated channel: t2
channel t2: sid=502 instance=billrep2 devtype=SBT_TAPE
channel t2: Data Protection for Oracle: version 5.4.1.0

sent command to channel: t1
sent command to channel: t2

sql statement: alter system switch logfile

Starting backup at 2012-05-20 02:55:56
channel t1: starting full datafile backupset
channel t1: specifying datafile(s) in backupset
input datafile fno=00008 name=+BILL_DG/billrep/datafile/data3.504.674150911
input datafile fno=00010 name=+BILL_DG/billrep/datafile/users.508.674309879
channel t1: starting piece 1 at 2012-05-20 02:55:57
channel t2: starting full datafile backupset
channel t2: specifying datafile(s) in backupset
input datafile fno=00006 name=+BILL_DG/billrep/datafile/data1.272.674781043
input datafile fno=00009 name=+BILL_DG/billrep/datafile/system.501.674161585
input datafile fno=00011 name=+BILL_DG/billrep/datafile/sysaux.511.674310147
input datafile fno=00004 name=+BILL_DG/billrep/datafile/users.502.674309871
channel t2: starting piece 1 at 2012-05-20 02:55:57
RMAN-03009: failure of backup command on t2 channel at 05/20/2012 02:59:22
ORA-19502: write error on file "oracle_1_38nbdvtt_10344_783744957", blockno 2734593 (blocksize=512)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
ANS1315W (RC15) Unexpected retry request. The server found an error while writing the data.
channel t2 disabled, job failed on it will be run on another channel
released channel: t1
released channel: t2
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on t1 channel at 05/20/2012 03:02:07
ORA-19502: write error on file "oracle_1_37nbdvtt_10343_783744957", blockno 2423809 (blocksize=512)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
ANS1315W (RC15) Unexpected retry request. The server found an error while writing the data.

Recovery Manager complete.
#
****************************************
TSM Scheduler completed successfully,return code 0.
****************************************
05/20/12 02:55:46 --- SCHEDULEREC QUERY BEGIN
05/20/12 02:55:50 --- SCHEDULEREC QUERY END
05/20/12 02:55:50 Next operation scheduled:
05/20/12 02:55:50 ------------------------------------------------------------
05/20/12 02:55:50 Schedule Name: P561B_SCHED_INCR0
05/20/12 02:55:50 Action: Command
05/20/12 02:55:50 Objects: /usr/tivoli/ora_script/start-full.sh
05/20/12 02:55:50 Options:
05/20/12 02:55:50 Server Window Start: 02:00:00 on 05/20/12
05/20/12 02:55:50 ------------------------------------------------------------
05/20/12 02:55:50
Executing scheduled command now.
05/20/12 02:55:50
Executing Operating System command or script:
/usr/tivoli/ora_script/start-full.sh
05/20/12 03:02:12 Finished command. Return code is: 0
05/20/12 03:02:12 ANS1908I The scheduled command completed successfully.
05/20/12 03:02:12 Scheduled event 'P561B_SCHED_INCR0' completed successfully.
05/20/12 03:02:12 Sending results for scheduled event 'P561B_SCHED_INCR0'.
05/20/12 03:02:12 Results sent to server for scheduled event 'P561B_SCHED_INCR0'.

05/20/12 03:02:12 ANS1483I Schedule log pruning started.
05/20/12 03:02:12 ANS1484I Schedule log pruning finished successfully.
05/20/12 03:02:12 TSM Backup-Archive Client Version 5, Release 5, Level 0.0
05/20/12 03:02:12 Querying server for next scheduled event.
05/20/12 03:02:12 Node Name: P561B_ORACLE
05/20/12 03:02:17 Session established with server BJGAS-TSMSERVER: Windows
05/20/12 03:02:17 Server Version 5, Release 5, Level 2.0

05/20/12 03:02:17 Server date/time: 05/20/12 02:06:40



So how can we fix it ? Is it a tsm bug or shell script bug?

Thanks.

Jacky
Storage Administrator
cloudjacky'at'gmail.com
 
# more bak_full.sh
run{
allocate channel t1 type 'sbt_tape';
allocate channel t2 type 'sbt_tape';
send 'ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)';

sql 'alter system switch logfile';
backup full filesperset 4
format 'oracle_%p_%u_%s_%t'
(database include current controlfile);
release channel t1;
release channel t2;
}
allocate channel FOR MAINTENANCE device type 'sbt_tape' parms 'ENV=(tdpo_optfile=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)';
delete noprompt obsolete recovery window of 15 days;
delete noprompt expired backup;
release channel;
#




# more bak_incr1.sh
#connect target sys/manager
run{
allocate channel t1 type 'sbt_tape';
#set limit channel t1 kbytes 510;

allocate channel t2 type 'sbt_tape';
#set limit channel t1 kbytes 510;

send 'ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)';

#sql 'alter system switch logfile';
# backup full filesperset 4
# format 'archivelog_%p_%u_%s_%t'
# (archivelog all delete input);


# backup full filesperset 4
# format 'oracle_%p_%u_%s_%t'
# (database include current controlfile);


#backup incremental level 1
#format 'billrep_incr_%T_%t_%U' database
#include current controlfile;
sql 'alter system archive log current';
backup filesperset 4 (archivelog all skip inaccessible delete input);

release channel t1;
release channel t2;
 
#more start-full.sh

LOGFILE=/usr/tivoli/ora_log/dbbak.log
LOGDATE=`date +%Y-%m-%d`
RMANLOG=/usr/tivoli/ora_log/${LOGDATE}.log

echo "-------------------------Level 0-------------------------" >> ${LOGFILE}
echo "Begin Time:" `date +%Y-%m-%d_%H:%M:%S` >> ${LOGFILE}


CMDFILE=/usr/tivoli/ora_script/bak_full.sh

su - oracle -c "rman target / cmdfile $CMDFILE" <<EOF > ${RMANLOG}

exit;
EOF
echo "End Time:" `date +%Y-%m-%d_%H:%M:%S` >> ${LOGFILE}
echo "------------------------------------------------------------------" >> ${LOGFILE}
 
The problem is with your shell script. The TSM scheduler calls the script to run the RMAN backup and since the scripts runs, TSM thinks everything is OK. Thus it thinks that the script gave a RC=0 which, as far as TSM is concerned, the backup was successful.

Modify your script such that the RC=0 is returned only after the RMAN successfully completes.
 
Back
Top