RMAN backups move no data

sandragon

OS: RHEL 6.9
TSM client: 8.1.0
TDP client: 8.1.0
TSM Server: 8.1.1

A recently built and configured Oracle instance does not complete backups. It will sometimes push data, but otherwise behaves very similarly to vilius.m's issue here: https://adsm.org/forum/index.php?threads/rman-archive-logs-stuck.31401/

Code:
tail -f level1_201707283707.log
Starting backup at 28-JUL-2017 11:37:12
using channel ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: starting incremental level 1 datafile backup set
channel ORA_SBT_TAPE_1: specifying datafile(s) in backup set
input datafile file number=00048 name=/data01/DEV/PRODdtat_01.dbf
input datafile file number=00040 name=/data01/DEV/dd812t_02.dbf
input datafile file number=00026 name=/data01/DEV/histdtat_01.dbf
input datafile file number=00033 name=/data01/DEV/syapp1i_01.dbf
input datafile file number=00020 name=/data01/DEV/arcdta_01.dbf
channel ORA_SBT_TAPE_1: starting piece 1 at 28-JUL-2017 11:37:12

TDPO logs only show the following:
Code:
 tail dsmerror.log
07/27/2017 04:06:01 ANS4992W TDPO Linux86-64 ANU0599 TDP for Oracle: (12048): =>() ANU2604W The object /adsmorc/ /LVL1_2osabifc_1_1 was not found on the IBM Spectrum Protect Server
07/27/2017 04:06:07 ANS1909E The scheduled command failed.
07/27/2017 04:06:07 ANS1512E Scheduled event 'ORACLE_*******_INCR' failed.  Return code = 1.
07/28/2017 00:20:35 ANS0361I DIAG: sessSendVerb: Error sending Verb, rc: -50
07/28/2017 00:20:35 ANS1017E Session rejected: TCP/IP connection failure.
07/28/2017 00:20:35 ANS1017E Session rejected: TCP/IP connection failure.
07/28/2017 00:20:35 ANS0361I DIAG: sessSendVerb: Error sending Verb, rc: -50
07/28/2017 00:20:41 ANS4992W TDPO Linux86-64 ANU0599 TDP for Oracle: (12111): =>() ANU2604W The object /adsmorc/ /LVL1_4asae6ri_1_1 was not found on the IBM Spectrum Protect Server
07/28/2017 00:20:51 ANS1909E The scheduled command failed.
07/28/2017 00:20:51 ANS1512E Scheduled event 'ORACLE_*******_INCR' failed.  Return code = 1.

The TSM Server shows the following in the act log:
Code:
188,498 Tcp/Ip RecvW 50.817 M 1.268 K 2.001 M Node TDPO Linux86-64 *******_ORACLE
The session has been waiting in RecvW for about 50 minutes and has pushed only about 2 MB.

The failure occurs after anywhere between 20 minutes and 4 hours.

Filesystem backups on this system work fine. Archive log backups from Oracle work fine. The full backup works, but runs very slowly.

The DBAs tell me that fulls and incrementals to a disk target both complete very quickly; this behavior occurs only when the TDP client is engaged. Any thoughts? I'm going to get a PMR going Monday.
 
Hi,
Are you using a container stgpool as the target in this environment?

http://www-01.ibm.com/support/docview.wss?crawler=1&uid=swg1IT20858

If you do a tcpdump on either the server or the client, do you see tcpzerowindow issues when this error occurs?
Each time I have tried to leave a tcpdump running while the backup job fails, tcpdump terminates before the failure. However, this is one of two identical systems. We set up a VM and cloned it. One clone works, the other does not (they have different node names). Our TSM server has a VLAN interface on this subnet, so the traffic stays on layer 2 and no routing is involved either.

We are not using containers in this environment, they are going to standard disk pools.

I've got a PMR open with IBM, but they are fixating on the TCP/IP errors and networking, even though this affects only level 1 backups, not level 0. In theory, a networking error would occur across all backup types.
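
For anyone hitting the same problem with tcpdump dying before the failure: a ring-buffer capture may be able to stay up for the whole 20 minute to 4 hour window. This is only a rough sketch, not something taken from this thread. The interface name, server address and capture directory are placeholders; port 1500 is the TCPPort from the dsm.sys posted further down. With DRY_RUN=1 the functions print the commands instead of running them.

```shell
#!/bin/sh
# Sketch only. IFACE, TSM_ADDR and CAP_DIR are hypothetical placeholders;
# port 1500 is the TCPPort value from the dsm.sys in this thread.
IFACE="${IFACE:-eth0}"
TSM_ADDR="${TSM_ADDR:-192.0.2.10}"   # documentation-range address, replace it
CAP_DIR="${CAP_DIR:-/var/tmp}"

# Zero-window segments: the TCP window field (header bytes 14-15) is 0 and
# the packet is neither a RST nor a SYN.
ZERO_WINDOW_FILTER='tcp[14:2] = 0 and tcp[tcpflags] & (tcp-rst|tcp-syn) = 0'

# Ring buffer of 20 files, each about 100 MB (-C counts in units of
# 1,000,000 bytes). -s 128 keeps only the start of each packet, which
# covers the headers; the oldest file is overwritten once all 20 are full.
capture() {
    ${DRY_RUN:+echo} tcpdump -i "$IFACE" -s 128 -C 100 -W 20 \
        -w "$CAP_DIR/tdpo.pcap" host "$TSM_ADDR" and tcp port 1500
}

# After the backup fails, list zero-window packets in every ring file.
scan() {
    for f in "$CAP_DIR"/tdpo.pcap*; do
        [ -e "$f" ] || continue
        ${DRY_RUN:+echo} tcpdump -nn -r "$f" "$ZERO_WINDOW_FILTER"
    done
}
```

Run capture as root before the level 1 starts, stop it after the job fails, then run scan. If scan prints nothing, zero windows are probably not the cause.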
 
OK. Good luck. It would be interesting to know what root cause you find.

Could you post the dsm.sys/opt files used by RMAN? Maybe we can tune some things there for you.
 
After a long delay, it appears that the issue is a timeout. The DBs are Oracle Standard Edition, so there is no block change tracking and the backup is single threaded. On low-change systems, this means very long waits while the DB is scanned for changes.
The scan never seems to take as long when it's being done to disk, only when using TDPO. Here's our opt file, sanitized of system identifiers:

Code:
************************************************************************
* Tivoli Storage Manager                                               *
************************************************************************

SErvername              TSM01
   COMMMethod           TCPip
   TCPPort              1500
   TCPServeraddress     [address]
   COMPRESSION          no
   Largecommbuffers     yes
   TCPB                 256
   TCPNODelay           yes
   TCPWindowsize        640
   TXNBytelimit         25600
   PASSWORDACCESS       generate
   RESOURCEUTILIZATION  5
   inclexcl             /opt/tivoli/tsm/client/ba/bin/inclexcl.list
   schedlogname         /opt/tivoli/tsm/client/ba/bin/dsmsched.log
   errorlogname         /opt/tivoli/tsm/client/ba/bin/dsmerror.log
   managedservices      schedule webclient
   schedmode            polling
   schedlogret          14 D
   nodename             [node]

SErvername              tdposched
   NODENAME             [node]_oracle
   COMMMethod           TCPip
   TCPServeraddress     [address]
   PASSWORDAccess       generate
   PASSWORDDIR          /opt/tivoli/tsm/client/oracle/bin64
   managedservices      schedule
   schedmode            prompted
   schedlogret          14 D
   schedlogname         /opt/tivoli/tsm/client/ba/bin/tdpo/dsmsched.log
   errorlogname         /opt/tivoli/tsm/client/ba/bin/tdpo/dsmerror.log
   inclexcl             /opt/tivoli/tsm/client/ba/bin/inclexcl.oracle
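
Since the symptom is a session sitting idle while RMAN scans for changed blocks, the server-side COMMTIMEOUT (seconds) and IDLETIMEOUT (minutes) options are the obvious things to look at. I can't say from the above which of the two actually fired, so treat this as a sketch under that assumption. The admin ID, the password variable and the 4 hour values are illustrative only, not what was set here. With DRY_RUN=1 the commands are printed rather than sent to the server.

```shell
#!/bin/sh
# Sketch only. ADMIN_ID, ADMIN_PW and both timeout values are hypothetical.
ADMIN_ID="${ADMIN_ID:-admin}"
ADMIN_PW="${ADMIN_PW:-changeme}"
COMM_SECS="${COMM_SECS:-14400}"   # COMMTIMEOUT is given in seconds (4 hours)
IDLE_MINS="${IDLE_MINS:-240}"     # IDLETIMEOUT is given in minutes (4 hours)

# Wrapper around the administrative command-line client.
admc() {
    ${DRY_RUN:+echo} dsmadmc -id="$ADMIN_ID" -password="$ADMIN_PW" \
        -dataonly=yes -noconfirm "$@"
}

# Show the current values before changing anything.
show_timeouts() {
    admc "query option commtimeout"
    admc "query option idletimeout"
}

# Raise both timeouts on the running server with SETOPT.
raise_timeouts() {
    admc "setopt commtimeout $COMM_SECS"
    admc "setopt idletimeout $IDLE_MINS"
}
```

Run show_timeouts first and compare the values against how long the level 1 sits in RecvW before it dies. Only call raise_timeouts if the wait really does exceed one of them.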
 