RMAN: Restore Validate: SOMETIMES with error

adsmsuser · Apr 8, 2011

Hi!

We have some Oracle DBs here that SOMETIMES have problems with backup operations (in this case: Restore Validate).

We have IDLETIMEOUT on the server set to 240 minutes, and it seems that TSM kills the session after that duration, no matter whether it is still needed.
Here are some logs:

TSM-SERVER-LOG:
07.04.11 21:44:32 MESZ ANR0406I Session 164254 started for node ORACLE-TDP (TDPO Linux86-64) (TCP/IP 10.6.227.155(63289)). (SESSION: 164254)
07.04.11 21:44:32 MESZ ANR1639I Attributes changed for node ORACLE-TDP: TCP Name from ebrefdb2 to ebprddb1, TCP Address from 10.6.227.165 to 10.6.227.155, GUID from 55.1b.1f.3a.ef.ca.11.de.ad.e5.00.15.17.c8.c7.40 to 00.28.98.58.ea.f8.11.de.88.87.00.15.17.c8.c6.26. (SESSION: 164254)
07.04.11 21:44:32 MESZ ANR0408I Session 164255 started for server TSMLM1 (HP-UX) (TCP/IP) for library sharing. (SESSION: 164254)
07.04.11 21:44:32 MESZ ANR0409I Session 164255 ended for server TSMLM1 (HP-UX). (SESSION: 164254)
07.04.11 21:44:32 MESZ ANR0408I Session 164256 started for server TSMLM2 (HP-UX) (TCP/IP) for library sharing. (SESSION: 164254)
07.04.11 21:44:32 MESZ ANR0409I Session 164256 ended for server TSMLM2 (HP-UX). (SESSION: 164254)
07.04.11 21:44:32 MESZ ANR0408I Session 164257 started for server TSMLM2 (HP-UX) (TCP/IP) for library sharing. (SESSION: 164254)
07.04.11 21:44:32 MESZ ANR0409I Session 164257 ended for server TSMLM2 (HP-UX). (SESSION: 164254)
07.04.11 21:44:32 MESZ ANR0408I Session 164258 started for server TSMLM2 (HP-UX) (TCP/IP) for library sharing. (SESSION: 164254)
07.04.11 21:44:32 MESZ ANR0409I Session 164258 ended for server TSMLM2 (HP-UX). (SESSION: 164254)
07.04.11 21:44:32 MESZ ANR0408I Session 164259 started for server TSMLM2 (HP-UX) (TCP/IP) for library sharing. (SESSION: 164254)
07.04.11 21:44:32 MESZ ANR0409I Session 164259 ended for server TSMLM2 (HP-UX). (SESSION: 164254)
08.04.11 01:45:16 MESZ ANR0482W Session 164254 for node ORACLE-TDP (TDPO Linux86-64) terminated - idle for more than 240 minutes. (SESSION: 164254)
^ HERE IT GETS TERMINATED:
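
As a quick check of the two timestamps above, here is a minimal Python sketch (not part of the original logs; the helper name `minutes_between` is made up for illustration). It measures the gap between the ANR0406I session start and the ANR0482W termination, which comes out just over the configured 240 minutes:

```python
from datetime import datetime

# Timestamp layout used in the TSM server log above (day.month.year).
FMT = "%d.%m.%y %H:%M:%S"

def minutes_between(start: str, end: str) -> float:
    """Return the elapsed time between two log timestamps in minutes."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60

# Values copied from the ANR0406I and ANR0482W lines.
started = "07.04.11 21:44:32"
terminated = "08.04.11 01:45:16"

elapsed = minutes_between(started, terminated)
print(f"{elapsed:.1f} minutes between session start and termination")
```

Both lines carry the same MESZ time zone label, so the sketch ignores the zone.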

RMAN-LOG:
channel ORA_SBT_TAPE_1: reading from backup piece rman_full_20110402_201510_1_9444_6.bak
channel ORA_SBT_TAPE_1: piece handle=rman_full_20110402_201510_1_9444_6.bak tag=FULL LEVEL 0 BACKUP
channel ORA_SBT_TAPE_1: restored backup piece 6
channel ORA_SBT_TAPE_1: reading from backup piece rman_full_20110402_201510_1_9444_7.bak
channel ORA_SBT_TAPE_1: ORA-19870: error while restoring backup piece rman_full_20110402_201510_1_9444_7.bak
ORA-19507: failed to retrieve sequential file, handle="rman_full_20110402_201510_1_9444_7.bak", parms=""
ORA-27029: skgfrtrv: sbtrestore returned error
ORA-19511: Error received from media manager layer, error text:
   ANS1235E (RC-72) An unknown system error has occurred from which TSM cannot recover.

TDP/O-Tracefile:
2011-04-08 01:45:18.544 [000566] [1303844672] : session2.cpp ( 905): tdpoPrepGet(): dsmHandle = 1, 'ANS1235E (RC-72) An unknown system error has occurred from which TSM cannot recover.'
2011-04-08 01:45:18.544 [000566] [1303844672] : session2.cpp ( 911): tdpoPrepGet(): Exit - DSMBEGINGETDATA() failed. dsmHandle = 1, rc = -72

I am not sure where the problem lies.
TSM should not really kill the session, but on the other hand,
shouldn't RMAN do something to keep it alive?

Has anyone experienced such a problem and found a solution?
TSM says: increase IDLETIMEOUT.
But I think an IDLETIMEOUT of 4 hours should be more than enough!

Thanks

Harry_Redl · Apr 8, 2011

Hi,

this happens in LAN-free environments when the transfer time of an RMAN backup piece is longer than the IDLETIMEOUT. RMAN sends (retrieves) the data to (from) the storage agent and contacts the TSM server only after the piece is finished.
You can increase the IDLETIMEOUT (and COMMTIMEOUT) parameters, OR you can limit the size of the backup piece on the RMAN side (we are now at 8 or 16 GB and have no problems).
This approach has another advantage: if anything goes wrong during a restore, RMAN knows which pieces were already restored and does not try them again.
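
To get a feel for the trade-off between piece size and IDLETIMEOUT, here is a rough Python sketch. The throughput figure and the safety factor are purely illustrative assumptions, not measurements from this environment, and the function names are made up for the example. On the RMAN side, the piece size is normally capped with the MAXPIECESIZE channel option.

```python
def transfer_minutes(piece_gb: float, throughput_mb_s: float) -> float:
    """Estimated time to move one backup piece, in minutes."""
    return piece_gb * 1024 / throughput_mb_s / 60

def max_piece_gb(idle_timeout_min: float, throughput_mb_s: float,
                 safety: float = 0.5) -> float:
    """Largest piece size (GB) that should finish within a fraction
    (safety) of the idle timeout at the given throughput."""
    return idle_timeout_min * 60 * throughput_mb_s * safety / 1024

IDLETIMEOUT_MIN = 240     # value from the original post
ASSUMED_MB_PER_S = 20.0   # hypothetical slow-storage throughput

for size_gb in (8, 16, 500):
    minutes = transfer_minutes(size_gb, ASSUMED_MB_PER_S)
    verdict = "ok" if minutes < IDLETIMEOUT_MIN else "exceeds IDLETIMEOUT"
    print(f"{size_gb:>4} GB piece: {minutes:6.1f} min ({verdict})")

limit = max_piece_gb(IDLETIMEOUT_MIN, ASSUMED_MB_PER_S)
print(f"suggested upper bound: {limit:.1f} GB")
```

With numbers like these, 8 or 16 GB pieces finish far inside the timeout even on slow storage, while a single very large piece does not. Measure your own throughput before settling on a size.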

Harry

adsmsuser · Apr 8, 2011

Hm, well, I guess we found the error, or at least the reason why we ran into timeouts WITH LAN-free.

Let's say:
some DB admins ran an RMAN duplicate while
some DB admins ran a backup.

As the test environment has quite slow storage, this duplicate took long enough to let the restore validate run into the 240-minute timeout.

^^
