[ADSM-L] TDPO restore of compressed file corrupted

2015-03-12 12:50:52
Subject: [ADSM-L] TDPO restore of compressed file corrupted
From: David Bronder <david-bronder AT UIOWA DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 12 Mar 2015 11:49:04 -0500
I have a PMR open on this, but I wanted to see if any of you have seen
something like this...

Setup:

TSM 7.1.1.3 / TDP for Oracle 7.1.0.0 clients, Oracle 11.2.0.3
TSM 6.3.5.0 server
All systems are AIX 6.1 TL9 SP3

Our DBAs were running a cross-node restore of the previous night's backup of
a database on one Oracle server (prod) to the other Oracle server (test).
During the restore, a couple of the objects being restored failed with errors
like this:

> channel t2: ORA-19870: error while restoring backup piece n4q0s5ko_1_1
> ORA-19501: read error on file "n4q0s5ko_1_1", block number 2176513 (block size=512)
> ORA-27190: skgfrd: sbtread2 returned error
> ORA-19511: Error received from media manager layer, error text:
>    ANS1271E (RC176)  The compressed file is corrupted and cannot be expanded correctly.

The tdpoerror.log file additionally contained thousands of the following
messages (the numbers in each message varied) for each failed object:

> ANS0361I DIAG: The 6131499099th code was found to be out of sequence.
> The code (307) was greater than (258), the next available slot in the string table.
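
For readers unfamiliar with the wording of ANS0361I: it reads like the sanity
check in an LZW-style decompressor, where a code may refer only to a string
table entry that already exists or to the very next free slot. The sketch
below is illustrative only and is not the TSM API's code. The names
lzw_encode, lzw_decode and OutOfSequenceCode are hypothetical, and the first
free slot of 258 (256 literal codes plus two reserved codes) is an assumption
taken from the message text, not from any documentation.

```python
FIRST_FREE = 258  # assumption: 256 literals plus two reserved codes (256, 257)


class OutOfSequenceCode(Exception):
    """Raised when a code points past the next free string-table slot."""

    def __init__(self, position, code, next_slot):
        super().__init__(
            f"code #{position} out of sequence: the code ({code}) was greater "
            f"than ({next_slot}), the next available slot in the string table"
        )
        self.position = position
        self.code = code
        self.next_slot = next_slot


def lzw_encode(data):
    """Toy LZW encoder: returns a list of integer codes (no bit packing)."""
    table = {bytes([i]): i for i in range(256)}
    next_slot = FIRST_FREE
    codes = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = next_slot
            next_slot += 1
            current = candidate[-1:]
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes):
    """Toy LZW decoder that rebuilds the string table while reading codes."""
    table = {i: bytes([i]) for i in range(256)}
    next_slot = FIRST_FREE
    out = bytearray()
    prev = None
    for position, code in enumerate(codes, start=1):
        if code in (256, 257):
            raise NotImplementedError("reserved codes are not handled in this sketch")
        # A valid code is an existing entry or, at most, the next free slot.
        if code > next_slot or (code == next_slot and prev is None):
            raise OutOfSequenceCode(position, code, next_slot)
        if code == next_slot:
            entry = prev + prev[:1]  # the "code not yet in table" case
        else:
            entry = table[code]
        out += entry
        if prev is not None:
            table[next_slot] = prev + entry[:1]
            next_slot += 1
        prev = entry
    return bytes(out)
```

In this sketch, lzw_decode(lzw_encode(b"ABABABA")) round-trips cleanly, while
lzw_decode([65, 307]) raises OutOfSequenceCode with code 307 and next slot
258. Once a decoder of this kind reads bytes that do not belong at that point
in the stream, its table no longer matches the one the encoder built, so one
bad spot can produce a long run of such messages even when the stored object
itself is intact.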

The backups are indeed compressed.  But they are not corrupt in TSM; a
separate restore later successfully restored the objects that failed on the
first try.  Nothing in errpt to suggest storage or network errors.

We've had a handful of these so far.  The only changes of note in the
environment that I can think of lately are the TSM API / TDPO client updates
to 7.1 and the latest round of Oracle updates.  Since the compression is
happening at the TSM API level, I think I can rule out the Oracle CPU
(Critical Patch Update).

The current word back from support is that a backup was occurring for the
source client at the same time as the restore to the target client, which "is
completely against recommendation."  I have to say my initial reaction to
that statement was "you've got to be kidding me."  I don't recall ever seeing
such a recommendation.  And these are Oracle databases; the DBAs run log
backups for them throughout the day, every day...

Has anyone else seen restore failures like this?  Am I wrong to expect TDPO
cross-node restores to work reliably while the source client is backing up
more data?

Thanks for any feedback or insight.

=Dave

--
Hello World.                                David Bronder - Systems Architect
Segmentation Fault                                      ITS-EI, Univ. of Iowa
Core dumped, disk trashed, quota filled, soda warm.   david-bronder AT uiowa DOT edu
