Re: [ADSM-L] Re: TDPO restore of compressed file corrupted

When the restore hit the failed objects, RMAN automatically restored the
necessary objects from the backup of one day earlier, along with all the
additional log backups it needed to bring the finished restore up to the same
point.  The DBAs restore this database from prod to test daily (along with
1-2 TB of other databases), usually without incident.

The activity log had no errors that appeared related to the restore.  It does
have TDPO messages sent back to the server that suggest the requested objects
were restored (ANU2536I, ANU2527I messages).
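
A rough way to make the "no related errors" check repeatable is to tally
message IDs by their trailing severity letter.  The sketch below is only an
illustration: it assumes the activity log or tdpoerror.log has been saved as
plain text, the function name is hypothetical, and the sample lines are the
messages already quoted in this thread rather than real log output.

```python
import re
from collections import Counter

# TSM-style message IDs: three-letter prefix, four digits, severity letter
# (for example I = informational, W = warning, E = error, S = severe).
MSG_ID = re.compile(r"\b(AN[A-Z]\d{4})([A-Z])\b")


def summarize_messages(lines):
    """Count message IDs and severity letters found in an iterable of lines."""
    by_id = Counter()
    by_severity = Counter()
    for line in lines:
        for msg_id, severity in MSG_ID.findall(line):
            by_id[msg_id + severity] += 1
            by_severity[severity] += 1
    return by_id, by_severity


# Sample input: message text quoted in this thread, not a real log extract.
sample = [
    "ANS1271E (RC176)  The compressed file is corrupted and cannot be expanded correctly.",
    "ANS0361I DIAG: The 6131499099th code was found to be out of sequence.",
    "(ANU2536I, ANU2527I messages)",
]
ids, severities = summarize_messages(sample)
print(dict(severities))
```

Anything counted under E or S would be the first place to look; the I-level
ANU messages alone would not explain a failed object.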

I believe most of this restore was actually satisfied from the disk pool,
with it hitting tape for only a handful of objects, most of which were likely
for the restore of the day-earlier objects.
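
On the ANS0361I diagnostics quoted below: the wording ("next available slot in
the string table") reads like the consistency check an LZW-style decoder
performs.  That is an inference from the message text only, not something
confirmed by support.  A minimal sketch of that check, assuming fixed-width
codes, literals in 0-255 and a first free slot of 258 (hinted at by the
"(258)" in the message); all names are made up for illustration and this is
not the TSM API's actual code:

```python
FIRST_FREE = 258  # assumption: 0-255 are literals, 256/257 are reserved


class OutOfSequence(Exception):
    """Raised when a code points past the next free string-table slot."""

    def __init__(self, position, code, next_slot):
        super().__init__(
            f"code #{position} ({code}) is beyond the next free slot ({next_slot})"
        )
        self.position = position
        self.code = code
        self.next_slot = next_slot


def lzw_encode(data):
    """Textbook LZW encoder, included only so the decoder has valid input."""
    table = {bytes([i]): i for i in range(256)}
    next_slot = FIRST_FREE
    codes = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = next_slot
            next_slot += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes):
    """Decode, rejecting any code the encoder could not have emitted yet."""
    table = {i: bytes([i]) for i in range(256)}
    next_slot = FIRST_FREE
    prev = None
    out = bytearray()
    for position, code in enumerate(codes, start=1):
        # A valid stream can reference at most the slot about to be filled.
        if code > next_slot or (code == next_slot and prev is None):
            raise OutOfSequence(position, code, next_slot)
        if code in table:
            entry = table[code]
        elif code == next_slot:
            entry = prev + prev[:1]  # the classic "KwKwK" case
        else:
            raise ValueError(f"code {code} is not modelled in this sketch")
        if prev is not None:
            table[next_slot] = prev + entry[:1]
            next_slot += 1
        out += entry
        prev = entry
    return bytes(out)


codes = lzw_encode(b"ABABABA")
assert lzw_decode(codes) == b"ABABABA"
```

In a decoder like this, a stream damaged anywhere between storage and the
decompressor trips the check even when the stored object is intact, which
would be consistent with the later restore succeeding.  It says nothing
about where the damage was introduced.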

Thanks for the suggestions.

=Dave


On 03/12/2015 12:04 PM, Francisco Javier wrote:
> Have you tried restoring files from another backup date, or looking for
> errors in the actlog for the tape that is required for the restore?
>
>
>
> 2015-03-12 10:49 GMT-06:00 David Bronder <david-bronder AT uiowa DOT edu>:
>
>> I have a PMR open on this, but I wanted to see if any of you have seen
>> something like this...
>>
>> Setup:
>>
>> TSM 7.1.1.3 / TDP for Oracle 7.1.0.0 clients, Oracle 11.2.0.3
>> TSM 6.3.5.0 server
>> All systems are AIX 6.1 TL9 SP3
>>
>> Our DBAs were running a cross-node restore of the previous night's backup of
>> a database on one Oracle server (prod) to the other Oracle server (test).
>> During the restore, a couple of the objects being restored failed with errors
>> like this:
>>
>>> channel t2: ORA-19870: error while restoring backup piece n4q0s5ko_1_1
>>> ORA-19501: read error on file "n4q0s5ko_1_1", block number 2176513 (block size=512)
>>> ORA-27190: skgfrd: sbtread2 returned error
>>> ORA-19511: Error received from media manager layer, error text:
>>>    ANS1271E (RC176)  The compressed file is corrupted and cannot be expanded correctly.
>>
>> The tdpoerror.log file additionally contained thousands of the following
>> messages (the numbers in each message varied) for each failed object:
>>
>>> ANS0361I DIAG: The 6131499099th code was found to be out of sequence.
>>> The code (307) was greater than (258), the next available slot in the string table.
>>
>> The backups are indeed compressed.  But they are not corrupt in TSM; a
>> separate restore later successfully restored the objects that failed on the
>> first try.  Nothing in errpt to suggest storage or network errors.
>>
>> We've had a handful of these so far.  The only changes of note in the
>> environment that I can think of lately are the TSM API / TDPO client updates
>> to 7.1 and the latest round of Oracle updates.  Since the compression is
>> happening at the TSM API level, I think I can rule out the Oracle CPU.
>>
>> The current word back from support is that a backup was occurring for the
>> source client at the same time as the restore to the target client, which
>> "is completely against recommendation."  I have to say my initial reaction
>> to that statement was "you've got to be kidding me."  I don't recall ever
>> seeing such a recommendation.  And these are Oracle databases; the DBAs are
>> running log backups for them throughout the day, every day...
>>
>> Has anyone else seen restore failures like this?  Am I wrong to expect TDPO
>> cross-node restores to work reliably while the source client is backing up
>> more data?
>>
>> Thanks for any feedback or insight.
>>
>> =Dave
>>

--
Hello World.                                David Bronder - Systems Architect
Segmentation Fault                                      ITS-EI, Univ. of Iowa
Core dumped, disk trashed, quota filled, soda warm.   david-bronder AT uiowa DOT edu