I have a PMR open on this, but I wanted to see if any of you have seen
something like this...
Setup:
TSM 7.1.1.3 / TDP for Oracle 7.1.0.0 clients, Oracle 11.2.0.3
TSM 6.3.5.0 server
All systems are AIX 6.1 TL9 SP3
Our DBAs were running a cross-node restore of the previous night's backup of
a database on one Oracle server (prod) to the other Oracle server (test).
During the restore, a couple of the objects being restored failed with errors
like this:
> channel t2: ORA-19870: error while restoring backup piece n4q0s5ko_1_1
> ORA-19501: read error on file "n4q0s5ko_1_1", block number 2176513 (block
> size=512)
> ORA-27190: skgfrd: sbtread2 returned error
> ORA-19511: Error received from media manager layer, error text:
> ANS1271E (RC176) The compressed file is corrupted and cannot be expanded
> correctly.
The tdpoerror.log file additionally contained thousands of the following
messages (the numbers in each message varied) for each failed object:
> ANS0361I DIAG: The 6131499099th code was found to be out of sequence.
> The code (307) was greater than (258), the next available slot in the string
> table.
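
A rough sketch for tallying these messages, in case anyone wants to compare notes. It assumes each ANS0361I entry sits on a single line in tdpoerror.log (the wrapping in the quote above is taken to be mail formatting), and `summarize_diag` is a hypothetical helper name, not part of TSM or TDPO:

```python
import re
from collections import Counter

# Pattern built from the ANS0361I text quoted above.
# Assumption: the whole message is on one log line.
DIAG_RE = re.compile(
    r"ANS0361I DIAG: The (\d+)(?:st|nd|rd|th) code was found to be "
    r"out of sequence\.\s*The code \((\d+)\) was greater than \((\d+)\)"
)


def summarize_diag(lines):
    """Count ANS0361I messages and group them by (code, next_slot)."""
    pairs = Counter()
    positions = []
    for line in lines:
        match = DIAG_RE.search(line)
        if match is None:
            continue  # not an ANS0361I diagnostic line
        positions.append(int(match.group(1)))
        pairs[(int(match.group(2)), int(match.group(3)))] += 1
    return {
        "count": len(positions),
        "first_position": min(positions) if positions else None,
        "last_position": max(positions) if positions else None,
        "pairs": pairs,
    }
```

Passing an open file works, since it iterates line by line, e.g. `summarize_diag(open("tdpoerror.log"))`. The smallest reported position gives a rough idea of how far into the object the expansion first went wrong.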
The backups are indeed compressed. But they are not corrupt in TSM; a
separate restore later successfully restored the objects that failed on the
first try. Nothing in errpt to suggest storage or network errors.
We've had a handful of these so far. The only changes of note in the
environment that I can think of lately are the TSM API / TDPO client updates
to 7.1 and the latest round of Oracle updates. Since the compression is
happening at the TSM API level, I think I can rule out the Oracle CPU
(Critical Patch Update).
The current word back from support is that a backup was occurring for the
source client at the same time as the restore to the target client, which "is
completely against recommendation." I have to say my initial reaction to
that statement was "you've got to be kidding me." I don't recall ever seeing
such a recommendation. And these are Oracle databases; the DBAs run log
backups for them throughout the day, every day...
Has anyone else seen restore failures like this? Am I wrong to expect TDPO
cross-node restores to work reliably while the source client is backing up
more data?
Thanks for any feedback or insight.
=Dave
--
Hello World. David Bronder - Systems Architect
Segmentation Fault ITS-EI, Univ. of Iowa
Core dumped, disk trashed, quota filled, soda warm. david-bronder AT uiowa DOT edu