TDP Restore Problem

ChrisJ

TSM/TDP guys, help please. I haven't had any luck finding a solution to our problem. We can't restore SQL databases using TDP: the container gets built, but once anywhere from 1 GB to 18 GB of data is populated, the restore fails. We can restore using SQL exclusively, and we can even restore databases to a standalone DRP server via TDP using the same data set. Backups work like a charm on the 2 problem servers. We have set the TSM Comm timeout setting to 8 hours. I have looked up rc=419 and it points to includes/excludes, which seem to be defined correctly. Not sure what other info
Code:
SQL 2000 (Build 2187) and OS is win 2003
TSM Server Version 5, Release 3, Level 3.0
TDP client Version 5 Release 2 Level 1.04
2 SQL servers are clustered
 
Errors:
 
TDP Errors:
02/21/2008 09:38:16 ACO5436E A failure occurred on stripe number (0), rc = 428
02/21/2008 09:38:16 ACO5407E The SQL server aborted the operation.
02/21/2008 09:38:21 Restore of EBA failed.
02/21/2008 09:38:21 ACO5407E The SQL server aborted the operation.
02/21/2008 11:42:08 ACO5436E A failure occurred on stripe number (0), rc = 419
02/21/2008 11:42:08 ACO0003S An internal processing error has occurred.
02/21/2008 11:42:09 Restore of EBA failed.
02/21/2008 11:42:09 ACO0003S An internal processing error has occurred.
 
 
  
SQL Log Errors
2008-02-21 09:38:16.54 spid240   BackupMedium::ReportIoError: read failure on backup device 'TDPSQL-000012A4-0000'. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.).
2008-02-21 09:38:16.54 spid240   Internal I/O request 0x706BD800: Op: Read, pBuffer: 0x1AD40000, Size: 1048576, Position: 4865398272, UMS: Internal: 0x706BD798, InternalHigh: 0x0, Offset: 0x340480, OffsetHigh: 0x0, m_buf: 0x1AD40000, m_len: 0, m_actualBytes: 0, m_errcode: 995, BackupFile: TDPSQL-000012A4-0000
2008-02-21 11:38:45.16 spid981   Unnamed tape (Family ID: 0x8b47c87e, sequence 1) mounted on tape drive 'TDPSQL-000010F8-0000'.
2008-02-21 11:42:08.68 spid981   BackupMedium::ReportIoError: read failure on backup device 'TDPSQL-000010F8-0000'. Operating system error 1003(Cannot complete this function.).

Event log errors
Event Type: Warning
Event Source: MSSQLSERVER
Event Category: (2)
Event ID: 17055
Date:  2008/02/21
Time:  9:34:29 AM
User:  
Computer: 
Description:
18227 :
Unnamed tape (Family ID: 0x8b47c87e, sequence 1) mounted on tape drive 'TDPSQL-000012A4-0000'.

    
Event Type: Error
Event Source: MSSQLSERVER
Event Category: (2)
Event ID: 17055
Date:  2008/02/21
Time:  9:38:16 AM
User:  
Computer: 
Description:
18210 :
BackupMedium::ReportIoError: read failure on backup device 'TDPSQL-000012A4-0000'. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.).

    
Event Type: Warning
Event Source: MSSQLSERVER
Event Category: (2)
Event ID: 17055
Date:  2008/02/21
Time:  11:38:45 AM
User:  
Computer: 
Description:
18227 :
Unnamed tape (Family ID: 0x8b47c87e, sequence 1) mounted on tape drive 'TDPSQL-000010F8-0000'.

     
Event Type: Error
Event Source: MSSQLSERVER
Event Category: (2)
Event ID: 17055
Date:  2008/02/21
Time:  11:42:08 AM
User:  
Computer: 
Description:
18210 :
BackupMedium::ReportIoError: read failure on backup device 'TDPSQL-000010F8-0000'. Operating system error 1003(Cannot complete this function.).


VDI Errors
----------------------------------------------
2008/02/21 09:38:16 pid(5216) tid(5740)
Error on Global\TDPSQL-000012A4-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 09:38:16 pid(5216) tid(5740)
Error on Global\TDPSQL-000012A4-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 09:38:16 pid(5216) tid(5740)
Error on Global\TDPSQL-000012A4-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 09:38:16 pid(5216) tid(5740)
Error on Global\TDPSQL-000012A4-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 09:38:16 pid(5216) tid(5740)
Error on Global\TDPSQL-000012A4-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 09:38:16 pid(5216) tid(5740)
Error on Global\TDPSQL-000012A4-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 09:38:16 pid(5216) tid(5740)
Error on Global\TDPSQL-000012A4-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 09:38:16 pid(5216) tid(5740)
Error on Global\TDPSQL-000012A4-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 09:38:16 pid(5216) tid(5740)
Error on Global\TDPSQL-000012A4-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 09:38:16 pid(5216) tid(5740)
Error on Global\TDPSQL-000012A4-0000
Error at SVDS::CloseDevice: Abort detected
----------------------------------------------
2008/02/21 09:38:16 pid(5216) tid(5740)
Error on Global\TDPSQL-000012A4-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 09:38:16 pid(5216) tid(9204)
Error on Global\TDPSQL-000012A4-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 09:38:16 pid(4772) tid(9308)
Error on Global\TDPSQL-000012A4-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 09:38:21 pid(4772) tid(10080)
Error on Global\TDPSQL-000012A4-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 11:42:08 pid(4344) tid(9080)
Error on Global\TDPSQL-000010F8-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 11:42:08 pid(5216) tid(5740)
Error on Global\TDPSQL-000010F8-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 11:42:08 pid(5216) tid(5740)
Error on Global\TDPSQL-000010F8-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 11:42:08 pid(5216) tid(5740)
Error on Global\TDPSQL-000010F8-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 11:42:08 pid(5216) tid(5740)
Error on Global\TDPSQL-000010F8-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 11:42:08 pid(5216) tid(5740)
Error on Global\TDPSQL-000010F8-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 11:42:08 pid(5216) tid(5740)
Error on Global\TDPSQL-000010F8-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 11:42:08 pid(5216) tid(5740)
Error on Global\TDPSQL-000010F8-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 11:42:08 pid(5216) tid(5740)
Error on Global\TDPSQL-000010F8-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 11:42:08 pid(5216) tid(5740)
Error on Global\TDPSQL-000010F8-0000
Error at SVDS::CloseDevice: Abort detected
----------------------------------------------
2008/02/21 11:42:08 pid(5216) tid(5740)
Error on Global\TDPSQL-000010F8-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 11:42:08 pid(5216) tid(6148)
Error on Global\TDPSQL-000010F8-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 11:42:09 pid(4344) tid(9704)
Error on Global\TDPSQL-000010F8-0000
Error at TriggerAbort: invoked
Any help would be much appreciated.

Chris
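
For anyone wading through a VDI log like the one above, a small script can make the pattern easier to see by tallying entries per device, process ID, and message. This is only a minimal sketch: the `SAMPLE` text is a short excerpt typed in the same layout as the log in this post, and the names `ENTRY` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Short excerpt in the same layout as the VDI log above:
# a separator line, then three lines per entry.
SAMPLE = """\
----------------------------------------------
2008/02/21 09:38:16 pid(5216) tid(5740)
Error on Global\\TDPSQL-000012A4-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 09:38:16 pid(5216) tid(5740)
Error on Global\\TDPSQL-000012A4-0000
Error at SVDS::CloseDevice: Abort detected
----------------------------------------------
2008/02/21 11:42:08 pid(4344) tid(9080)
Error on Global\\TDPSQL-000010F8-0000
Error at TriggerAbort: invoked
"""

# One entry = timestamp/pid/tid line, "Error on" line, "Error at" line.
ENTRY = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) pid\((?P<pid>\d+)\) tid\((?P<tid>\d+)\)\n"
    r"Error on (?P<device>\S+)\n"
    r"Error at (?P<what>.+)$",
    re.MULTILINE,
)


def summarize(text):
    """Count log entries grouped by (device, pid, message)."""
    counts = Counter()
    for m in ENTRY.finditer(text):
        counts[(m["device"], m["pid"], m["what"])] += 1
    return counts


if __name__ == "__main__":
    for (device, pid, what), n in sorted(summarize(SAMPLE).items()):
        print(f"{device} pid {pid}: {what} x{n}")
```

Run against the full log, this shows at a glance which process raised most of the aborts on each virtual device, which may help when comparing the two failed attempts.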
 
MB, thanks for your reply. Sorry for getting back to you so late; I was under the weather. If the restore procedure (using the same data set) works on the DRP server using the same TSM and TDP client levels, would this not negate the need to move to a higher version? You mention TDP patches; are there any that deal with memory-leaking buffers or VDI files?

Thanks...Christian
 
Higher versions of TDP are not necessarily patches. Most of them do fix some issues, and that is why IBM always recommends applying them. Going to a higher version will not break any previous functionality.
 
Is this a SAN based restore (is Storage Agent involved)?

I have seen issues like this when a certain set of conditions exists between the onsite and offsite copy of the file.. something about the bitfile number. Disabling the storage agent and using the network for transport allowed the restore to complete. I am not sure if the bitfile thing is fixed by your TSM server version or not. It was supposed to be fixed by my version (5.3.3.1), but I have successfully restored across the network when restores across the SAN were not working. As 5.3 is getting ready to go out of support, I didn't bother with support the last few times that worked.

Good luck.
 
Moon-buddy, Thanks for your response. I have been working with our TSM guy and the latest release is not approved for the version of TSM server we are on. Can you point me in the direction of the list of fixes for each level of TDP?

Tsmtodd,
Thanks for your response. I am told we are using a network restore and don't use the storage agent (LAN-free). Our SQL box is configured to use a Gigabit LAN connection that eventually writes to SAN disk. I am not sure what you mean by "onsite and offsite copy of the file.. something about the bitfile number." Can you explain this bitfile number in more detail?

Chris
 
Are you sure you set the same number of stripes for the restore operation as you used for the backup?
 
May, thanks for your help. I was able to restore a 20 GB database successfully and am now in the process of restoring a 500 GB database. I will keep my fingers crossed and let you know the outcome. Cheers...Chris
 
ChrisJ,
I wish I was smart enough to explain the bitfile stuff. As far as I know, it is like the key record for an object in the database. And, from what I can recall about this particular bug, if the bitfile number for the offsite copy of the object was greater than the bitfile number for the local copy of the object, the storage agent restore would not work. Again, I was told that bug was fixed prior to our versions, but I still see similar behavior occasionally (a storage agent based restore would not work when a network restore would work).

Because you don't use storage agent, it doesn't sound like your issue though.
 
May, I changed to 2 stripes for the restore and it was able to restore a 20 GB database but it failed on a 480 GB database shortly after the container was built. Do you have any suggestions? Cheers...Chris
 
Did it fail with the same errors?
If so, have a look at the script that launches the backup to find out how many stripes are used for the backup.

Otherwise, for a big database restore you need to increase the "COMMTIMEOUT" parameter on the TSM server,
because the exchange between the client and the server can take more than one hour...
 
May,

We had a look at the script and it doesn't specify the number of stripes; it just takes the default, which is 1. Our COMMTIMEOUT value is set to 8 hours, so I don't think that is an issue. Do you have any other suggestions? Thanks in advance for your help. Cheers...Chris
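
A quick way to sanity-check the timeout theory is to compare how long each attempt ran before failing with the 8-hour setting. The sketch below takes the "mounted on tape drive" and "read failure" timestamps from the logs in the first post; treating the mount message as the start of the data transfer is an assumption, and the names `ATTEMPTS` and `seconds_to_failure` are made up for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (mount time, failure time) for the two attempts, copied from the
# event log / SQL log entries in the first post.
ATTEMPTS = {
    "TDPSQL-000012A4-0000": ("2008-02-21 09:34:29", "2008-02-21 09:38:16"),
    "TDPSQL-000010F8-0000": ("2008-02-21 11:38:45", "2008-02-21 11:42:08"),
}

# The timeout value mentioned in the thread: 8 hours, in seconds.
COMMTIMEOUT_SECONDS = 8 * 60 * 60


def seconds_to_failure(mounted, failed):
    """Whole seconds between the mount message and the read failure."""
    delta = datetime.strptime(failed, FMT) - datetime.strptime(mounted, FMT)
    return int(delta.total_seconds())


elapsed = {dev: seconds_to_failure(*times) for dev, times in ATTEMPTS.items()}

if __name__ == "__main__":
    for dev, secs in elapsed.items():
        share = secs / COMMTIMEOUT_SECONDS
        print(f"{dev}: failed {secs} s after mount ({share:.1%} of the timeout)")
```

Both attempts fail within a few minutes of the mount message, far short of 8 hours, which is consistent with the timeout not being the cause.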
 
We found a temporary solution by changing the default setting for the Transaction Buffer Size from 25600 KB to 1024 KB within the TSM client config. This value matches the SQL buffer size value of 1024 KB. We suspect that something on the OS, particularly in the TCP settings, has something to do with this limitation. I will post more info when we find it. Thanks for everyone's help. Cheers...Chris
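
For reference, the sizes involved can be lined up with a few lines of arithmetic. This is only a sketch using numbers quoted in this thread (the buffer sizes from the post above, and the `Size` and `Position` fields from the SQL log in the first post); the variable names are made up for illustration.

```python
KB = 1024

sql_request_bytes = 1048576   # "Size: 1048576" in the SQL log of the first post
old_buffer_kb = 25600         # previous Transaction Buffer Size, per the post above
new_buffer_kb = 1024          # new Transaction Buffer Size, per the post above
failed_position = 4865398272  # "Position: 4865398272" in the SQL log

# The SQL read request size expressed in KB.
sql_request_kb = sql_request_bytes // KB

# How many SQL-sized requests fit in one buffer at the old setting.
ratio = old_buffer_kb // sql_request_kb

# How far into the backup stream the first logged read failure occurred.
position_gib = round(failed_position / 1024**3, 2)

if __name__ == "__main__":
    print(f"SQL read request: {sql_request_kb} KB")
    print(f"Old buffer holds {ratio} such requests; new buffer holds "
          f"{new_buffer_kb // sql_request_kb}")
    print(f"First failure at about {position_gib} GiB into the stream")
```

The SQL read request in the log is exactly 1024 KB, so the new setting lines the two buffers up one-to-one, where the old setting was 25 times larger.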
 
Changing the Transaction Buffer Size did not work for our large database. We currently have a case open with IBM and MS. It looks like the TSM server is sending data to 2 different NICs (1 TSM gigabit card and 1 set of teamed 100 MB cards). We are not sure if the cluster is confused about traffic coming from the TSM gigabit card instead of the teamed NICs. At this point it looks like a routing issue. When we get to the root cause I will post it. Thanks.
 
The root cause of our restore problem turned out to be a corrupted pagefile.sys caused by AV scanning. The pagefile.sys file was not excluded from being scanned by AV. We disabled AV (which later we enabled with the correct exclusions in place) and recreated pagefile.sys on the SQL server.
 
What a mess...
I would never have made the connection with the pagefile...

Who guided you to that solution? IBM?
 
Actually, sorry, the pagefile.sys wasn't the root cause. It turns out that the AV doesn't scan Windows protected files regardless of the drive they are on. The restore worked because we had disabled the McAfee 8.5 services and rebooted. We were then able to get the restore to work by backdating McAfee to 8.0 (this we discovered ourselves). The whole issue has been a long and arduous process, but hey, I have learned a lot. :) The latest news is that we are able to get the restore to work using McAfee 8.5 only when the TDI drivers are disabled. Our AV team has a case open with McAfee and hopefully we will get a fix soon. Thanks for your help.
 
TDP SQL restore hangs

Hi to all TSM experts.
Can anybody please help me find a solution for my problem?
I'm trying to restore a single 1 TB database with TDP SQL via LAN to a different server from the one that was backed up. It hangs for around an hour and the restore never starts until I cancel it manually. I have tested a restore of a smaller database (around 100 MB) with the same procedure via the TDP SQL GUI, and the restore works perfectly via LAN.
My TSM server is version 5.5 and TDP for SQL is version 5.5. The server is installed on Red Hat 5 Linux.

Thanks.
 