
TDP Restore Problem

Discussion in 'TSM Database' started by ChrisJ, Feb 22, 2008.

  1. ChrisJ

    ChrisJ New Member

    Joined:
    Feb 22, 2008
    Messages:
    10
    Likes Received:
    0
    TSM/TDP guys, help please. I haven't had any luck finding a solution to our problem. We can't restore SQL databases using TDP: the container gets built, but once anywhere from 1 GB to 18 GB of data has been populated, the restore fails. We can restore using SQL exclusively, and we can even restore the databases to a stand-alone DRP server via TDP from the same data set. Backups work like a charm on the 2 problem servers. We have set the TSM Comm timeout setting to 8 hours. I have looked up rc=419 and it points to includes/excludes, which seem to be defined correctly. Not sure what other info is needed.
    Code:
    SQL 2000 (Build 2187) and OS is win 2003
    TSM Server Version 5, Release 3, Level 3.0
    TDP client Version 5 Release 2 Level 1.04
    2 SQL servers are clustered
     
    Errors:
     
    TDP Errors:
    02/21/2008 09:38:16 ACO5436E A failure occurred on stripe number (0), rc = 428
    02/21/2008 09:38:16 ACO5407E The SQL server aborted the operation.
    02/21/2008 09:38:21 Restore of EBA failed.
    02/21/2008 09:38:21 ACO5407E The SQL server aborted the operation.
    02/21/2008 11:42:08 ACO5436E A failure occurred on stripe number (0), rc = 419
    02/21/2008 11:42:08 ACO0003S An internal processing error has occurred.
    02/21/2008 11:42:09 Restore of EBA failed.
    02/21/2008 11:42:09 ACO0003S An internal processing error has occurred.
     
     
      
    SQL Log Errors
    2008-02-21 09:38:16.54 spid240   BackupMedium::ReportIoError: read failure on backup device 'TDPSQL-000012A4-0000'. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.).
    2008-02-21 09:38:16.54 spid240   Internal I/O request 0x706BD800: Op: Read, pBuffer: 0x1AD40000, Size: 1048576, Position: 4865398272, UMS: Internal: 0x706BD798, InternalHigh: 0x0, Offset: 0x340480, OffsetHigh: 0x0, m_buf: 0x1AD40000, m_len: 0, m_actualBytes: 0, m_errcode: 995, BackupFile: TDPSQL-000012A4-0000
    2008-02-21 11:38:45.16 spid981   Unnamed tape (Family ID: 0x8b47c87e, sequence 1) mounted on tape drive 'TDPSQL-000010F8-0000'.
    2008-02-21 11:42:08.68 spid981   BackupMedium::ReportIoError: read failure on backup device 'TDPSQL-000010F8-0000'. Operating system error 1003(Cannot complete this function.).
    
    Event log errors
    Event Type: Warning
    Event Source: MSSQLSERVER
    Event Category: (2)
    Event ID: 17055
    Date:  2008/02/21
    Time:  9:34:29 AM
    User:  
    Computer: 
    Description:
    18227 :
    Unnamed tape (Family ID: 0x8b47c87e, sequence 1) mounted on tape drive 'TDPSQL-000012A4-0000'.
    
        
    Event Type: Error
    Event Source: MSSQLSERVER
    Event Category: (2)
    Event ID: 17055
    Date:  2008/02/21
    Time:  9:38:16 AM
    User:  
    Computer: 
    Description:
    18210 :
    BackupMedium::ReportIoError: read failure on backup device 'TDPSQL-000012A4-0000'. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.).
    
        
    Event Type: Warning
    Event Source: MSSQLSERVER
    Event Category: (2)
    Event ID: 17055
    Date:  2008/02/21
    Time:  11:38:45 AM
    User:  
    Computer: 
    Description:
    18227 :
    Unnamed tape (Family ID: 0x8b47c87e, sequence 1) mounted on tape drive 'TDPSQL-000010F8-0000'.
    
         
    Event Type: Error
    Event Source: MSSQLSERVER
    Event Category: (2)
    Event ID: 17055
    Date:  2008/02/21
    Time:  11:42:08 AM
    User:  
    Computer: 
    Description:
    18210 :
    BackupMedium::ReportIoError: read failure on backup device 'TDPSQL-000010F8-0000'. Operating system error 1003(Cannot complete this function.).
    
    
    VDI Errors
    ----------------------------------------------
    2008/02/21 09:38:16 pid(5216) tid(5740)
    Error on Global\TDPSQL-000012A4-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 09:38:16 pid(5216) tid(5740)
    Error on Global\TDPSQL-000012A4-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 09:38:16 pid(5216) tid(5740)
    Error on Global\TDPSQL-000012A4-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 09:38:16 pid(5216) tid(5740)
    Error on Global\TDPSQL-000012A4-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 09:38:16 pid(5216) tid(5740)
    Error on Global\TDPSQL-000012A4-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 09:38:16 pid(5216) tid(5740)
    Error on Global\TDPSQL-000012A4-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 09:38:16 pid(5216) tid(5740)
    Error on Global\TDPSQL-000012A4-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 09:38:16 pid(5216) tid(5740)
    Error on Global\TDPSQL-000012A4-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 09:38:16 pid(5216) tid(5740)
    Error on Global\TDPSQL-000012A4-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 09:38:16 pid(5216) tid(5740)
    Error on Global\TDPSQL-000012A4-0000
    Error at SVDS::CloseDevice: Abort detected
    ----------------------------------------------
    2008/02/21 09:38:16 pid(5216) tid(5740)
    Error on Global\TDPSQL-000012A4-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 09:38:16 pid(5216) tid(9204)
    Error on Global\TDPSQL-000012A4-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 09:38:16 pid(4772) tid(9308)
    Error on Global\TDPSQL-000012A4-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 09:38:21 pid(4772) tid(10080)
    Error on Global\TDPSQL-000012A4-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 11:42:08 pid(4344) tid(9080)
    Error on Global\TDPSQL-000010F8-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 11:42:08 pid(5216) tid(5740)
    Error on Global\TDPSQL-000010F8-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 11:42:08 pid(5216) tid(5740)
    Error on Global\TDPSQL-000010F8-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 11:42:08 pid(5216) tid(5740)
    Error on Global\TDPSQL-000010F8-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 11:42:08 pid(5216) tid(5740)
    Error on Global\TDPSQL-000010F8-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 11:42:08 pid(5216) tid(5740)
    Error on Global\TDPSQL-000010F8-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 11:42:08 pid(5216) tid(5740)
    Error on Global\TDPSQL-000010F8-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 11:42:08 pid(5216) tid(5740)
    Error on Global\TDPSQL-000010F8-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 11:42:08 pid(5216) tid(5740)
    Error on Global\TDPSQL-000010F8-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 11:42:08 pid(5216) tid(5740)
    Error on Global\TDPSQL-000010F8-0000
    Error at SVDS::CloseDevice: Abort detected
    ----------------------------------------------
    2008/02/21 11:42:08 pid(5216) tid(5740)
    Error on Global\TDPSQL-000010F8-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 11:42:08 pid(5216) tid(6148)
    Error on Global\TDPSQL-000010F8-0000
    Error at TriggerAbort: invoked
    ----------------------------------------------
    2008/02/21 11:42:09 pid(4344) tid(9704)
    Error on Global\TDPSQL-000010F8-0000
    Error at TriggerAbort: invoked
    
    Any help would be much appreciated.

    Chris
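
    The VDI log above repeats the same entry many times, which makes it hard to see how many distinct failures there are. A minimal Python sketch, assuming the log keeps the three-line entry layout shown in the post (timestamp/pid/tid, device, error location), could collapse it into counts per device and error location. The function name `summarize_vdi_log` and the embedded sample are illustrative, not part of any TSM or SQL Server tooling.

```python
import re
from collections import Counter

# One VDI log entry as it appears in the post:
#   <timestamp> pid(N) tid(N)
#   Error on <device>
#   Error at <location/message>
ENTRY = re.compile(
    r"(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) pid\((?P<pid>\d+)\) tid\((?P<tid>\d+)\)\s*\n"
    r"\s*Error on (?P<device>\S+)\s*\n"
    r"\s*Error at (?P<where>.+)"
)


def summarize_vdi_log(text):
    """Count log entries per (device, error location) pair."""
    counts = Counter()
    for match in ENTRY.finditer(text):
        counts[(match.group("device"), match.group("where").strip())] += 1
    return counts


# A few entries copied from the post, used here as sample input.
sample = r"""
----------------------------------------------
2008/02/21 09:38:16 pid(5216) tid(5740)
Error on Global\TDPSQL-000012A4-0000
Error at TriggerAbort: invoked
----------------------------------------------
2008/02/21 09:38:16 pid(5216) tid(5740)
Error on Global\TDPSQL-000012A4-0000
Error at SVDS::CloseDevice: Abort detected
----------------------------------------------
2008/02/21 11:42:08 pid(4344) tid(9080)
Error on Global\TDPSQL-000010F8-0000
Error at TriggerAbort: invoked
"""

for (device, where), n in sorted(summarize_vdi_log(sample).items()):
    print(f"{n:3d}  {device}  {where}")
```

    Run against the full log, this should make it easier to see that each failed restore produces one burst of aborts on a single virtual device.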
     
    Last edited by a moderator: Mar 11, 2008
  2.  
  3. moon-buddy

    moon-buddy Moderator

    Joined:
    Aug 24, 2005
    Messages:
    6,228
    Likes Received:
    279
    Occupation:
    Electronics Engineer, Security Professional
    Location:
    Somewhere in the US
    First, patch the TDP version you have or move to the next version and retry...
     
  4. ChrisJ

    ChrisJ New Member

    Joined:
    Feb 22, 2008
    Messages:
    10
    Likes Received:
    0
    MB, thanks for your reply. Sorry for getting back to you so late; I was under the weather. If the restore procedure (using the same data set) works on the DRP server with the same TSM and TDP client levels, would that not negate the need to move to a higher version? You mention TDP patches: are there any that deal with memory-leaking buffers or VDI files?

    Thanks...Christian
     
  5. moon-buddy

    moon-buddy Moderator

    Joined:
    Aug 24, 2005
    Messages:
    6,228
    Likes Received:
    279
    Occupation:
    Electronics Engineer, Security Professional
    Location:
    Somewhere in the US
    Higher versions of TDP are not necessarily patches. Most of them do fix some issues, which is why IBM always recommends applying them. Going to a higher version will not break any previous functionality.
     
  6. tsmtodd

    tsmtodd New Member

    Joined:
    May 17, 2005
    Messages:
    79
    Likes Received:
    2
    Is this a SAN based restore (is Storage Agent involved)?

    I have seen issues like this when a certain set of conditions exists between the onsite and offsite copies of the file (something about the bitfile number). Disabling the storage agent and using the network for transport allowed the restore to complete. I am not sure whether the bitfile problem is fixed in your TSM server version. It was supposed to be fixed in my version (5.3.3.1), but I have successfully restored across the network when restores across the SAN were not working. As 5.3 is about to go out of support, I didn't bother contacting support the last few times the workaround worked.

    Good luck.
     
  7. ChrisJ

    ChrisJ New Member

    Joined:
    Feb 22, 2008
    Messages:
    10
    Likes Received:
    0
    Moon-buddy, thanks for your response. I have been working with our TSM guy, and the latest release is not approved for the version of TSM server we are on. Can you point me to the list of fixes for each level of TDP?

    Tsmtodd,
    Thanks for your response. I am told we are using a network restore and don't use the storage agent (LAN-free). Our SQL box is configured to use a Gigabit LAN connection that eventually writes to SAN disk. I am not sure what you mean by "onsite and offsite copy of the file.. something about the bitfile number." Can you explain more, particularly the bitfile number?

    Chris
     
  8. may

    may Senior Member

    Joined:
    Aug 5, 2004
    Messages:
    1,074
    Likes Received:
    0
    Location:
    PARIS
    Are you sure you set the same number of stripes for the restore operation as you used for the backup?
     
  9. ChrisJ

    ChrisJ New Member

    Joined:
    Feb 22, 2008
    Messages:
    10
    Likes Received:
    0
    May, thanks for your help. I was able to restore a 20 GB database successfully and am now in the process of restoring a 500 GB database. I will keep my fingers crossed and let you know the outcome. Cheers...Chris
     
  10. tsmtodd

    tsmtodd New Member

    Joined:
    May 17, 2005
    Messages:
    79
    Likes Received:
    2
    ChrisJ,
    I wish I was smart enough to explain the bitfile stuff. As far as I know, it is like the key record for an object in the database. From what I can recall about this particular bug, if the bitfile number for the offsite copy of the object was greater than the bitfile number for the local copy, the storage agent restore would not work. Again, I was told that the bug was fixed prior to our versions, but I still see similar behavior occasionally (a storage agent based restore fails when a network restore works).

    Because you don't use storage agent, it doesn't sound like your issue though.
     
  11. ChrisJ

    ChrisJ New Member

    Joined:
    Feb 22, 2008
    Messages:
    10
    Likes Received:
    0
    May, I changed to 2 stripes for the restore and it was able to restore a 20 GB database but it failed on a 480 GB database shortly after the container was built. Do you have any suggestions? Cheers...Chris
     
  12. may

    may Senior Member

    Joined:
    Aug 5, 2004
    Messages:
    1,074
    Likes Received:
    0
    Location:
    PARIS
    Did it fail with the same errors?
    If so, have a look at the script that launches the backup to find out how many stripes are used for the backup.

    Otherwise, for a big database restore you need to increase the COMMTIMEOUT parameter on the TSM server,
    because the exchange between the client and the server can take more than one hour.
     
  13. ChrisJ

    ChrisJ New Member

    Joined:
    Feb 22, 2008
    Messages:
    10
    Likes Received:
    0
    May,

    We had a look at the script and it doesn't specify the number of stripes; it just takes the default, which is 1. Our COMMTIMEOUT value is set to 8 hours, so I don't think that is an issue. Do you have any other suggestions? Thanks in advance for your help. Cheers...Chris
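
    The timestamps in the first post are consistent with the timeout not being the issue. Here is a small Python sketch of the arithmetic, assuming the "mounted on tape drive" message marks the point where SQL Server opened the TDP virtual device and the "read failure" message marks the abort. The helper name `seconds_between` is illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def seconds_between(start, end):
    """Whole seconds elapsed between two timestamps in FMT."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# (device mounted, read failure) pairs taken from the event log
# and SQL log entries in the first post.
attempts = [
    ("2008-02-21 09:34:29", "2008-02-21 09:38:16"),
    ("2008-02-21 11:38:45", "2008-02-21 11:42:08"),
]

# The 8-hour timeout mentioned in the thread, in seconds.
COMMTIMEOUT_SECONDS = 8 * 60 * 60

for mounted, failed in attempts:
    elapsed = seconds_between(mounted, failed)
    status = "under" if elapsed < COMMTIMEOUT_SECONDS else "over"
    print(f"{mounted} -> {failed}: {elapsed} s ({status} the timeout)")
```

    Both attempts fail within about four minutes of the device being mounted, far short of 8 hours, so under these assumptions a communication timeout of that length would not have been reached.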
     
  14. ChrisJ

    ChrisJ New Member

    Joined:
    Feb 22, 2008
    Messages:
    10
    Likes Received:
    0
    We found a temporary solution by changing the Transaction Buffer Size in the TSM client config from its default of 25600 KB to 1024 KB. This matches the SQL buffer size of 1024 KB. We suspect that something on the OS, particularly in the TCP settings, has something to do with this limitation. I will post more info when we find it. Thanks for everyone's help. Cheers...Chris
     
  15. may

    may Senior Member

    Joined:
    Aug 5, 2004
    Messages:
    1,074
    Likes Received:
    0
    Location:
    PARIS
    So, any news?
     
  16. ChrisJ

    ChrisJ New Member

    Joined:
    Feb 22, 2008
    Messages:
    10
    Likes Received:
    0
    Changing the Transaction Buffer Size did not work for our large database. We currently have a case open with IBM and MS. It looks like the TSM server is sending data to 2 different NICs (1 TSM gigabit card and 1 set of teamed 100 Mb cards). We are not sure if the cluster is confused about traffic coming from the TSM gigabit card instead of the teamed NICs. At this point it looks like a routing issue. When we get to the root cause I will post it. Thanks.
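
    One quick check for a multi-NIC routing question like this is to ask the OS which local address it would use to reach a given host. The Python sketch below is a generic technique, not something from IBM or MS support. The function name `local_address_for` is illustrative, port 1500 is assumed as the TSM server port, and the loopback address stands in for the real TSM server address. It only shows outbound source-address selection on the SQL node, not which NIC the TSM server sends to.

```python
import socket


def local_address_for(remote_host, remote_port=1500):
    """Return the local IP the OS would pick to reach remote_host.

    Connecting a UDP socket sends no packets; it only makes the OS
    consult its routing table and bind a source address.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((remote_host, remote_port))
        return sock.getsockname()[0]


# Replace 127.0.0.1 with the TSM server's address (placeholder here).
print(local_address_for("127.0.0.1"))
```

    If the address printed for the TSM server belongs to the teamed cards rather than the gigabit card (or the other way round), that would be a hint worth passing on to whoever is looking at the routing.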
     
  17. ChrisJ

    ChrisJ New Member

    Joined:
    Feb 22, 2008
    Messages:
    10
    Likes Received:
    0
    The root cause of our restore problem turned out to be a corrupted pagefile.sys caused by AV scanning. The pagefile.sys file was not excluded from AV scans. We disabled AV (and later re-enabled it with the correct exclusions in place) and recreated pagefile.sys on the SQL server.
     
  18. may

    may Senior Member

    Joined:
    Aug 5, 2004
    Messages:
    1,074
    Likes Received:
    0
    Location:
    PARIS
    What a mess...
    I would never have made the connection with the pagefile.

    Who guided you to that solution? IBM?
     
  19. ChrisJ

    ChrisJ New Member

    Joined:
    Feb 22, 2008
    Messages:
    10
    Likes Received:
    0
    Actually, sorry, pagefile.sys wasn't the root cause. It turns out that the AV doesn't scan Windows protected files regardless of the drive they are on. The restore worked because we had disabled the McAfee 8.5 services and rebooted. We were also able to get the restore to work by backdating McAfee to 8.0 (this we discovered ourselves). The whole issue has been a long and arduous process, but hey, I have learned a lot. :) The latest news is that we can get the restore to work with McAfee 8.5 only when the TDI drivers are disabled. Our AV team has a case open with McAfee and hopefully we will get a fix soon. Thanks for your help.
     
  20. tanoadriano

    tanoadriano New Member

    Joined:
    Jun 20, 2008
    Messages:
    12
    Likes Received:
    0
    TDP SQL restore hangs on

    Hi to all TSM experts.
    Can anybody please help me find a solution to my problem?
    I'm trying to restore a single 1 TB database with TDP SQL via LAN to a different server from the one that was backed up. It hangs for around an hour and the restore never starts until I cancel it manually. I have tested a restore of a smaller database (around 100 MB) with the same procedure via the TDP SQL GUI, and that restore works perfectly via LAN.
    My TSM server is version 5.5 and TDP for SQL is version 5.5. The server is installed on Red Hat 5 Linux.

    Thanks.
     
