ADSM-L

Re: [ADSM-L] TDP for Exchange 2010 DAG full database backup experiencing intermittent failures

2013-07-25 09:47:54
Subject: Re: [ADSM-L] TDP for Exchange 2010 DAG full database backup experiencing intermittent failures
From: "Sheridan, Peter T." <Peter.Sheridan AT CUNAMUTUAL DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 25 Jul 2013 13:45:32 +0000
Our company had exactly the same issues, and eventually we gave up and decided 
not to run the backups in parallel. We did add the /skipintegritycheck flag to 
the backups, which cut the backup times roughly in half. With this flag no 
integrity check is performed, but we were willing to take that risk.

We are also experiencing VSS issues from time to time and have not been able to 
find a resolution. We have been banging our heads against the wall for many 
months with no answers from either IBM or Microsoft support. IBM support has 
been really good, I must admit, but getting help from the Microsoft side has 
been almost impossible. Basically, if we have VSS issues, we re-run the backups 
manually the next day.
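
As an illustration of the manual re-run step, here is a minimal Python sketch (not taken from anyone's production script) that pulls the failed database names out of a backup log like the BkupExchange2010_Full_Day22.log quoted further down and builds a re-run command for each. It assumes the log prints the "Beginning VSS backup of '...'" line before any ACN5060E error, and that the backup is started with the `tdpexcc backup <db> full` command line; the function names are hypothetical.

```python
import re

# Line written when a database backup starts, e.g.
#   Beginning VSS backup of 'DB061' (DBCopy)...
BEGIN_RE = re.compile(r"Beginning VSS backup of '([^']+)'")


def failed_databases(log_text):
    """Return databases whose backup was followed by an ACN5060E API error."""
    failed = []
    current = None
    for line in log_text.splitlines():
        match = BEGIN_RE.search(line)
        if match:
            current = match.group(1)
        elif "ACN5060E" in line and current is not None:
            if current not in failed:
                failed.append(current)
            current = None
    return failed


def rerun_command(db, skip_integrity_check=True):
    """Build a re-run command line (assumed syntax: tdpexcc backup <db> full)."""
    cmd = ["tdpexcc", "backup", db, "full"]
    if skip_integrity_check:
        # Faster, but the snapshot is not verified before it is sent to TSM.
        cmd.append("/skipintegritycheck")
    return " ".join(cmd)
```

The command string is only printed or logged here; whether to run it automatically or keep the re-run manual is a site decision.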

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT vm.marist DOT edu] On Behalf Of 
Del Hoobler
Sent: Thursday, July 25, 2013 8:31 AM
To: ADSM-L AT vm.marist DOT edu
Subject: Re: [ADSM-L] TDP for Exchange 2010 DAG full database backup 
experiencing intermittent failures

Hi Steve,

Have you checked to make sure that none of the parallel backup sessions are 
snapping the same volumes? If so, these cannot overlap at all.
Also, are you using multiple CAD/AGENTS or just one? Using multiple could cause 
conflicts because they do not coordinate with each other.
You could try adding more time (30 minutes) between the launch of the sessions.
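
One way to check the volume-overlap point before launching anything is sketched below in Python. It is only an illustration under assumptions: the database-to-volume mapping has to be supplied by hand (the drive letters in the usage example are made up), and `plan_waves` is a hypothetical helper, not part of any TSM tooling. Databases that share a volume never land in the same wave, and launches inside a wave are spaced by the suggested gap.

```python
def plan_waves(db_volumes, gap_minutes=30, max_parallel=4):
    """Group databases into waves that can safely run in parallel.

    db_volumes maps a database name to the set of volumes it lives on.
    Returns a list of waves; each wave is a list of
    (database, launch_offset_in_minutes) pairs. A wave should only be
    started after the previous wave has finished.
    """
    waves = []  # each wave holds (database, volumes) pairs
    for db, vols in db_volumes.items():
        vols = set(vols)
        for wave in waves:
            used = set().union(*(v for _, v in wave))
            if len(wave) < max_parallel and not (used & vols):
                wave.append((db, vols))
                break
        else:
            # No existing wave is free of overlap, so open a new one.
            waves.append([(db, vols)])
    return [
        [(db, i * gap_minutes) for i, (db, _) in enumerate(wave)]
        for wave in waves
    ]
```

For example, `plan_waves({"DB061": {"E:"}, "DB076": {"F:"}, "DB077": {"E:"}})` puts DB061 and DB076 in the first wave, 30 minutes apart, and holds DB077 for a second wave because it shares E: with DB061.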

The Event Log entries below do concern me as it seems that VSS itself is having 
issues. If VSS does not clean up correctly, it can block subsequent VSS 
operations.
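
To see how the snapshot-manager lock waits line up with the VSS errors, a small Python sketch like the following can summarize a dsmerror.log. It assumes each message starts with a `MM/DD/YYYY HH:MM:SS` timestamp followed by an ANSnnnnX message ID, as in the log excerpt below; `summarize` is a hypothetical name.

```python
import re
from collections import Counter

# Timestamp followed by a message ID such as ANS2055I or ANS2054E.
LINE_RE = re.compile(r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) (ANS\d{4}[EIW])")


def summarize(log_text):
    """Return (message ID counts, timestamp of the first error message)."""
    counts = Counter()
    first_error = None
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # continuation line or detail line without a message ID
        stamp, msg_id = match.groups()
        counts[msg_id] += 1
        if msg_id.endswith("E") and first_error is None:
            first_error = stamp
    return counts, first_error
```

A high ANS2055I count before the first error timestamp would be consistent with sessions queuing on the same lock, though it does not by itself prove the cause.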

If possible, reboot the server and take a look at this:

   http://technet.microsoft.com/en-us/library/ee264216%28WS.10%29.aspx

If you continue to have issues, a PMR is probably the next step.
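
After a reboot, the output of `vssadmin list writers` is a common way to confirm that the writers (including the Registry Writer named in the event log) are healthy again. The Python sketch below parses that output and reports writers that are not in a Stable state or that show a last error. It assumes the usual "Writer name / State / Last error" layout of the command's output; `unhealthy_writers` is a hypothetical helper.

```python
def unhealthy_writers(vssadmin_output):
    """Return (name, state, last_error) for writers that look unhealthy."""
    problems = []
    name = None
    state = None
    for raw in vssadmin_output.splitlines():
        line = raw.strip()
        if line.startswith("Writer name:"):
            name = line.split(":", 1)[1].strip().strip("'")
            state = None
        elif line.startswith("State:"):
            state = line.split(":", 1)[1].strip()
        elif line.startswith("Last error:") and name is not None:
            last_error = line.split(":", 1)[1].strip()
            if "Stable" not in (state or "") or last_error != "No error":
                problems.append((name, state, last_error))
            name = None
    return problems
```

The text can be captured with `subprocess.run(["vssadmin", "list", "writers"], capture_output=True, text=True).stdout` from an elevated prompt on the Exchange server.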

Thanks,

Del

----------------------------------------------------

"ADSM: Dist Stor Manager" <ADSM-L AT vm.marist DOT edu> wrote on 07/25/2013
06:43:47 AM:

> From: "Schaub, Steve" <steve_schaub AT BCBST DOT COM>
> To: ADSM-L AT vm.marist DOT edu,
> Date: 07/25/2013 06:44 AM
> Subject: Re: TDP for Exchange 2010 DAG full database backup 
> experiencing intermittent failures
> Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT vm.marist DOT edu>
>
> Ray,
> I changed the timer to 11 minutes per your note, but still seeing failures.
> Thanks,
> -steve
>
> -----Original Message-----
> From: Storer, Raymond [mailto:storerr AT nibco DOT com]
> Sent: Tuesday, July 23, 2013 1:24 PM
> To: ADSM: Dist Stor Manager
> Cc: Schaub, Steve
> Subject: RE: [ADSM-L] TDP for Exchange 2010 DAG full database backup 
> experiencing intermittent failures
>
> Steve, according to the link below parallel backups are supported; 
> however, you should allow a minimum of ten minutes between backups.
>
> http://pic.dhe.ibm.com/infocenter/tsminfo/v6r4/topic/com.ibm.itsm.mail.exc.doc/c_dpfcm_bup_vssplan_exc.html
>
>
> Ray Storer
> NIBCO INC.
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf 
> Of Schaub, Steve
> Sent: Tuesday, July 23, 2013 11:27 AM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: [ADSM-L] TDP for Exchange 2010 DAG full database backup 
> experiencing intermittent failures
>
> Exchange 2010 w/DAG
> Tsm server 6.2.4
> Tsm client 6.4.0.0
> Tdp client 6.4.0.0
>
> The script that runs our nightly full backups is having intermittent 
> failures.  Different servers, different databases each night, but all 
> the failures have the same API error message.  The way the powershell 
> script works on each server is to perform a full db backup of a 
> different set of databases each night (so even though it runs nightly, 
> they are actually weekly fulls).  The backups are run as Powershell 
> jobs, in order to enable multiple backup concurrency (currently set to 
> a max of 4 concurrent backup jobs).  The script waits 2 minutes after 
> each job is submitted before running the next job in order to give the 
> snapshot time to complete.  Some nights everything works, some nights 
> 1 failure, some 2, 3, etc.  There are some VSS errors in the event 
> log, but according to the TSM logs, the snapshot was successful?
>
> Are there any issues with running concurrent TDP backups on 6.4.0.0?
> Here are all the pertinent logs:
>
> tdperror.log
> 07/22/2013 20:17:38 ANS1235E An unknown system error has occurred from 
> which TSM cannot recover.
> 07/22/2013 20:23:48 ANS1909E The scheduled command failed.
> 07/22/2013 20:23:48 ANS1512E Scheduled event 'EXCHANGE2010_FULL'
> failed.  Return code = 1.
>
> BkupExchange2010_Full_Day22.log
> IBM Tivoli Storage Manager for Mail:
> Data Protection for Microsoft Exchange Server Version 6, Release 4, Level 0.0
> (C) Copyright IBM Corporation 1998, 2012. All rights reserved.
> ACN5057I The c:\program files\tivoli\tsm\TDPExchange\tdpexc.log log 
> file has been pruned successfully.
> Querying Exchange Server to gather component information, please wait...
> Connecting to TSM Server as node 'BCMSG107_MAIL'...
> Connecting to Local DSM Agent 'BCMSG107'...
> Using backup node 'EXCHANGE2010'...
> Starting component backup...
> Beginning VSS backup of 'DB061' (DBCopy)...
> Snapshot operation completed with return code = 0.
> The following database is being backed up: 'DB061'. The data is being 
> transferred to the Tivoli Storage Manager server.
> ACN5060E A Tivoli Storage Manager API error has occurred.
>
> tdpexc.log
> 07/22/2013 20:17:18 Snapshot operation completed with return code = 0.
> 07/22/2013 20:17:19 The following database is being backed up:
> 'DB076'. The data is being transferred to the Tivoli Storage Manager
> server.
> 07/22/2013 20:17:43 ANS1235E (RC-1)   An unknown system error has
> occurred from which TSM cannot recover.
> 07/22/2013 20:17:43 ACN5060E A Tivoli Storage Manager API error has occurred.
>
>
> dsmerror.log
> 07/22/2013 20:04:05 ANS2055I The local snapshot manager could not be locked.
> 07/22/2013 20:04:05 ANS2056I Waiting maximal 600 seconds until the 
> lock is released by the other application.
> 07/22/2013 20:05:06 ANS2055I The local snapshot manager could not be locked.
> 07/22/2013 20:05:06 ANS2056I Waiting maximal 600 seconds until the 
> lock is released by the other application.
> 07/22/2013 20:07:06 ANS2055I The local snapshot manager could not be locked.
> 07/22/2013 20:07:06 ANS2056I Waiting maximal 600 seconds until the 
> lock is released by the other application.
> 07/22/2013 20:07:27 ANS2055I The local snapshot manager could not be locked.
> 07/22/2013 20:07:27 ANS2056I Waiting maximal 600 seconds until the 
> lock is released by the other application.
> 07/22/2013 20:08:27 ANS2055I The local snapshot manager could not be locked.
> 07/22/2013 20:08:27 ANS2056I Waiting maximal 600 seconds until the 
> lock is released by the other application.
> 07/22/2013 20:08:32 ANS2055I The local snapshot manager could not be locked.
> 07/22/2013 20:08:32 ANS2056I Waiting maximal 600 seconds until the 
> lock is released by the other application.
> 07/22/2013 20:12:37 ANS2055I The local snapshot manager could not be locked.
> 07/22/2013 20:12:37 ANS2056I Waiting maximal 600 seconds until the 
> lock is released by the other application.
> 07/22/2013 20:17:18 ANS2055I The local snapshot manager could not be locked.
> 07/22/2013 20:17:18 ANS2056I Waiting maximal 600 seconds until the 
> lock is released by the other application.
> 07/22/2013 20:17:35 ANS2054E Operating system error 13: Permission denied.
> 07/22/2013 20:17:35 ANS5250E An unexpected error was encountered.
>    TSM function name : CLocalPolicyManager::versionTSMRegisterBackup
>    TSM function      : Failed to lock LSM repository.
>    TSM return code   : 104
>    TSM file          : ..\..\common\lpm\lpm.cpp (2223)
> 07/22/2013 20:17:35 ANS5250E An unexpected error was encountered.
>    TSM function name : CLocalPolicyManager::lpmRegisterBackup
>    TSM function      : Error in versionBasedPolicyUsingTSM.
>    TSM return code   : -1
>    TSM file          : ..\..\common\lpm\lpm.cpp (834)
> 07/22/2013 20:17:35 ANS5250E An unexpected error was encountered.
>    TSM function name : vssCreateLocalBackup
>    TSM function      : Registering Backup with LPM failed
>    TSM return code   : -1
>    TSM file          : ..\..\common\winnt\vssback.cpp (6412)
> 07/22/2013 20:17:35 ANS5250E An unexpected error was encountered.
>    TSM function name : baCreateLocalBackup
>    TSM function      : VSS Create Local Backup Failed
>    TSM return code   : -1
>    TSM file          : ..\..\common\ba\backsnap.cpp (2323)
> 07/22/2013 20:17:35 ANS5250E An unexpected error was encountered.
>    TSM function name : baProcessRequest
>    TSM function      : VSS Create Local Backup failed
>    TSM return code   : -1
>    TSM file          : ..\..\common\ba\incrdrv.cpp (7154)
> 07/22/2013 20:17:38 ANS5283E The operation was unsuccessful.
> 07/22/2013 20:17:38 ANS2054E Operating system error 13: Permission denied.
> 07/22/2013 20:17:38 ANS0361I DIAG: Release mutex failed; reason 288.
> 07/22/2013 20:17:38 ANS5250E An unexpected error was encountered.
>    TSM function name : CLocalPolicyManager::destroyManagers
>    TSM function      : Unlock local snapshot manager repository
>    TSM return code   : 101
>    TSM file          : ..\..\common\lpm\lpm.cpp (1238)
> 07/22/2013 20:17:44 ANS2055I The local snapshot manager could not be locked.
> 07/22/2013 20:17:44 ANS2056I Waiting maximal 600 seconds until the 
> lock is released by the other application.
>
> Application Event Log
>
> Log Name:      Application
> Source:        VSS
> Date:          7/22/2013 8:17:36 PM
> Event ID:      12347
> Task Category: None
> Level:         Error
> Keywords:      Classic
> User:          N/A
> Computer:      bcmsg107.bcbst.com
> Description:
> Volume Shadow Copy Service error: An internal inconsistency was 
> detected in trying to contact shadow copy service writers.  The 
> Registry Writer failed to respond to a query from VSS. Check to see 
> that the Event Service and Volume Shadow Copy Service are operating 
> properly, and please check the Application event log for any other events.
> Operation:
>    Gathering Writer Data
>    Executing Asynchronous Operation
> Context:
>    Execution Context: Requestor
>    Current State: GatherWriterMetadata
> Operation:
>    Gathering Writer Data
>    Executing Asynchronous Operation
> Context:
>    Execution Context: Requestor
>    Current State: GatherWriterMetadata
>
> Log Name:      Application
> Source:        VSS
> Date:          7/22/2013 8:17:36 PM
> Event ID:      8193
> Task Category: None
> Level:         Error
> Keywords:      Classic
> User:          N/A
> Computer:      bcmsg107.bcbst.com
> Description:
> Volume Shadow Copy Service error: Unexpected error calling routine 
> IVssAsync::QueryStatus.  hr = 0x80042318, An error was detected in the 
> Volume Shadow Copy Service (VSS). The problem occurred while trying to 
> contact VSS writers.
> Verify that the Event System service and the VSS service are running 
> and check for associated errors in the event logs.
> Operation:
>    BackupComplete Event
>    Executing Asynchronous Operation
> Context:
>    Current State: BackupComplete
> Operation:
>    BackupComplete Event
>    Executing Asynchronous Operation
> Context:
>    Current State: BackupComplete