Re: [ADSM-L] TDP for Exchange 2010 DAG full database backup experiencing intermittent failures

2013-07-25 09:33:12
From: Del Hoobler <hoobler AT US.IBM DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 25 Jul 2013 09:31:12 -0400
Hi Steve,

Have you checked that none of the parallel backup sessions are
snapping the same volumes? Sessions that snapshot the same volume
cannot overlap at all.
Also, are you using multiple CAD/AGENTS or just one? Using multiple
could cause conflicts because they do not coordinate with each other.
You could try adding more time (30 minutes) between the launch of the
sessions.
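
As a rough illustration of the two suggestions above (no overlapping
sessions on the same volume, and more time between launches), here is a
minimal Python sketch. The names plan_waves and launch_times, the
database-to-volume mapping, and the drive letters are all hypothetical;
the real mapping would come from your own Exchange layout, and the
backup command itself is left out.

```python
from datetime import datetime, timedelta


def plan_waves(db_volumes, max_concurrent=4):
    """Group databases into waves so that no two databases in a wave
    share a volume and no wave exceeds max_concurrent entries."""
    waves = []  # each wave tracks its databases and the volumes they use
    for db, volumes in db_volumes.items():
        vols = set(volumes)
        for wave in waves:
            has_room = len(wave["dbs"]) < max_concurrent
            if has_room and not (wave["volumes"] & vols):
                wave["dbs"].append(db)
                wave["volumes"] |= vols
                break
        else:
            # No existing wave can take this database: start a new one.
            waves.append({"dbs": [db], "volumes": set(vols)})
    return [wave["dbs"] for wave in waves]


def launch_times(wave, start, gap_minutes=30):
    """Stagger the launches inside one wave by gap_minutes."""
    return [(db, start + timedelta(minutes=i * gap_minutes))
            for i, db in enumerate(wave)]


if __name__ == "__main__":
    # Hypothetical layout: DB061 and DB076 share volume E:.
    layout = {"DB061": ["E:", "F:"], "DB076": ["E:", "G:"], "DB010": ["H:"]}
    for wave in plan_waves(layout):
        print(launch_times(wave, datetime(2013, 7, 22, 20, 0)))
```

Each wave would have to finish before the next one starts, since the
sketch knows nothing about how long a backup runs.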

The Event Log entries below do concern me, as they suggest that
VSS itself is having issues. If VSS does not clean up correctly,
it can block subsequent VSS operations.
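
One quick way to see whether VSS is in a bad state is to look at the
writers reported by "vssadmin list writers". The Python sketch below
parses that output and lists writers that are not stable or that report
an error. It assumes the English-language output format ("Writer name:",
"State:", "Last error:"); the function names are made up for
illustration, and list_writers only works on Windows from an elevated
prompt.

```python
import re
import subprocess


def parse_writers(text):
    """Parse `vssadmin list writers` output into
    (name, state, last_error) tuples."""
    writers = []
    name = state = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"Writer name: '(.*)'", line)
        if match:
            name, state = match.group(1), None
        elif line.startswith("State:"):
            state = line.split(":", 1)[1].strip()
        elif line.startswith("Last error:") and name is not None:
            writers.append((name, state, line.split(":", 1)[1].strip()))
            name = None
    return writers


def unhealthy_writers(text):
    """Return writers whose state is not Stable or whose last error
    is anything other than 'No error'."""
    return [w for w in parse_writers(text)
            if "Stable" not in (w[1] or "") or w[2] != "No error"]


def list_writers():
    """Run vssadmin and return its text output (Windows, elevated)."""
    result = subprocess.run(["vssadmin", "list", "writers"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Running unhealthy_writers(list_writers()) before the nightly job starts
would show whether a writer is already stuck before any backup runs.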

If possible, reboot the server and take a look at this:

   http://technet.microsoft.com/en-us/library/ee264216%28WS.10%29.aspx

If you continue to have issues, a PMR is probably the next step.

Thanks,

Del

----------------------------------------------------

"ADSM: Dist Stor Manager" <ADSM-L AT vm.marist DOT edu> wrote on 07/25/2013 06:43:47 AM:

> From: "Schaub, Steve" <steve_schaub AT BCBST DOT COM>
> To: ADSM-L AT vm.marist DOT edu,
> Date: 07/25/2013 06:44 AM
> Subject: Re: TDP for Exchange 2010 DAG full database backup
> experiencing intermittent failures
> Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT vm.marist DOT edu>
>
> Ray,
> I changed the timer to 11 minutes per your note, but I am still
> seeing failures.
> Thanks,
> -steve
>
> -----Original Message-----
> From: Storer, Raymond [mailto:storerr AT nibco DOT com]
> Sent: Tuesday, July 23, 2013 1:24 PM
> To: ADSM: Dist Stor Manager
> Cc: Schaub, Steve
> Subject: RE: [ADSM-L] TDP for Exchange 2010 DAG full database backup
> experiencing intermittent failures
>
> Steve, according to the link below parallel backups are supported;
> however, you should allow a minimum of ten minutes between backups.
>
> http://pic.dhe.ibm.com/infocenter/tsminfo/v6r4/topic/com.ibm.itsm.mail.exc.doc/c_dpfcm_bup_vssplan_exc.html
>
>
> Ray Storer
> NIBCO INC.
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On
> Behalf Of Schaub, Steve
> Sent: Tuesday, July 23, 2013 11:27 AM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: [ADSM-L] TDP for Exchange 2010 DAG full database backup
> experiencing intermittent failures
>
> Exchange 2010 w/DAG
> TSM server 6.2.4
> TSM client 6.4.0.0
> TDP client 6.4.0.0
>
> The script that runs our nightly full backups is failing
> intermittently.  The failures hit different servers and different
> databases each night, but all of them have the same API error
> message.  On each server, the PowerShell script performs a full db
> backup of a different set of databases each night (so even though it
> runs nightly, they are actually weekly fulls).  The backups are run
> as PowerShell jobs to allow concurrent backups (currently set to a
> max of 4 concurrent backup jobs).  The script waits 2 minutes after
> each job is submitted before running the next job, to give the
> snapshot time to complete.  Some nights everything works, some
> nights there is 1 failure, some 2, 3, etc.  There are some VSS
> errors in the event log, yet according to the TSM logs, the snapshot
> was successful.
>
> Are there any issues with running concurrent TDP backups on 6.4.0.0?
> Here are all the pertinent logs:
>
> tdperror.log
> 07/22/2013 20:17:38 ANS1235E An unknown system error has occurred
> from which TSM cannot recover.
> 07/22/2013 20:23:48 ANS1909E The scheduled command failed.
> 07/22/2013 20:23:48 ANS1512E Scheduled event 'EXCHANGE2010_FULL'
> failed.  Return code = 1.
>
> BkupExchange2010_Full_Day22.log
> IBM Tivoli Storage Manager for Mail:
> Data Protection for Microsoft Exchange Server Version 6, Release 4,
> Level 0.0
> (C) Copyright IBM Corporation 1998, 2012. All rights reserved.
> ACN5057I The c:\program files\tivoli\tsm\TDPExchange\tdpexc.log log
> file has been pruned successfully.
> Querying Exchange Server to gather component information, please wait...
> Connecting to TSM Server as node 'BCMSG107_MAIL'...
> Connecting to Local DSM Agent 'BCMSG107'...
> Using backup node 'EXCHANGE2010'...
> Starting component backup...
> Beginning VSS backup of 'DB061' (DBCopy)...
> Snapshot operation completed with return code = 0.
> The following database is being backed up: 'DB061'. The data is
> being transferred to the Tivoli Storage Manager server.
> ACN5060E A Tivoli Storage Manager API error has occurred.
>
> tdpexc.log
> 07/22/2013 20:17:18 Snapshot operation completed with return code = 0.
> 07/22/2013 20:17:19 The following database is being backed up:
> 'DB076'. The data is being transferred to the Tivoli Storage Manager
> server.
> 07/22/2013 20:17:43 ANS1235E (RC-1)   An unknown system error has
> occurred from which TSM cannot recover.
> 07/22/2013 20:17:43 ACN5060E A Tivoli Storage Manager API error has
> occurred.
>
>
> dsmerror.log
> 07/22/2013 20:04:05 ANS2055I The local snapshot manager could not be
> locked.
> 07/22/2013 20:04:05 ANS2056I Waiting maximal 600 seconds until the
> lock is released by the other application.
> 07/22/2013 20:05:06 ANS2055I The local snapshot manager could not be
> locked.
> 07/22/2013 20:05:06 ANS2056I Waiting maximal 600 seconds until the
> lock is released by the other application.
> 07/22/2013 20:07:06 ANS2055I The local snapshot manager could not be
> locked.
> 07/22/2013 20:07:06 ANS2056I Waiting maximal 600 seconds until the
> lock is released by the other application.
> 07/22/2013 20:07:27 ANS2055I The local snapshot manager could not be
> locked.
> 07/22/2013 20:07:27 ANS2056I Waiting maximal 600 seconds until the
> lock is released by the other application.
> 07/22/2013 20:08:27 ANS2055I The local snapshot manager could not be
> locked.
> 07/22/2013 20:08:27 ANS2056I Waiting maximal 600 seconds until the
> lock is released by the other application.
> 07/22/2013 20:08:32 ANS2055I The local snapshot manager could not be
> locked.
> 07/22/2013 20:08:32 ANS2056I Waiting maximal 600 seconds until the
> lock is released by the other application.
> 07/22/2013 20:12:37 ANS2055I The local snapshot manager could not be
> locked.
> 07/22/2013 20:12:37 ANS2056I Waiting maximal 600 seconds until the
> lock is released by the other application.
> 07/22/2013 20:17:18 ANS2055I The local snapshot manager could not be
> locked.
> 07/22/2013 20:17:18 ANS2056I Waiting maximal 600 seconds until the
> lock is released by the other application.
> 07/22/2013 20:17:35 ANS2054E Operating system error 13: Permission
> denied.
> 07/22/2013 20:17:35 ANS5250E An unexpected error was encountered.
>    TSM function name : CLocalPolicyManager::versionTSMRegisterBackup
>    TSM function      : Failed to lock LSM repository.
>    TSM return code   : 104
>    TSM file          : ..\..\common\lpm\lpm.cpp (2223)
> 07/22/2013 20:17:35 ANS5250E An unexpected error was encountered.
>    TSM function name : CLocalPolicyManager::lpmRegisterBackup
>    TSM function      : Error in versionBasedPolicyUsingTSM.
>    TSM return code   : -1
>    TSM file          : ..\..\common\lpm\lpm.cpp (834)
> 07/22/2013 20:17:35 ANS5250E An unexpected error was encountered.
>    TSM function name : vssCreateLocalBackup
>    TSM function      : Registering Backup with LPM failed
>    TSM return code   : -1
>    TSM file          : ..\..\common\winnt\vssback.cpp (6412)
> 07/22/2013 20:17:35 ANS5250E An unexpected error was encountered.
>    TSM function name : baCreateLocalBackup
>    TSM function      : VSS Create Local Backup Failed
>    TSM return code   : -1
>    TSM file          : ..\..\common\ba\backsnap.cpp (2323)
> 07/22/2013 20:17:35 ANS5250E An unexpected error was encountered.
>    TSM function name : baProcessRequest
>    TSM function      : VSS Create Local Backup failed
>    TSM return code   : -1
>    TSM file          : ..\..\common\ba\incrdrv.cpp (7154)
> 07/22/2013 20:17:38 ANS5283E The operation was unsuccessful.
> 07/22/2013 20:17:38 ANS2054E Operating system error 13: Permission
> denied.
> 07/22/2013 20:17:38 ANS0361I DIAG: Release mutex failed; reason 288.
> 07/22/2013 20:17:38 ANS5250E An unexpected error was encountered.
>    TSM function name : CLocalPolicyManager::destroyManagers
>    TSM function      : Unlock local snapshot manager repository
>    TSM return code   : 101
>    TSM file          : ..\..\common\lpm\lpm.cpp (1238)
> 07/22/2013 20:17:44 ANS2055I The local snapshot manager could not be
> locked.
> 07/22/2013 20:17:44 ANS2056I Waiting maximal 600 seconds until the
> lock is released by the other application.
>
> Application Event Log
>
> Log Name:      Application
> Source:        VSS
> Date:          7/22/2013 8:17:36 PM
> Event ID:      12347
> Task Category: None
> Level:         Error
> Keywords:      Classic
> User:          N/A
> Computer:      bcmsg107.bcbst.com
> Description:
> Volume Shadow Copy Service error: An internal inconsistency was
> detected in trying to contact shadow copy service writers.  The
> Registry Writer failed to respond to a query from VSS. Check to see
> that the Event Service and Volume Shadow Copy Service are operating
> properly, and please check the Application event log for any other
> events.
> Operation:
>    Gathering Writer Data
>    Executing Asynchronous Operation
> Context:
>    Execution Context: Requestor
>    Current State: GatherWriterMetadata
> Operation:
>    Gathering Writer Data
>    Executing Asynchronous Operation
> Context:
>    Execution Context: Requestor
>    Current State: GatherWriterMetadata
>
> Log Name:      Application
> Source:        VSS
> Date:          7/22/2013 8:17:36 PM
> Event ID:      8193
> Task Category: None
> Level:         Error
> Keywords:      Classic
> User:          N/A
> Computer:      bcmsg107.bcbst.com
> Description:
> Volume Shadow Copy Service error: Unexpected error calling routine
> IVssAsync::QueryStatus.  hr = 0x80042318, An error was detected in
> the Volume Shadow Copy Service (VSS). The problem occurred while
> trying to contact VSS writers.
> Verify that the Event System service and the VSS service are running
> and check for associated errors in the event logs.
> Operation:
>    BackupComplete Event
>    Executing Asynchronous Operation
> Context:
>    Current State: BackupComplete
> Operation:
>    BackupComplete Event
>    Executing Asynchronous Operation
> Context:
>    Current State: BackupComplete