ADSM-L

[ADSM-L] Exchage Agent Restore failure...

2007-06-15 15:42:38
Subject: [ADSM-L] Exchage Agent Restore failure...
From: "Allen S. Rout" <asr AT UFL DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 15 Jun 2007 15:40:29 -0400
I've got a client who's failing to restore an Exchange store, and he's
not really getting any suggestive errors; this kind of surprises me,
I'm used to the problem being pretty nicely pointed to, one way or
another.

Windows Server 2003 R2 SP1
Exchange agent Version 5, Release 4, Level 0.2

My TSM server is Version 5, Release 3, Level 4.0


The full transfers neatly, and then when he starts working on applying
incrementals, the following happens:

He sees (e.g.)

06/14/2007 16:01:22 ANS1017E Session rejected: TCP/IP connection failure

while I see (these are the enclosing lines which mention the target
server)

06/14/07   15:47:22     ANE4991I (Session: 28805, Node: 
EXCHANGE-IS.HOUSING.UFL.EDU)  TDP MSExchg ACN3506 Data Protection for Exchange: 
Starting full restore of storage group Housing Storage Group 1 to serv
er EXCHANGETEST.(SESSION: 28805)

[...]

06/14/07   15:59:42     ANR0481W Session 28805 for node 
EXCHANGE-IS.HOUSING.UFL.EDU (TDP MSExchg) terminated - client did not respond 
within 60 seconds. (SESSION: 28805)

[...]


06/14/07   16:01:23     ANR0406I Session 28839 started for node 
EXCHANGE-IS.HOUSING.UFL.EDU (TDP MSExchg) (Tcp/Ip 
x47-215.housing.ufl.edu(8801)). (SESSION: 28839)
06/14/07   16:01:23     ANE4991I (Session: 28839, Node: 
EXCHANGE-IS.HOUSING.UFL.EDU)  TDP MSExchg ACN3506 Data Protection for Exchange: 
Starting incr restore of storage group Housing Storage Group 1 to server 
EXCHANGETEST.(SESSION: 28839)
06/14/07   16:01:23     ANE4993E (Session: 28839, Node: 
EXCHANGE-IS.HOUSING.UFL.EDU)  TDP MSExchg ACN3508 Data Protection for Exchange: 
incr restore of storage group Housing Storage Group 1 to server EXCHANGETEST 
failed, rc = 425.(SESSION: 28839)

And then there are another 50-100 iterations of the "Starting incr
restore / failed with rc = 425"


The rc=425 doesn't appear to correlate with either an API return code
or an agent code.  The 3508 doesn't enlighten much, sounds like it may
be a RC from the exchange server?

In any case, I don't have any rejected connections to the box, so it
looks to him like I rejected, and looks to me like he went away.

I'm thinking that the connect failure is in fact a delayed message
from whenever the exchange client got busy (say at 15:58:42, 60
seconds before the termination server message) and a few minutes later
the agent gets around to sending packets down that pipe, and finds a
failure.

I understand the agents to be notoriously bad at picking up the
connections, and googling around found similar errors to be well
addressed by increasing COMMTIMEOUT on the server.  I increased my
COMMTIMEOUT to a few hours, to be sure I wasn't blowing him off
unnecessarily, and he started failing with a

06/15/07   11:51:45     ANR0480W Session 31503 for node 
EXCHANGE-IS.HOUSING.UFL.EDU (TDP MSExchg) terminated - connection with client 
severed. (SESSION: 31503)

at "about" the same point in the process. He reconnects, and then

06/15/07   11:51:45     ANR0406I Session 31845 started for node 
EXCHANGE-IS.HOUSING.UFL.EDU (TDP MSExchg) (Tcp/Ip 
x47-215.housing.ufl.edu(2432)). (SESSION: 31845)
06/15/07   11:51:45     ANE4991I (Session: 31845, Node: 
EXCHANGE-IS.HOUSING.UFL.EDU)  TDP MSExchg ACN3506 Data Protection for Exchange: 
Starting incr restore of storage group Housing Storage Group 1 to server 
EXCHANGETEST.(SESSION: 31845)
06/15/07   11:51:45     ANE4993E (Session: 31845, Node: 
EXCHANGE-IS.HOUSING.UFL.EDU)  TDP MSExchg ACN3508 Data Protection for Exchange: 
incr restore of storage group Housing Storage Group 1 to server EXCHANGETEST 
failed, rc = 425.(SESSION: 31845)


I don't think there's an option analogous to COMMTIMEOUT for the
client side, am I missing something?  I'm gearing up to open a ticket,
but I figured I'd send up a flare Just In Case.


- Allen S. Rout

<Prev in Thread] Current Thread [Next in Thread>