ADSM-L

Re: please help - ANR0918E

2003-09-24 10:51:39
Subject: Re: please help - ANR0918E
From: Bill Boyer <bill.boyer AT VERIZON DOT NET>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 24 Sep 2003 10:51:03 -0400
Yes, it is in the Windows 2003 TSM 5.2 client code, which uses the new
shadow copy services (VSS). But you need a TSM 5.2 server to take advantage of it.

From a previous post by Andy Raibeck:

In addition, the Windows 2003 system state/service backups use a different
transaction protocol that doesn't pin the server recovery log for
extensive periods of time, as might the "system object" backup method.
This support required changes on the server side as well, and thus the
requirement for a 5.2 server.



Bill Boyer
DSS, Inc.


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Rushforth, Tim
Sent: Wednesday, September 24, 2003 10:35 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: please help - ANR0918E


Hi Chris:

The format of the option is "RESOURCETIMEOUT 180", and it goes in
dsmserv.opt (add the line if it is not already there). You can issue
"q opt res*" to see your current setting. Note that this didn't really
help us at all.



We eventually split our TSM server into two servers (not only because of
this), and we haven't had any problems since! We also got IBM to open an
APAR on this: IC36769.



I am also sending this mail to the list so other people have this info.



This is the response from IBM in our open PMR (42130):



Action taken: I got an answer from the client developers that they have a
fix for this in the Windows 2003 client but are still working on a fix for
Windows 2000 clients. That will be done in one of the next releases of the
code.

From developers:

"The problem is locking on the server, the length of time the locks are
held, and the resourcewait [RESOURCETIMEOUT] setting on the server. This is
not a problem the server is able to resolve. The server is working
correctly. The system files backup is a long-running transaction. If
another session needs to lock the system object filespace while the system
files transaction is being committed, and that transaction takes a very
long time (longer than the resourcewait time) to commit, then this
situation occurs, and there is nothing the server can do about it because
the server is doing everything correctly.

The long-term solution is for the client to be updated to process the
system files in multiple transactions rather than in a single transaction.
Once that update is made, there will no longer be a single transaction
with tens or hundreds of thousands of files causing this problem. At
present, the commit of the system files transaction can take hours because
of the number of files involved. Note that the problem is ONLY with the
commit time, not the length of the entire transaction, since the locks are
only acquired after the data movement, during end-of-transaction
processing, to limit how long they are held. Until the client team makes
that update, the only other possible fix is to make the backup of the
SYSTEM OBJECTS filespace single-threaded.

Again, I want to make it clear that this problem is not caused by the
server handling something improperly. The server is handling the backup
properly, and it is properly terminating the backup because of how long it
waited on a lock, which in turn is caused by how long it takes to commit
the system files transaction."
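The lock wait the developers describe can be mimicked with a small Python simulation. This is a sketch, not TSM code and not how the server implements locking: a `threading.Lock` stands in for the server's filespace lock, one thread plays the long system-files commit, another plays a second session for the same node, and the hypothetical `commit_time` and `resource_timeout` values are in seconds rather than the minutes RESOURCETIMEOUT uses.

```python
import threading
import time


def simulate(commit_time: float, resource_timeout: float) -> str:
    """Return what happens to a session that needs the lock mid-commit."""
    filespace_lock = threading.Lock()   # stand-in for the server lock
    committing = threading.Event()
    outcome = []

    def system_object_commit() -> None:
        # The long system-files commit holds the lock until it finishes.
        with filespace_lock:
            committing.set()
            time.sleep(commit_time)

    def other_session() -> None:
        # A second session for the same node arrives during the commit.
        committing.wait()
        if filespace_lock.acquire(timeout=resource_timeout):
            filespace_lock.release()
            outcome.append("completed")
        else:
            # Roughly where the server logs ANR0538I and then ANR0918E.
            outcome.append("aborted - lock conflict")

    threads = [threading.Thread(target=system_object_commit),
               threading.Thread(target=other_session)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome[0]


if __name__ == "__main__":
    print(simulate(commit_time=0.5, resource_timeout=0.1))
    print(simulate(commit_time=0.05, resource_timeout=1.0))
```

The first call models a commit that outlasts the timeout and returns "aborted - lock conflict"; the second returns "completed". Raising the timeout only helps once it exceeds the commit time, which fits the experience in this thread that raising RESOURCETIMEOUT did not help while system-file commits could take hours.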


"Jim Smith created a work item (Id: JSMH-5BURL4, Abstract: "Cross-txn
grouping for system object") some time ago to address this problem. This
problem is solved for the Windows 2003 VSS work using the new grouping, and
I think the same will be done for Windows 2000/XP."
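The effect of grouping the system files into several transactions can be shown with a rough cost model. The sketch below assumes, purely for illustration, that commit time grows linearly with the number of files in a transaction and that a waiting session gets the lock between two commits; the per-file cost, file count, and timeout are hypothetical numbers, not measured TSM figures.

```python
from typing import List, Optional


def commit_times(num_files: int, seconds_per_file: float,
                 batch_size: Optional[int] = None) -> List[float]:
    """Commit time of each transaction under a linear cost model.

    batch_size=None models the single-transaction behaviour described in
    the APAR; a positive batch_size models grouping the files into
    several smaller transactions.
    """
    if num_files <= 0:
        return []
    if batch_size is None:
        batch_size = num_files
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    full, rest = divmod(num_files, batch_size)
    sizes = [batch_size] * full + ([rest] if rest else [])
    return [n * seconds_per_file for n in sizes]


def waiter_survives(num_files: int, seconds_per_file: float,
                    timeout_s: float,
                    batch_size: Optional[int] = None) -> bool:
    """True if no single commit holds the lock longer than the timeout."""
    longest = max(commit_times(num_files, seconds_per_file, batch_size),
                  default=0.0)
    return longest <= timeout_s


if __name__ == "__main__":
    # Hypothetical: 100,000 system files, 0.05 s of commit work per file,
    # and a 60-minute resource timeout.
    print(waiter_survives(100_000, 0.05, 3600))                   # one txn
    print(waiter_survives(100_000, 0.05, 3600, batch_size=1000))  # grouped
```

With these made-up numbers the single transaction commits for about 5,000 seconds, so the first call prints False; in batches of 1,000 files each commit takes about 50 seconds, so the second prints True.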

-----------------------



I've just looked up this APAR, and it is now closed as a suggestion for a
future release, so don't expect a fix soon!




APAR status


Closed as suggestion for future release.


Error description




The TSM backup of a Windows system object runs as a single
transaction. Because the backup of the system object can
take quite a long time, due to the number of physical
objects that make up the system object, the backup transaction
can hold locks on the TSM server for a very long time. In a
multithreaded client environment, other client threads for this
same node may end up having their transactions time out waiting
for the lock(s) held by the system object transaction. When
this occurs, the following messages are seen:

ANR0538I A resource waiter has been aborted.
ANR0918E Inventory Query Backup for node ABC terminated - lock conflict.

While neither client nor server code logic is in error here,
a modification to the transaction processing of system
objects should be made to avoid terminating other client
sessions associated with a multithreaded (multi-session)
backup.


Local fix




1 - Do not include system objects in the normal backup.
They can be excluded by:
  using the domain statement: DOMAIN ALL-LOCAL -SYSTEMOBJECTS
  or using the exclude statement: EXCLUDE.SYSTEMOBJECT SYSFILES
2 - Back up the system objects later using dsmc -optfile=xxxx,
where the optfile sets RESOURCEUTILIZATION to 1 so that
the backup is single-threaded.
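Step 2 of the local fix needs a second client options file that sets RESOURCEUTILIZATION to 1 and drops the step-1 system object exclusions. A minimal Python sketch of deriving that file's text from the normal one follows; it assumes options are spelled out in full (abbreviated option names are not detected), and the file names and the dsmc command you then run with -optfile are left to you.

```python
import re
from pathlib import Path

# Patterns assume options are spelled out in full; abbreviated option
# names are not recognised by this sketch.
_RESOURCE_UTIL = re.compile(r"^\s*resourceutilization\b", re.IGNORECASE)
_EXCLUDE_SYSOBJ = re.compile(r"^\s*exclude\.systemobject\b", re.IGNORECASE)
_DOMAIN = re.compile(r"^\s*domain\s", re.IGNORECASE)
_MINUS_SYSOBJ = re.compile(r"\s+-systemobjects?\b", re.IGNORECASE)


def single_threaded_options(base_text: str) -> str:
    """Derive options for the separate, single-threaded system object run.

    Drops RESOURCEUTILIZATION and EXCLUDE.SYSTEMOBJECT lines, removes a
    -SYSTEMOBJECT(S) entry from DOMAIN lines (the step-1 exclusions), and
    appends RESOURCEUTILIZATION 1. Comment lines starting with '*' are kept.
    """
    out = []
    for line in base_text.splitlines():
        if _RESOURCE_UTIL.match(line) or _EXCLUDE_SYSOBJ.match(line):
            continue
        if _DOMAIN.match(line):
            line = _MINUS_SYSOBJ.sub("", line)
        out.append(line)
    out.append("RESOURCEUTILIZATION 1")
    return "\n".join(out) + "\n"


def write_single_threaded_copy(base_path: Path, new_path: Path) -> None:
    """Write the single-threaded copy of base_path to new_path."""
    new_path.write_text(single_threaded_options(base_path.read_text()))
```

The normal options file keeps its own RESOURCEUTILIZATION and the step-1 exclusion, so the regular backup stays multithreaded while the system objects go through the separate, single-threaded run.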







Tim Rushforth

City of Winnipeg



-----Original Message-----
From: Rees, Chris ( Corp ) [mailto:Chris.Rees AT pgen DOT com]
Sent: September 24, 2003 3:38 AM
To: TRushforth AT WINNIPEG DOT CA
Subject: please help - ANR0918E



Hi Tim



Hope you don't mind me emailing you directly!



Just wondered if you got this sorted. I found the thread below on adsm.org.
We are having exactly the same problems, i.e. lock conflicts and W2K backup
sessions hanging.



I am willing to change the resource timeout but can't see it in dsmserv.opt.
Where do you change it?



Any help greatly appreciated



Regards



Chris







Forum:   ADSM.ORG - ADSM / TSM Mailing List Archive
Date:      May 20, 15:57
From:      Rushforth, Tim <TRushforth AT WINNIPEG DOT CA>


Hi Geoff:
Have you had any resolution to this problem? We've had a few occurrences of
this now. When it happens to us, all sessions basically seem hung up in a
run state; I believe most are W2K clients at the point of backing up system
objects, while other nodes (e.g. Exchange backups) are still processing
data. We are now at server 5.1.6.4 and mostly 5.1.1.1 and 5.1.1.3 clients.
Thanks,
Tim Rushforth
City of Winnipeg
> There have been some problems with resource waiters and locks.
> It might be worth upping your server Resource Timeout value to 100.
Support asked me to change it again, from 60 to 90 this time, so I went
ahead and made it 100. I'm still having problems with failed backups
reporting error ANR0918E. It's random, and although the clients are mostly
4.1 and support tells me they won't look at those, I lucked out, or NOT,
and have it showing up on 5.1 clients too.
 10/31/02 23:51:22 ANR0918E Inventory Query Backup for node XXXXXXX terminated - lock conflict.
Geoff Gill
TSM Administrator
NT Systems Support Engineer
SAIC






