Yes, it is in the Windows 2003 TSM 5.2 client code. It uses the new
Volume Shadow Copy Service (VSS), but you need a TSM 5.2 server to take advantage of it.
From a previous post by Andy Raibeck:
In addition, the Windows 2003 system state/service backups use a different
transaction protocol that doesn't pin the server recovery log for
extensive periods of time, as might the "system object" backup method.
This support required changes on the server side as well, and thus the
requirement for a 5.2 server.
Bill Boyer
DSS, Inc.
-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Rushforth, Tim
Sent: Wednesday, September 24, 2003 10:35 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: please help - ANR0918E
Hi Chris:
The format of the option is "RESOURCETIMEOUT 180" and is placed in
dsmserv.opt. You can issue q opt res* to see your current setting. Note
that this didn't really help us at all.
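For anyone who cannot find the option in their file, the check can be sketched in Python as below. This is only an illustration, not a TSM tool: the function name find_option and the sample file contents are made up, it assumes comment lines start with an asterisk, and it matches only the full option name, not abbreviations. If the option is absent, the server is presumably running with its default value and the line has to be added.

```python
from typing import Optional


def find_option(opt_text: str, name: str) -> Optional[str]:
    """Return the value of option `name` in option-file text, or None if absent."""
    for line in opt_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines (assumed to start with '*').
        if not stripped or stripped.startswith("*"):
            continue
        parts = stripped.split(None, 1)
        # Option names are compared case-insensitively.
        if parts[0].upper() == name.upper():
            return parts[1].strip() if len(parts) > 1 else ""
    return None


# Hypothetical dsmserv.opt contents, for illustration only.
sample = """* server options
COMMMETHOD TCPIP
RESOURCETIMEOUT 180
"""

print(find_option(sample, "RESOURCETIMEOUT"))
print(find_option("COMMMETHOD TCPIP\n", "RESOURCETIMEOUT"))
```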
We eventually split up our TSM server into 2 servers (not just because of
this) and we haven't had any problems since! We eventually got IBM to open
an APAR on this - IC36769.
I am also sending this mail to the list so other people have this info.
This is the response from IBM in our open PMR (42130):
Action taken: I got an answer from the client developers that they have a fix
for this in the 2003 client but are still working on a fix for 2000 clients.
That will be delivered in one of the next releases of the code.
From the developers:
"The problem is locking on the server, the length of time the locks are
held and the resourcewait setting on the server. This is not a problem
the server is able to resolve. The server is working correctly. The
system files backup is a long-running transaction. If another session needs to
lock the system object filespace while the system files transaction is
being committed, and that transaction takes a very long time (longer
than the resourcewait time) to commit, then this situation occurs and
there is nothing the server can do about it because the server is doing
everything correctly.
.
The long term solution is when the client is finally updated to process
the system files in multiple transactions rather than in a single
transaction. When that update is made then there will no longer be a
transaction with tens or hundreds of thousands of files in a single
transaction causing this problem. At the current time the transaction
commit for system files can take hours because of the number of files
involved in the single transaction. Note, the problem is ONLY with the
commit time, not the length of the entire transaction since the locks
are only grabbed after the data movement, during end transaction
processing, to limit the length of time locks are held. Until that
update can be made by the client team the only other possible fix is for
the backup of the SYSTEM OBJECTS filespace to be single threaded.
.
Again, I want to make it clear that this problem is not caused by the
server improperly handling something. The server is properly handling
the backup and the server is properly terminating the backup because of
the length of time being waited on a lock caused by the length of time
it takes to commit the transaction of the system files."
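The arithmetic behind the developers' explanation can be sketched in Python. This is a toy model, not TSM code: the function names and the 50 ms per-file commit cost are invented for illustration, and it assumes the resource timeout is expressed in minutes, as the values quoted in this thread (60, 90, 100, 180) suggest.

```python
def lock_hold_seconds(files_in_txn: int, commit_ms_per_file: int) -> float:
    """Time the locks are held: only the commit phase, not the data movement."""
    return files_in_txn * commit_ms_per_file / 1000


def waiter_aborted(hold_seconds: float, resource_timeout_minutes: int) -> bool:
    """A session waiting on the lock is aborted once the wait exceeds the timeout."""
    return hold_seconds > resource_timeout_minutes * 60


# Hypothetical figures: 100,000 system files, 50 ms of commit work per file.
single_txn = lock_hold_seconds(100_000, 50)   # one big transaction
split_txn = lock_hold_seconds(10_000, 50)     # longest of ten smaller transactions

print(waiter_aborted(single_txn, 60))    # True: 5000 s commit vs 3600 s timeout
print(waiter_aborted(single_txn, 180))   # False: timeout raised above the commit time
print(waiter_aborted(split_txn, 60))     # False: each commit holds locks for 500 s
```

Raising the timeout only helps while the commit stays shorter than the new limit. Splitting the work shortens every lock hold, which is the long-term fix the developers describe.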
.
"Jim Smith created a work item (Id:JSMH-5BURL4 Abstract:"Cross-txn
grouping for system object") some time ago to address this problem. This
problem is solved for the Windows 2003 VSS work using the new grouping, and I
think the same will be done for Windows 2000/XP."
-----------------------
I've just searched on this APAR and it is now closed as a suggestion for
future release so don't expect a fix soon!
APAR status
Closed as suggestion for future release.
Error description
The TSM backup of a Windows system object runs as a single
transaction. Because the backup of the system object can
take quite a long time, due to the number of physical
objects that make up the system object, the backup transaction
can hold locks on the TSM server for a very long time. In a
multithreaded client environment, other client threads for this
same node may end up having their transaction time out waiting
for the lock(s) held by the system object transaction. When
this occurs the following messages are seen:
ANR0538I A resource waiter has been aborted.
ANR0918E Inventory Query Backup for node ABC terminated - lock
conflict
While neither client nor server code logic is in error here,
a modification to the transaction processing of system
objects should be made to avoid terminating other client
sessions associated with a multi-threaded (multi-session)
backup.
Local fix
1 - Do not include system objects in the normal backup.
They can be excluded by:
Using the domain statement: DOMAIN ALL-LOCAL -SYSTEMOBJECTS
Or using the exclude statement: EXCLUDE.SYSTEMOBJECT SYSFILES
2 - Back up the system objects later using dsmc -optfile=xxxx
where the optfile has a resourceutilization set to 1 so that
the backup is single threaded.
Tim Rushforth
City of Winnipeg
-----Original Message-----
From: Rees, Chris ( Corp ) [mailto:Chris.Rees AT pgen DOT com]
Sent: September 24, 2003 3:38 AM
To: TRushforth AT WINNIPEG DOT CA
Subject: please help - ANR0918E
Hi Tim
Hope you don't mind me emailing you directly!
Just wondered if you got this sorted. I found the thread below on adsm.org.
We are having exactly the same problems, i.e. lock conflicts and W2K backup
sessions hanging.
I am willing to change resource timeout but can't see it in dsmserv.opt.
Where do you change it?
Any help greatly appreciated
Regards
Chris
Forum: ADSM.ORG - ADSM / TSM Mailing List Archive
Date: May 20, 15:57
From: Rushforth, Tim <TRushforth AT WINNIPEG DOT CA>
Hi Geoff:
Have you had any resolution to this problem? We've had a few occurrences of
this now. When it happens to us, all sessions basically seem hung up in a
run state. I believe most are W2K clients at the point of backing up system
objects; other nodes (e.g. Exchange backups) are still processing data. We
are now at 5.1.6.4 Server and mostly 5.1.1.1 and 5.1.1.3 clients.
Thanks,
Tim Rushforth
City of Winnipeg
>There have been some problems with resource waiters and locks.
> It might be worth upping your server Resource Timeout value to 100
Support asked me to change it again, from 60 to 90 this time, so I went
ahead and made it 100. I'm still having problems with Failed backups
reporting error ANR0918E. It's a random thing and although the clients are
mostly 4.1, and they tell me they won't look at them, I lucked out, or NOT,
and have it showing up on 5.1 clients too.
10/31/02 23:51:22 ANR0918E Inventory Query Backup for node XXXXXXX
terminated - lock conflict.
Geoff Gill TSM Administrator NT Systems Support Engineer SAIC