10-03-2011, 05:56 AM #1 Member
When will the Identify Duplicates process end?
Hi,
About two and a half weeks ago I enabled server-side data deduplication on one of my storage pools, and I expected the Identify Duplicates process to finish this weekend. I ran an SQL query to get the total number of files stored in the storage pool and compared the result with the Total Files Processed figure reported by the Identify Duplicates process. On Friday it had processed 18.1 million files, against roughly 19 million files in the storage pool.
Today, however, the Identify Duplicates process reports 21.1 million files processed, which is more than the pool holds. Now I'm confused. When will this process end? Is this a "how long is a piece of string" scenario?
Can anybody shed some light on this?
Thanks,
Wesley
-
10-03-2011, 06:43 AM #2 Moderator
Hi,
The identify process normally never ends. It stays in the process list with status "IDLE". Could this be your case?
Harry
-
10-03-2011, 07:13 AM #3 Member
Hi Harry,
It's still active in my case. I'm trying to figure out why it has processed more files than exist in the storage pool.
My hunch is that the figure reported in the Identify Duplicates process output is cumulative: it counts every file processed over the last few weeks, including files that have since expired, plus the new files that have entered the storage pool in the meantime. That would make it larger than the number of files currently in the storage pool.
-
10-03-2011, 10:53 AM #4 Member
Hi Wesley,
I believe your hunch is correct. Because your identify process is still running, the figure it reports is cumulative, as you say, and includes files that have since expired.
If you'd rather not have it run continuously, you can schedule the Identify process to run during a specific period of the day:
update stgpool <your_stgpool> identifyprocess=0
define schedule <your_admin_schedule_name> t=a starttime=<time> cmd="identify duplicates <your_stgpool> numprocess=<number> duration=<minutes>" active=yes
(or create a script and let the schedule run that script)
Here are some links you can read:
https://www-304.ibm.com/support/docv...21421060#dedup
https://www-304.ibm.com/support/docv...id=swg21419733
Best regards,
Kolli
-
10-12-2011, 05:18 AM #5 Member
Thanks for the advice.
Eventually the process changed to Idle and the number of deduped files was considerably larger than the number of files in the Storage Pool.
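For anyone who finds this thread later, here is a rough sketch of why a cumulative counter can end up above the current file count. It only models the hunch discussed above and is not documented TSM behaviour. The function name and the daily figures (120,000 files arriving and 120,000 expiring per day over 18 days) are invented for illustration, and it assumes every new file is counted once by the identify process.

```python
def simulate(initial_files, daily_new, daily_expired, days):
    """Return (files_currently_in_pool, cumulative_files_processed).

    Hypothetical model: the identify process counts each file once when
    it is processed and never subtracts files that later expire.
    """
    in_pool = initial_files
    processed = initial_files  # assume the initial backlog is counted once
    for _ in range(days):
        # The pool gains new files and loses expired ones.
        in_pool += daily_new - daily_expired
        # The cumulative counter only ever grows.
        processed += daily_new
    return in_pool, processed


# Invented figures, chosen only to show the effect.
in_pool, processed = simulate(19_000_000, 120_000, 120_000, 18)
print(f"files in pool now:     {in_pool:,}")
print(f"cumulative processed:  {processed:,}")
```

With steady arrivals and expirations the pool size stays flat while the processed total keeps climbing, which is the pattern described in this thread.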