  1. #1
    Member
    Join Date
    May 2009
    Posts
    49
    Thanks
    3
    Thanked 0 Times in 0 Posts

    When will Identify Duplicates process end???

    Hi,

    I enabled server-side data deduplication on one of my storage pools two and a half weeks ago, and I expected the Identify Duplicates process to finish this weekend. I ran an SQL query to get the total number of files in the storage pool and compared the result with the Total Files Processed figure reported by the Identify Duplicates process. On Friday it had processed 18.1 million files, with roughly 19 million files in the storage pool.

    Alas, today the Identify Duplicates process is at 21.1 million files, and now I'm confused. When will this process end? Is this a "how long is a piece of string" scenario?

    Can anybody shed some light on this?

    Thanks,

    Wesley

  2. #2
    Moderator Harry_Redl's Avatar
    Join Date
    Dec 2003
    Location
    Czech Republic
    Posts
    2,220
    Thanks
    3
    Thanked 100 Times in 99 Posts

    Hi,

    The identify process normally never ends; it stays in the process list with status "IDLE". Could that be your case?

    Harry

  4. #3
    Member
    Join Date
    May 2009
    Posts
    49
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Hi Harry,

    It's still active in my case. I'm trying to figure out why it has processed more files than exist in the Storage Pool.

    My hunch is that the figure reported in the Identify Duplicates process output is cumulative: it counts every file processed over the last few weeks, including files that have since expired, plus the new files that have entered the storage pool. That would give a total larger than the number of files currently in the storage pool.
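The hunch above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `PoolModel`, its fields, and the daily numbers are hypothetical and are not taken from TSM or from the figures in this thread. It also assumes, for simplicity, that every file entering the pool is counted by the identify process exactly once and that expiration never lowers the running total.

```python
from dataclasses import dataclass


@dataclass
class PoolModel:
    """Toy model of a deduplicated storage pool (hypothetical, not TSM code)."""
    current_files: int = 0      # files that exist in the pool right now
    total_processed: int = 0    # running total reported by the identify process

    def ingest(self, new_files: int) -> None:
        # Assumption: each new file is examined once and added to the running total.
        self.current_files += new_files
        self.total_processed += new_files

    def expire(self, old_files: int) -> None:
        # Expiration shrinks the pool but leaves the running total untouched.
        self.current_files -= min(old_files, self.current_files)


pool = PoolModel()
pool.ingest(1000)          # initial pool contents, all processed once
for _ in range(5):         # five days of churn (illustrative numbers)
    pool.expire(100)
    pool.ingest(120)

print(pool.current_files)    # 1100
print(pool.total_processed)  # 1600
```

In this model the running total only ever grows, so after any period of expiration and new backups it exceeds the number of files a query against the pool would return at that moment.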

  5. #4
    Member
    Join Date
    Sep 2002
    Posts
    6
    Thanks
    0
    Thanked 2 Times in 2 Posts

    Hi Wesley,

    I believe your hunch is correct. Because your identify process runs continuously, it reports a cumulative count, as you say, including expired files.

    Instead, you can schedule the identify process to run during a specific period of the day:
    update stgpool <your_stgpool> identifyprocess=0
    define schedule <your_admin_schedule_name> t=a starttime=<time> cmd="identify duplicates <your_stgpool> numprocess=<number> duration=<minutes>" active=yes
    (or create a script and let the schedule run the script)
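To make the placeholders in the two commands above concrete, here is a minimal Python sketch that only assembles the command strings. The helper `build_identify_commands` and the example values (`DEDUPPOOL`, `IDENTIFY_NIGHTLY`, `20:00`, 2 processes, 240 minutes) are hypothetical. The commands themselves are the ones from this post, with the storage pool parameter spelled `identifyprocess`. The sketch does not talk to a server; you would still paste the output into the administrative command line yourself.

```python
def build_identify_commands(stgpool: str, schedule: str, start_time: str,
                            num_process: int, duration_minutes: int) -> list[str]:
    """Return the two admin commands from the post with the placeholders filled in."""
    if num_process < 1 or duration_minutes < 1:
        raise ValueError("num_process and duration_minutes must be positive")
    identify = (f"identify duplicates {stgpool} "
                f"numprocess={num_process} duration={duration_minutes}")
    return [
        # Set the pool's own identify processes to zero so none run in the background.
        f"update stgpool {stgpool} identifyprocess=0",
        # Run identify from an administrative schedule for a bounded window instead.
        f'define schedule {schedule} t=a starttime={start_time} '
        f'cmd="{identify}" active=yes',
    ]


# Example values are placeholders, not recommendations.
for command in build_identify_commands("DEDUPPOOL", "IDENTIFY_NIGHTLY", "20:00", 2, 240):
    print(command)
```

Keeping the `identify duplicates` text in one variable means the same string could be reused if you prefer the script variant mentioned above.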

    Here are some links you can read:
    https://www-304.ibm.com/support/docv...21421060#dedup
    https://www-304.ibm.com/support/docv...id=swg21419733

    Best regards,
    Kolli

  7. #5
    Member
    Join Date
    May 2009
    Posts
    49
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Thanks for the advice.

    Eventually the process changed to Idle and the number of deduped files was considerably larger than the number of files in the Storage Pool.
