
When will Identify Duplicates process end???

Discussion in 'TSM Operation' started by csops, Oct 3, 2011.

  1. csops

    csops New Member

    Joined:
    May 5, 2009
    Messages:
    49
    Likes Received:
    0
    Hi,

    I recently (2 1/2 weeks ago, to be precise!) enabled server-side data deduplication on one of my storage pools, and I expected the duplicate identification to finish this weekend. I ran an SQL query to get the total number of files stored in the storage pool and compared the result with the Total Files Processed figure reported by the Identify Duplicates process. On Friday it had processed 18.1 million files, against roughly 19 million files in the storage pool.

    Alas, today the Identify Duplicates process is at 21.1 million files. Now I'm confused. When will this process end? Is this a "how long is a piece of string" scenario?

    Can anybody shed some light on this?

    Thanks,

    Wesley
     
  3. Harry_Redl

    Harry_Redl Moderator

    Joined:
    Dec 29, 2003
    Messages:
    2,264
    Likes Received:
    135
    Occupation:
    IT Consultant
    Location:
    Czech Republic
    Hi,

    The identify process normally never ends; it stays there with the status "IDLE". Could this be your case?

    Harry
     
  4. csops

    csops New Member

    Joined:
    May 5, 2009
    Messages:
    49
    Likes Received:
    0
    Hi Harry,

    It's still active in my case. I'm trying to figure out why it has processed more files than exist in the Storage Pool.

    I have a hunch that the figure reported in the Identify Duplicates process output is cumulative: it counts every file processed over the last few weeks, including files that have since expired, plus the new files that have entered the storage pool. That would produce a larger number than the count of files currently in the storage pool.
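    The hunch can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not TSM's actual accounting: it assumes the identify process keeps up with every file that enters the pool and that its counter never decreases when files expire. The function name, fields and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PoolSnapshot:
    files_in_pool: int    # what a query against the storage pool would report today
    files_processed: int  # cumulative "Total Files Processed" counter


def simulate(initial_files: int, new_per_day: int,
             expired_per_day: int, days: int) -> PoolSnapshot:
    """Hypothetical model of a cumulative identify counter vs. current pool size."""
    in_pool = initial_files
    processed = initial_files  # assume the backlog present at enable time is fully scanned
    for _ in range(days):
        in_pool += new_per_day - expired_per_day  # expiration shrinks the pool...
        processed += new_per_day                  # ...but never shrinks the counter
    return PoolSnapshot(in_pool, processed)


if __name__ == "__main__":
    snap = simulate(initial_files=1000, new_per_day=100, expired_per_day=80, days=10)
    print(f"files in pool: {snap.files_in_pool}, files processed: {snap.files_processed}")
```

    Under these assumptions the gap between the two numbers equals the total number of files expired since deduplication was enabled, so the processed count overtaking the pool's file count is expected rather than a sign that the process is stuck.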
     
  5. kolli

    kolli New Member

    Joined:
    Sep 20, 2002
    Messages:
    6
    Likes Received:
    2
    Hi Wesley,

    I believe your hunch is correct while your identify process is still running: the count is cumulative, as you say, and includes expired files.

    Instead, you can schedule the identify process to run during a specific period each day:
    update stgpool <your_stgpool> identifyprocess=0
    define schedule <your_admin_schedule_name> t=a starttime=<time> cmd="identify duplicates <your_stgpool> numprocess=<number> duration=<minutes>" active=yes
    (or create a script and let the schedule run the script)
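    If you manage several pools, the two commands above can be generated rather than typed by hand. This is a hypothetical helper, not part of any TSM tooling: it only builds the command strings (pool name, schedule name, start time, process count and duration are placeholders you fill in) for pasting into an administrative client session. It assumes the server's default schedule period of one day; check the syntax against HELP IDENTIFY DUPLICATES and HELP DEFINE SCHEDULE on your own server.

```python
def build_identify_schedule(pool: str, schedule: str, start_time: str,
                            num_processes: int, duration_minutes: int) -> list[str]:
    """Return the UPDATE STGPOOL and DEFINE SCHEDULE commands for one pool.

    All arguments are placeholders; nothing here talks to a TSM server.
    """
    if num_processes < 1 or duration_minutes < 1:
        raise ValueError("num_processes and duration_minutes must be positive")
    identify = (f"identify duplicates {pool} "
                f"numprocess={num_processes} duration={duration_minutes}")
    return [
        # Stop the always-running identify processes for this pool.
        f"update stgpool {pool} identifyprocess=0",
        # Run IDENTIFY DUPLICATES at start_time for a fixed window
        # (repeats daily only if the default schedule period is left as is).
        f"define schedule {schedule} type=administrative starttime={start_time} "
        f'cmd="{identify}" active=yes',
    ]


if __name__ == "__main__":
    for command in build_identify_schedule("DEDUPPOOL", "IDENT_DEDUPPOOL", "20:00", 2, 240):
        print(command)
```

    Keeping the commands as plain strings makes it easy to review them before running anything against a production server.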

    Here are some links you can read:
    https://www-304.ibm.com/support/docview.wss?uid=swg21421060#dedup
    https://www-304.ibm.com/support/docview.wss?uid=swg21419733

    Best regards,
    Kolli
     
  6. csops

    csops New Member

    Joined:
    May 5, 2009
    Messages:
    49
    Likes Received:
    0
    Thanks for the advice.

    Eventually the process changed to Idle, and the number of deduplicated files it reported was considerably larger than the number of files in the storage pool.
     
