When will the Identify Duplicates process end?

csops

Hi,

I enabled server-side data deduplication on one of my storage pools about two and a half weeks ago, and I expected the Identify Duplicates process to finish this weekend. I ran an SQL query to get the total number of files stored in the storage pool and compared the result with the Total Files Processed figure reported by the Identify Duplicates process. On Friday it had processed 18.1 million files, against roughly 19 million total files in the storage pool.

Today, however, the Identify Duplicates process is at 21.1 million files, and now I'm confused. When will this process end? Is this a "how long is a piece of string" scenario?

Can anybody shed some light on this?

Thanks,

Wesley
 
Hi,

The identify process normally never ends. When it has nothing left to do, it stays there with the status "IDLE". Could that be your case?

Harry
 
Hi Harry,

It's still active in my case. I'm trying to figure out why it has processed more files than exist in the Storage Pool.

I have a hunch that the figure reported in the Identify Duplicates process output is cumulative: it counts every file processed over the last few weeks, including files that have since expired, plus the new files that have entered the storage pool in the meantime. That would result in more files processed than currently exist in the storage pool.
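
For anyone wanting to repeat the comparison, a minimal sketch follows. It assumes the file count comes from the NUM_FILES column of the OCCUPANCY table, which may not be the exact query used above, and <YOUR_STGPOOL> is a placeholder (storage pool names are normally stored in uppercase). The second command lists running processes, where the Identify Duplicates status is reported.

```
select sum(num_files) from occupancy where stgpool_name='<YOUR_STGPOOL>'
query process
```

The first figure is a snapshot of what is in the pool now, while the process figure is a running total, so the two are not expected to match once files expire or new data arrives.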
 
Hi Wesley,

I believe your hunch is correct. Because your identify process runs continuously, the count it reports is cumulative, as you say, and includes expired files.

Instead, you can schedule the identify process to run during a specific period of the day:
update stgpool <your_stgpool> identifyprocess=0
define schedule <your_admin_schedule_name> t=a starttime=<time> cmd="identify duplicates <your_stgpool> numprocess=<number> duration=<minutes>" active=yes
(or create a script and let the schedule run the script)
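
The script variant could look roughly like the sketch below. The names dedup_identify and dedup_identify_sched, the start time, the process count and the duration are all hypothetical values chosen for illustration, so adjust them to your environment and check the syntax against your server level.

```
define script dedup_identify "identify duplicates <your_stgpool> numprocess=2 duration=240" desc="Run duplicate identification for a fixed window"
define schedule dedup_identify_sched type=administrative cmd="run dedup_identify" active=yes starttime=20:00 period=1 perunits=days
```

Keeping the command in a script means the pool name or duration can be changed later without redefining the schedule.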

Here are some links you can read:
https://www-304.ibm.com/support/docview.wss?uid=swg21421060#dedup
https://www-304.ibm.com/support/docview.wss?uid=swg21419733

Best regards,
Kolli
 
Thanks for the advice.

Eventually the process changed to Idle, and the number of files it reported as processed was considerably larger than the number of files in the storage pool.
 