When will Identify Duplicates process end???

csops · Oct 3, 2011

Hi,

I enabled server-side data deduplication on one of my storage pools recently (2 1/2 weeks ago, to be precise!) and thought the process was due to end this weekend. I ran an SQL query to get the total number of files stored in the storage pool and compared the result with the Total Files Processed figure in the Identify Duplicates process output. On Friday it showed 18.1 million files processed, with roughly 19 million total files in the storage pool.

Alas, today the Identify Duplicates process is at 21.1 million files, which is more than the pool holds. Now I'm confused. When will this process end? Is this a "how long is a piece of string" scenario?

Can anybody shed some light on this?

Thanks,

Wesley

Harry_Redl · Oct 3, 2011

Hi,

The identify process normally never ends; it stays there with status "IDLE". Could that be your case?

Harry

csops · Oct 3, 2011

Hi Harry,

It's still active in my case. I'm trying to figure out why it has processed more files than exist in the Storage Pool.

I have a hunch that the figure reported in the Identify Duplicates process output is a cumulative count: it includes files that have expired over the last few weeks plus the new files that have entered the storage pool, which would make it larger than the number of files currently in the pool.

kolli · Oct 3, 2011

Hi Wesley,

I believe your hunch is correct. As your identify process is ongoing, it reports a cumulative count, as you say, including expired files.

Instead, you can schedule the Identify process to run during a specific period of the day:
update stgpool <your_stgpool> identifyprocess=0
define schedule <your_admin_schedule_name> t=a starttime=<time> cmd="identify duplicates <your_stgpool> numprocess=<number> duration=<minutes>" active=yes
(or create a script and let the schedule run the script)

Here are some links you can read:
https://www-304.ibm.com/support/docview.wss?uid=swg21421060#dedup
https://www-304.ibm.com/support/docview.wss?uid=swg21419733

Best regards,
Kolli

csops · Oct 12, 2011

Thanks for the advice.

Eventually the process changed to Idle and the number of deduped files was considerably larger than the number of files in the Storage Pool.
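
For anyone who finds this thread later, here is a rough sketch of the arithmetic behind the cumulative-counter explanation discussed above. It is only a toy model in Python, not anything taken from the server: the function name, the daily ingest and expiration rates, and the 14-day window are all made-up values chosen for illustration, and it assumes the processed counter only ever increases while expired files leave the pool.

```python
def simulate(initial_files, daily_new, daily_expired, days):
    """Toy model: return (files_in_pool, cumulative_processed) after `days` days.

    Assumptions (hypothetical, not confirmed against the server):
    - the initial backlog has already been fully scanned,
    - every newly stored file is scanned once,
    - expired files leave the pool but never reduce the processed counter.
    """
    in_pool = initial_files
    processed = initial_files
    for _ in range(days):
        in_pool += daily_new - daily_expired  # pool size can stay flat
        processed += daily_new                # counter only goes up
    return in_pool, processed


# Hypothetical rates: 150,000 files stored and 150,000 expired per day.
in_pool, processed = simulate(19_000_000, 150_000, 150_000, 14)
print(f"files in pool:   {in_pool:,}")
print(f"files processed: {processed:,}")
```

With these made-up rates the pool stays at about 19 million files while the processed counter climbs past 21 million, which is the same kind of gap seen in this thread.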
