Identify Duplicates process keeps running longer than the configured duration

Hello chaps.
I have configured TSM using tsmconfig.pl from the Blueprint; the deduplication process should run from 21:00 to 09:00, within the backup window:
Code:
q sched t=a
*  Schedule Name  Start Date/Time       Duration  Period  Day
-  -------------  --------------------  --------  ------  ---
   DBBACKUP       2014-04-25, 10:00:00  15 M      1 D     Any
   DEDUPLICATE    2014-04-25, 21:00:00  15 M      1 D     Any
   EXPIRE         2014-04-25, 07:00:00  15 M      1 D     Any
   RECLAIM        2014-04-25, 11:00:00  15 M      1 D     Any

Code:
q script deduplicate f=d

Name         Line Number  Command                                                   Last Update by (administrator)  Last Update Date/Time
-----------  -----------  --------------------------------------------------------  ------------------------------  ---------------------
DEDUPLICATE  Description  Run identify duplicate processes.                         ADMIN                           2014-04-25, 14:53:53
DEDUPLICATE  10           identify duplicates DEDUPPOOL numprocess=12 duration=720  ADMIN                           2014-05-30, 12:26:33
By 13:00, EXPIRE, DBBACKUP and RECLAIM had already completed, but all of the identify processes were still running. Does anyone know how to restrict the identify run time?

Please excuse my bad English, I hope you can understand what I want to say.
 
How many identify processes is your storage pool set up for? Those will run 24/7. If you want to control when they run, update the stgpool to have 0 identify processes, and keep your current schedule to start X number of processes as needed.
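As a minimal sketch of that approach (assuming the pool is DEDUPPOOL and reusing the 12 processes / 720 minutes from your existing script; run it as a macro or paste the commands into dsmadmc):

Code:
/* stop the pool from starting its own identify processes */
update stgpool DEDUPPOOL identifyprocess=0
/* let the scheduled command decide when and how long identify runs */
identify duplicates DEDUPPOOL numprocess=12 duration=720

With IDENTIFYPROCESS=0 in the pool definition, the only identify processes running should be the ones the schedule starts, and those should honour the DURATION value.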

NUMPRocess
Specifies the number of duplicate-identification processes to run after the command completes. You can specify 0 - 50 processes. The value that you specify for this parameter overrides the value that you specified in the storage pool definition or the most recent value that was specified when you last issued this command. If you specify zero, all duplicate-identification processes stop. This parameter is optional. If you do not specify a value, the Tivoli Storage Manager server starts or stops duplicate-identification processes so that the number of processes is the same as the number that is specified in the storage pool definition.

For example, suppose that you define a new storage pool and specify two duplicate-identification processes. Later, you issue the IDENTIFY DUPLICATES command to increase the number of processes to four. When you issue the IDENTIFY DUPLICATES command again without specifying a value for the NUMPROCESS parameter, the server stops two duplicate-identification processes.

If you specified 0 processes when you defined the storage pool definition and you issue IDENTIFY DUPLICATES without specifying a value for NUMPROCESS, any running duplicate-identification processes stop, and the server does not start any new processes.

Remember:
When you issue IDENTIFY DUPLICATES without specifying a value for NUMPROCESS, the DURATION parameter is not available. Duplicate-identification processes specified in the storage pool definition run indefinitely, or until you reissue the IDENTIFY DUPLICATES command, update the storage pool definition, or cancel a process.

When the server stops a duplicate-identification process, the process completes the current physical file and then stops. As a result, it might take several minutes to reach the number of duplicate-identification processes that you specified as a value for this parameter.
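So, if you ever need to shut the identify processes down before their duration is up, something like this should do it (DEDUPPOOL assumed from your output above):

Code:
/* NUMPROCESS=0 stops all duplicate-identification processes for the pool */
identify duplicates DEDUPPOOL numprocess=0

Each process still finishes the physical file it is working on before it ends, as noted above.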

More info:
http://www-01.ibm.com/support/knowl...ef.doc/r_cmd_duplicates_identify.html?lang=en
 
How many identify processes is your storage pool setup for?
All stgpool maintenance is already disabled.
Code:
                    Storage Pool Name: DEDUPPOOL
                    Storage Pool Type: Primary
                    Device Class Name: FILEDEV
                   Estimated Capacity: 48,353.94 G
                   Space Trigger Util: 93.55
                             Pct Util: 46.247
                             Pct Migr: 46.247
                          Pct Logical: 99.364
                         High Mig Pct: 90
                          Low Mig Pct: 70
                      Migration Delay: 0
                   Migration Continue: Yes
                  Migration Processes: 1
                Reclamation Processes: 10
                    Next Storage Pool: 
                 Reclaim Storage Pool: 
               Maximum Size Threshold: No Limit
                               Access: Read/Write
                          Description: Deduplicated disk storage
                    Overflow Location: 
                Cache Migrated Files?: 
                           Collocate?: Group
                Reclamation Threshold: 100
            Offsite Reclamation Limit: 
      Maximum Scratch Volumes Allowed: 968
       Number of Scratch Volumes Used: 479
        Delay Period for Volume Reuse: 0 Day(s)
               Migration in Progress?: No
                 Amount Migrated (MB): 0
     Elapsed Migration Time (seconds): 0
             Reclamation in Progress?: No
       Last Update by (administrator): ADMIN
                Last Update Date/Time: 2014-04-29, 23:35:25
             Storage Pool Data Format: Native
                 Copy Storage Pool(s): 
                  Active Data Pool(s): 
              Continue Copy on Error?: Yes
                             CRC Data: No
                     Reclamation Type: Threshold
          Overwrite Data when Deleted: 
                    Deduplicate Data?: Yes
 Processes For Identifying Duplicates: 0
            Duplicate Data Not Stored: 240,528 M ( 1%)
                       Auto-copy Mode: Client
Contains Data Deduplicated by Client?: No
 
One way to possibly get around this issue is to remove the Identify Duplicates from the storage pool control and set it up as an administrative schedule with a limited time period for running. Give it a try and let us know how that goes.
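As a rough sketch, that could look like the following (the schedule name IDENTIFY_DEDUP is just an example; the Blueprint's DEDUPLICATE schedule and script pair already do essentially this):

Code:
/* keep identify processes in the pool definition at zero */
update stgpool DEDUPPOOL identifyprocess=0
/* start identify at 21:00 every day, capped at 720 minutes of run time */
define schedule IDENTIFY_DEDUP type=administrative active=yes starttime=21:00 period=1 perunits=days cmd="identify duplicates DEDUPPOOL numprocess=12 duration=720"

Note that the schedule's own DURATION (the "15 M" in your q sched output) is only the startup window; the duration=720 on the IDENTIFY DUPLICATES command itself is what is supposed to cap the run time.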
 
It is already done by the tsmconfig.pl script:

identify duplicates DEDUPPOOL numprocess=12 duration=720
 
It looks like I have some issue with deduplication performance, because backup, migration, or deletion of the deduplicated storage pool takes ages to complete.
 
Did you manage to solve this one?
I am having the same issue.
 
Most TSM processes, when cancelled (either because the duration has passed or via the CANCEL PROCESS command), will normally finish processing the current object. If that object is really large, the process can run several minutes longer than the duration.

You can see this when you cancel a process: it often takes several minutes before the process actually ends. The same thing applies when the duration is reached.
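You can watch that happen from the admin command line; the process number below is of course just a placeholder:

Code:
/* list running processes and note the number of the identify process */
q process
/* ask the server to stop it; it still finishes the current physical file first */
cancel process 123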
 
Hi, marcland, I am familiar with this.
In my case it is running for more than 7 hours!
 
7 hours is long. Are you at the latest fixpack for your current version? There are a few APARs for dedup and identify. If not, that should be the first step; why try to troubleshoot a problem that may already be fixed?
 
I have a 7.1.1.100 TSM server (the latest level of 7.1.1).
Which first step do you mean?
 