Identify Duplicates not finding any duplicate extents.

Colin

ADSM.ORG Member
Joined: Aug 11, 2016
Messages: 43
Reaction score: 0
Points: 0

Hi All,

I have an issue where my identify process does not seem to be finding any duplicates in the device class / storage pool that I am importing our department shares backup node into (from another TSM server).

I am 100% positive there are duplicates in this data.

All my other stgpools run the identify process, find duplicates, and then de-dup the data on the next move of that data.

However, this data is going to a new device class (it is disk, but it is used sequentially with 250 GB volumes) and a new domain/policy/copygroup/class/stgpool, etc. Is there any setting I might be missing that is preventing de-dup from working correctly on these files?


The storage pool has the following:

Deduplicate Data?: Yes
Processes For Identifying Duplicates: 1

The options file has the following:

DEDUPREQUIRESBACKUP NO
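
For what it's worth, the option can also be confirmed (and, if supported at your server level, changed) from an administrative session rather than by editing the options file; a minimal sketch, where the value shown is just the one used here:

Code:
query option deduprequiresbackup
setopt DedupRequiresBackup No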



The Identify process:

Process Number: 38
Identify Duplicates Storage pool: 18MONFD2. Volume: F:\FILEDISK2\00000423.BFS. State: active. State Date/Time: 09/15/2016 12:41:23. Current Physical File(bytes): 1,195 KB. Total Files Processed: 729,089. Total Duplicate Extents Found: 0. Total Duplicate Bytes Found: 0 bytes.
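
If it helps to drive a pass by hand and watch it, something along these lines should work from an administrative session (a sketch; the pool name comes from the output above, and the duration and process count are just example values):

Code:
identify duplicates 18MONFD2 duration=60 numprocess=2
query process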



I will not have enough space on my new devclass if de-dup does not work for this node. Please help.
 

Is the data compressed or encrypted before backup?
 

It is coming from a server with an ML6030 tape library. The library handled the tape encryption and did not involve the TSM server itself. The data is not compressed to my knowledge, but it might be. Where should I check? If it is being compressed, I am pretty sure it was on the client side, but I am not sure how to check that.
 

You would have to check that client. Compressed files don't deduplicate well, because the compression algorithm makes the compressed files look unique once they are broken down into chunks.

Client side compression should not be used with server-side deduplication (since compressed objects do not deduplicate well). However, client-side compression used in conjunction with client-side deduplication can provide an effective means to further reduce storage pool data.
source: https://www.ibm.com/developerworks/...Tivoli+Storage+Manager/page/Deduplication+FAQ
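
If the client itself is hard to reach (as with a clustered node), the server can also show whether compression is forced for the node; a sketch, assuming an administrative session and substituting the real node name:

Code:
query node {nodename} f=d

The detailed output includes a Compression field (Client, Yes, or No).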
 

I don't think the data is getting compressed though. I only see the following in the .opt file on the client:

NODENAME deptshare
TCPSERVERADDRESS <Omitted>
EXCLUDE.BACKUP "*:\microsoft uam volume\...\*"
EXCLUDE.BACKUP "*:\microsoft uam volume\...\*.*"
EXCLUDE.BACKUP "*:\...\EA DATA. SF"
EXCLUDE.BACKUP "*:\IBMBIO.COM"
EXCLUDE.BACKUP "*:\IBMDOS.COM"
EXCLUDE.BACKUP "*:\IO.SYS"
EXCLUDE.BACKUP "*:\...\system32\config\...\*"
EXCLUDE.BACKUP "*:\...\system32\Perflib*.dat"
EXCLUDE.BACKUP "*:\...\system32\dhcp\...\*"
INCLUDE.BACKUP "*:\...\system32\dhcp\backup\...\*"
EXCLUDE.BACKUP "*:\...\system32\dns\...\*"
INCLUDE.BACKUP "*:\...\system32\dns\backup\...\*"
EXCLUDE.BACKUP "*:\Departments\[HASHTAG]#Server[/HASHTAG] Migration Holding Pen\...\*"
EXCLUDE.BACKUP "*:\Shares\Departments\Docfinity\...\*"
EXCLUDE.ARCHIVE "*:\microsoft uam volume\...\*"
EXCLUDE.ARCHIVE "*:\microsoft uam volume\...\*.*"
EXCLUDE.ARCHIVE "*:\...\EA DATA. SF"
EXCLUDE.ARCHIVE "*:\IBMBIO.COM"
EXCLUDE.ARCHIVE "*:\IBMDOS.COM"
EXCLUDE.ARCHIVE "*:\IO.SYS"
EXCLUDE.ARCHIVE "*:\...\system32\config\...\*"
EXCLUDE.ARCHIVE "*:\...\system32\Perflib*.dat"
EXCLUDE.ARCHIVE "*:\...\system32\dhcp\...\*"
INCLUDE.ARCHIVE "*:\...\system32\dhcp\backup\...\*"
EXCLUDE.ARCHIVE "*:\...\system32\dns\...\*"
INCLUDE.ARCHIVE "*:\...\system32\dns\backup\...\*"
EXCLUDE.DIR "*:\System Volume Information"
EXCLUDE.DIR "*:\...\Temporary Internet Files"
EXCLUDE.DIR "*:\Recycled"
EXCLUDE.DIR "*:\Recycler"
EXCLUDE.DIR "*:\$Recycle.Bin"
ERRORLOGRETENTION 30 D
PASSWORDACCESS GENERATE
QUERYSCHEDPERIOD 4
SCHEDLOGRETENTION 30 D
CLUSTERNODE YES
DOMAIN "\\nas\g$"




Any thoughts? Does it have to do with it being a clustered NAS server instead of just a regular node? Also, is client compression set somewhere other than the client's .opt file?
 

Does it have to do with it being a clustered NAS server instead of just a regular node? Also, is client compression set somewhere other than the client's .opt file?
It could come from a client option set too. Best to check it using:
dsmc query option comp* -optfile={name of option file used for that backup}

Also, it's important to note that you will get better deduplication reduction with a single large storage pool than with several storage pools, because data is only deduplicated within a storage pool. So if you have multiple storage pools, you potentially increase duplication.
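
Since a client option set is assigned to the node on the server, it can also be checked from an administrative session; a sketch, substituting the real node and option set names:

Code:
query node {nodename} f=d
query cloptset {optionset_name}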
 

It could come from a client option set too. Best to check it using:
dsmc query option comp* -optfile={name of option file used for that backup}

Also, it's important to note that you will get better deduplication reduction with a single large storage pool than with several storage pools, because data is only deduplicated within a storage pool. So if you have multiple storage pools, you potentially increase duplication.


Not to get off topic, but thanks, I didn't know that about de-dup. We use 1-to-1 stgpools to domains/copygroups. If one can have different retention periods for different copy groups and domains in the same stgpool, that might be something I should consider.

I will see if I can check the client using that command, although since it is running on a clustered solution I am not really sure how to run the CLI client for the node. Both CLI clients are on the two hosting servers of the cluster, while the clustered department shares are shuffled between the two servers in an active/passive failover arrangement. Essentially it is a NAS front end to a SAN.
 

Okay here we go:

tsm> query option
ACTIVATEKEY: YES
AFSBACKUPMNTPNT: YES
ALLOWWILDCARDCH: NO
ARCHMC:
ARCHSYMLINKASFILE: YES
ASNODENAME:
ASRFILESPATH:
ASRMODE: NO
AUTOFSRENAME: PROMPT
AUDITLOGGING: OFF
AUDITLOGNAME: c:\program files\tivoli\tsm\baclient\dsmaudit.log
AUTOMOUNT:
AUTODEPLOY: YES
BACKMC:
BACKUPREGISTRY: YES
CANDIDATESINTERVAL: 1
CASESENSITIVEAWARE: NO
CHANGINGRETRIES: 4
CHECKFORORPHANS: NO
CHECKREPARSECONTENT: NO
CHECKTHRESHOLDS: 5
CLIENTVIEW: STANDARD
CLUSTERDISKSONLY: YES
CLUSTERNODE: YES
COMMMETHOD: TCP/IP
COLLOCATEBYFILESPEC: NO
COMMRESTARTDURATION: 60
COMMRESTARTINTERVAL: 15
COMPRESSALWAYS: YES
COMPRESSION: NO
COMPUTERNAME:
DATACENTER:
DATASTORE:
DATEFORMAT: 1
DEFAULTSERVER:
DFSBACKUPMNTPNT: YES
DIRMC: DEFAULT
DISABLENQR: NO
DISKBUFFSIZE: 32
DISKCACHELOCATION:
DEDUPLICATION: NO
DEDUPCACHEPATH: C:\Program Files\Tivoli\TSM\baclient
DEDUPCACHESIZE: 256
DOMAIN: \\nas\g$
DOMAIN.IMAGE:
DOMAIN.NAS:
DOMAIN.SNAPSHOT:
DOMAIN.VMFILE:
DOMAIN.VMFULL:
DOMNODE:
DONTLOAD: Unknown
DSMTRACELISTEN: NO
EDITOR: YES
EFSDECRYPT: NO
ENABLE8DOT3NAMESUPPORT: NO
ENABLEARCHIVERETENTIONPROTECTION: NO
ENABLECLIENTENCRYPTKEY: NO
ENABLEDEDUPCACHE: YES
ENABLELANFREE: NO
ENHANCEDAUDITLOGGING: YES
HSMENABLEIMMEDIATEMIGRATE: NO
ENCRYPTIONTYPE: AES128
ENCRYPTKEY: SAVE
ERRORLOGMAX: 0
ERRORLOGNAME: c:\program files\tivoli\tsm\baclient\dsmerror.log
ERRORLOGRETENTION: 30, D
ERRORPROG:
EVENTLOGGING: NO
FAILOVERDISABLED: NO
FASTQUERYBACKUP: YES
FBBRANCH:
FBCLIENTNAME:
FBPOLICYNAME:
FBREPOSLOCATION:
FBSERVER:
FBVOLUMENAME:
FOLLOWSYMBOLIC: NO
FRSPRIMARYRESTORE: NO
GROUPS:
GUITREEVIEWAFTERBACKUP: NO
HOST:
HSMDISABLEAUTOMIGDAEMONS: NO
HSMDISTRIBUTEDRECALL: YES
HSMBACKENDMODE: TSM
HSMEXTOBJIDATTR: NO
HSMGROUPEDMIGRATE: NO
HSMLOGEVENTFLAGS: NONE
HSMLOGFORMAT: TEXT
HSMLOGMAX: 0
HSMLOGNAME: c:\program files\tivoli\tsm\baclient\dsmhsm.log
HSMLOGRETENTION: N
HSMLOGSAMPLEINTERVAL: 3600
HSMMULTISERVER: NO
HSMREPARSETAG: 0
HSMMAXRECALLTAPEDRIVES: 5
ICATPASSWORD:
IMAGE: NO
IMAGEGAPSIZE: 32
INCRTHRESHOLD: 0
JOURNALPIPE: \\.\pipe\jnlSessionMgr
KERNELMESSAGES: YES
LANGUAGE: dscenu.txt
LANFREECOMMMETHOD: Named Pipe
LANFREESHMPORT: 1
LANFREESSL: NO
LANFREETCPPORT: 1500
LANFREETCPSERVERADDRESS: 127.0.0.1
LARGECOMMBUFFERS: NO
MAKESPARSEFILE: YES
MANAGEDSERVICES: WEBCLIENT
MAXCANDPROCS: 5
MAXCMDRETRIES: 2
MAXMIGRATORS: 5
MAXRECALLDAEMONS: 20
MAXRECONCILEPROC: 3
MAXTHRESHOLDPROC: 3
MBOBJREFRESHTHRESH: 50
MBPCTREFRESHTHRESH: 50
MEMORYEFFICIENTBACKUP: NO
MIGRATEENCRYPTKEY: NO
MIGFILEEXPIRATION: 7
MIGRATESERVER:
MINMIGFILESIZE: 0
MINRECALLDAEMONS: 3
METHOD: NONE
NAMEDPIPENAME: \\.\pipe\Server1
NASNODENAME:
NFSTIMEOUT: 0
NODENAME: DEPTSHARE
NOSNAPRESTORE: NO
NUMBERFORMAT: 1
OPTFILE: G:\tsm\dsm.opt
OPTIONFORMAT: STANDARD
OVERLAPRECALL: NO
OFFLOADNODENAME:
PASSWORDACCESS: GENERATE
PASSWORDDIR:
PERFMONTCPSERVERADDRESS:
PERFMONTCPPORT: 5129
PERFMONCOMMTIMEOUT: 30
POSTNSCHEDULECMD:
POSTSCHEDULECMD:
POSTSNAPSHOTCMD:
PRENSCHEDULECMD:
PRESCHEDULECMD:
PRESERVELASTACCESSDATE: NO
PRESNAPSHOTCMD:
PROCESSORUTILIZATION: 0
QUERYSCHEDPERIOD: 4
QUIET/VERBOSE: VERBOSE
QUOTESARELITERAL: NO
RECONCILEINTERVAL: 24
REPLACE: PROMPT
RESETARCHIVEATTRIBUTE: NO
RESOURCEUTILIZATION: 2
RESTORECHECKSTUBACCESS: YES
RESTOREMIGSTATE: YES
RETRYPERIOD: 20
RUNASSERVICE: NO
SCHEDCMDDISABLED: NO
SCHEDLOGMAX: 0
SCHEDLOGNAME: c:\program files\tivoli\tsm\baclient\dsmsched.log
SCHEDLOGRETENTION: 30, D
SCHEDMODE: POLLING
SCHEDRESTRETRDISABLED: NO
SCROLLLINES: 20
SCROLLPROMPT: NO
SERVERNAME: DSMSERV
SESSIONINITIATION: CLIENT
SHMPORT: 1
SHMQUEUENAME: \QUEUES\ADSM\DSMSERV
SKIPACL: NO
SKIPACLUPDATECHECK: NO
SKIPNTPERMISSIONS: NO
SKIPMISSINGSYSWFILES: YES
SKIPNTSECURITYCRC: NO
SNAPSHOTCACHELOCATION:
SNAPSHOTCACHESIZE: 1
SNAPSHOTFSIDLERETRIES: 10
SNAPSHOTFSIDLEWAIT: 2S,50MS; MINSET: 1
SNAPSHOTPOLICY: DEFAULT
SNAPSHOTPROVIDERFS: NONE
SNAPSHOTPROVIDERIMAGE: NONE
SRVOPTSETENCRYPTIONDISABLED: NO
SRVPREPOSTSCHEDDISABLED: NO
SRVPREPOSTSNAPDISABLED: NO
SSL: NO
SSLFIPSMODE: NO
SSLREQUIRED: DEFAULT
STAGINGDIRECTORY:
STREAMSEQ: 0
SUBDIR: NO
SUBFILEBACKUP: NO
SUBFILECACHEPATH:
SUBFILECACHESIZE: 10
SYSTEMSTATEBACKUPMETHOD: PROGRESSIVE
TAPEPROMPT: NO
TCPADMINPORT: 1500
TCPBUFFSIZE: 32768
TCPCLIENTADDRESS:
TCPCLIENTPORT: 1501
TCPNODELAY: YES
TCPPORT: 1500
TCPRECVDELAY: 0
TCPSENDDELAY: 0
TCPSERVERADDRESS: TSM-DATA.STKATE.EDU
TCPSENDBUFFSIZE: -1
TCPWINDOWSIZE: 64512
TESTFLAGS:
HARDLINK
TIMEFORMAT: 1
TRACEFILE:
TRACEFLAGS:
TRACEMAX: 0
TRACESEGSIZE: 0
TXNBYTELIMIT: 25600K
UPDATECTIME: NO
USEDIRECTORY: NO
USERS:
USEUNCNAMES: NO
VIRTUALMOUNTPOINT:
VIRTUALNODENAME: DEPTSHARE
VMBACKDIR:
VMBACKNODELETE: NO
VMBACKUPTYPE: FULLVM
VMBACKVCBTRANSPORT: RETRY
VMCHOST:
VMCPW:
VMCTLMC:
VMCUSER:
VMENABLETEMPLATEBACKUPS: NO
VMFULLTYPE: VSTOR
VMLIMITPERDATASTORE: 0
VMLIMITPERHOST: 0
VMLIST:
VMMAXPARALLEL: 1
VMMC:
VMNAME:
VMPROCESSVMWITHINDEPENDENT: NO
VMPROCESSVMWITHPRDM: NO
VMSKIPCTLCOMPRESSION: INVALID
VMTIMEOUT: 180
VMVSTORTRANSPORT:
VSSALTSTAGINGDIR:
VSSUSESYSTEMPROVIDER: NO
WILDCARDSARELITERAL: NO
tsm>
 

Looks like compression is set to NO, so I am still at a loss as to why the identify duplicates process hasn't found any extents. Any thoughts after looking through that, or any other information I can gather?
 

Hrmm, I might be dumb and it may just still be processing files. I will see if we still have an issue tomorrow.
 

If one can have different retention periods for different copy groups and domains in the same stgpool, that might be something I should consider.
Retention is based on the management class, which is part of a domain, not on the storage pool. Storage pools don't belong to any domain; they are available for anyone to use.

This may be a good time to upgrade to 7.1.7 and convert all your FILE devclass dedup pools to one large directory-container pool: https://www.ibm.com/support/knowledgecenter/SSGSG7_7.1.6/srv.admin/t_stgpool_convert.html
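
Roughly, the conversion described on that page looks like this (a sketch only; the container pool name and directory are hypothetical, and CONVERT STGPOOL requires the 7.1.7 server):

Code:
define stgpool CONTPOOL1 stgtype=directory
define stgpooldirectory CONTPOOL1 F:\CONTPOOL1
convert stgpool 18MONFD2 CONTPOOL1 maxprocess=4 duration=60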

Hrmm, I might be dumb and it may just still be processing files. I will see if we still have an issue tomorrow.
Check:
Code:
show dedupdeleteinfo
Look at the chunks processed vs. the chunks waiting in the queue.

You may also want to try the Perl script in this technote to get a detailed report of the current dedup stats: http://www-01.ibm.com/support/docview.wss?uid=swg21596944

Keep in mind that in order to find duplicates, duplicates must exist. If you have one node with one filesystem (even if it's a really large one), there's a good chance that most of the files are unique, so once broken down into chunks they are still unique. Once you start getting new versions of the same files, you start getting duplicates, but unless there's a high percentage of change daily, that still would not account for much. It's probably safe to assume that 80% or more of these files never or rarely change.
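
Once identify has had time to run, the pool's detailed query is the quickest way to see whether deduplication is actually paying off; a sketch, using the pool name from earlier in the thread:

Code:
query stgpool 18MONFD2 f=d

Look for the Duplicate Data Not Stored figure in the output.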

You should review the best practices too: https://ibm.biz/BdXx5S
 

Thanks for the additional info. It turns out the export hadn't actually started mounting tapes and exporting yet; for some reason it was just taking a very long time to complete the initial parts of the data export (about 300 MB worth). Thanks for all the information though, it really did help. I do think I will be moving to container pool storage soon, but I want to finish this migration first.
 