Identify Duplicates not finding any duplicate extents.

Colin

ADSM.ORG Member
Joined: Aug 11, 2016
Messages: 43
Reaction score: 0
Points: 0

Hi All,

I have an issue where my identify process does not seem to be finding any duplicates in the device class / storage pool that I am importing our department shares backup node into (from another TSM server).

I am 100% positive there are duplicates in this data.

All my other stgpools run the identify process, find duplicates, and then de-dup the data on the next move of that data.

However, this data is going to a new device class (it is disk, but it is used sequentially with 250 GB volumes) and a new domain/policy/copygroup/class/stgpool, etc. Is there any setting I might be missing that is preventing de-dup from working correctly on these files?


The storage pool has the following:

Deduplicate Data?: Yes
Processes For Identifying Duplicates: 1

The options file has the following:

DEDUPREQUIRESBACKUP NO
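
For what it's worth, the option can also be confirmed (and, if supported at your server level, changed) from an administrative session rather than by editing the options file; a minimal sketch, where the value shown is just the one used here:

Code:
query option deduprequiresbackup
setopt DedupRequiresBackup No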



The Identify process:

Process Number: 38
Identify Duplicates Storage pool: 18MONFD2. Volume: F:\FILEDISK2\00000423.BFS. State: active. State Date/Time: 09/15/2016 12:41:23. Current Physical File(bytes): 1,195 KB. Total Files Processed: 729,089. Total Duplicate Extents Found: 0. Total Duplicate Bytes Found: 0 bytes.
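
If it helps to drive a pass by hand and watch it, something along these lines should work from an administrative session (a sketch; the pool name comes from the output above, and the duration and process count are just example values):

Code:
identify duplicates 18MONFD2 duration=60 numprocess=2
query process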



I will not have enough space on my new devclass if de-dup does not work for this node. Please help.
 

Is the data compressed or encrypted before backup?
 

It is coming from a server with an ML6030 tape library. The library handled the tape encryption and did not involve the TSM server itself. The data is not compressed to my knowledge, but it might be. Where should I check? If it is being compressed, I am pretty sure it was on the client side, but I am not sure how to check that.
 

You would have to check that client. Compressed files don't deduplicate well, because the compression algorithm makes the compressed files look unique once they are broken down into chunks.

Client side compression should not be used with server-side deduplication (since compressed objects do not deduplicate well). However, client-side compression used in conjunction with client-side deduplication can provide an effective means to further reduce storage pool data.
source: https://www.ibm.com/developerworks/...Tivoli+Storage+Manager/page/Deduplication+FAQ
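
If the client itself is hard to reach (as with a clustered node), the server can also show whether compression is forced for the node; a sketch, assuming an administrative session and substituting the real node name:

Code:
query node {nodename} f=d

The detailed output includes a Compression field (Client, Yes, or No).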
 

I don't think the data is getting compressed though. I only see the following in the .opt file on the client:

NODENAME deptshare
TCPSERVERADDRESS <Omitted>
EXCLUDE.BACKUP "*:\microsoft uam volume\...\*"
EXCLUDE.BACKUP "*:\microsoft uam volume\...\*.*"
EXCLUDE.BACKUP "*:\...\EA DATA. SF"
EXCLUDE.BACKUP "*:\IBMBIO.COM"
EXCLUDE.BACKUP "*:\IBMDOS.COM"
EXCLUDE.BACKUP "*:\IO.SYS"
EXCLUDE.BACKUP "*:\...\system32\config\...\*"
EXCLUDE.BACKUP "*:\...\system32\Perflib*.dat"
EXCLUDE.BACKUP "*:\...\system32\dhcp\...\*"
INCLUDE.BACKUP "*:\...\system32\dhcp\backup\...\*"
EXCLUDE.BACKUP "*:\...\system32\dns\...\*"
INCLUDE.BACKUP "*:\...\system32\dns\backup\...\*"
EXCLUDE.BACKUP "*:\Departments\[HASHTAG]#Server[/HASHTAG] Migration Holding Pen\...\*"
EXCLUDE.BACKUP "*:\Shares\Departments\Docfinity\...\*"
EXCLUDE.ARCHIVE "*:\microsoft uam volume\...\*"
EXCLUDE.ARCHIVE "*:\microsoft uam volume\...\*.*"
EXCLUDE.ARCHIVE "*:\...\EA DATA. SF"
EXCLUDE.ARCHIVE "*:\IBMBIO.COM"
EXCLUDE.ARCHIVE "*:\IBMDOS.COM"
EXCLUDE.ARCHIVE "*:\IO.SYS"
EXCLUDE.ARCHIVE "*:\...\system32\config\...\*"
EXCLUDE.ARCHIVE "*:\...\system32\Perflib*.dat"
EXCLUDE.ARCHIVE "*:\...\system32\dhcp\...\*"
INCLUDE.ARCHIVE "*:\...\system32\dhcp\backup\...\*"
EXCLUDE.ARCHIVE "*:\...\system32\dns\...\*"
INCLUDE.ARCHIVE "*:\...\system32\dns\backup\...\*"
EXCLUDE.DIR "*:\System Volume Information"
EXCLUDE.DIR "*:\...\Temporary Internet Files"
EXCLUDE.DIR "*:\Recycled"
EXCLUDE.DIR "*:\Recycler"
EXCLUDE.DIR "*:\$Recycle.Bin"
ERRORLOGRETENTION 30 D
PASSWORDACCESS GENERATE
QUERYSCHEDPERIOD 4
SCHEDLOGRETENTION 30 D
CLUSTERNODE YES
DOMAIN "\\nas\g$"




Any thoughts? Does it have to do with it being a clustered NAS server instead of just a regular node? Also, is client compression set somewhere other than the client's .opt file?
 

Does it have to do with it being a clustered NAS server instead of just a regular node? Also, is client compression set somewhere other than the client's .opt file?
It could come from a client option set too. Best to check it using:
dsmc query option comp* -optfile={name of option file used for that backup}

Also, it's important to note that you will get better deduplication reduction with a single large storage pool than with several storage pools, because data is only deduplicated within a storage pool. So if you have multiple storage pools, you potentially increase duplication.
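
Since a client option set is assigned to the node on the server, it can also be checked from an administrative session; a sketch, substituting the real node and option set names:

Code:
query node {nodename} f=d
query cloptset {optionset_name}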
 

It could come from a client option set too. Best to check it using:
dsmc query option comp* -optfile={name of option file used for that backup}

Also, it's important to note that you will get better deduplication reduction with a single large storage pool than with several storage pools, because data is only deduplicated within a storage pool. So if you have multiple storage pools, you potentially increase duplication.


Not to get off topic, but thanks, I didn't know that about de-dup. We use 1-to-1 stgpools to domains/copygroups. If one can have different retention periods for different copy groups and domains in the same stgpool, that might be something I should consider.

I will see if I can check the client using that command, although since it is running on a clustered solution I am not really sure how to run the CLI client for the node. Both CLI clients are on the two hosting servers of the cluster, while the clustered department shares are shuffled between the two servers in an active/passive failover arrangement. Essentially it is a NAS front end to a SAN.
 

Okay here we go:

tsm> query option
ACTIVATEKEY: YES
AFSBACKUPMNTPNT: YES
ALLOWWILDCARDCH: NO
ARCHMC:
ARCHSYMLINKASFILE: YES
ASNODENAME:
ASRFILESPATH:
ASRMODE: NO
AUTOFSRENAME: PROMPT
AUDITLOGGING: OFF
AUDITLOGNAME: c:\program files\tivoli\tsm\baclient\dsmaudit.log
AUTOMOUNT:
AUTODEPLOY: YES
BACKMC:
BACKUPREGISTRY: YES
CANDIDATESINTERVAL: 1
CASESENSITIVEAWARE: NO
CHANGINGRETRIES: 4
CHECKFORORPHANS: NO
CHECKREPARSECONTENT: NO
CHECKTHRESHOLDS: 5
CLIENTVIEW: STANDARD
CLUSTERDISKSONLY: YES
CLUSTERNODE: YES
COMMMETHOD: TCP/IP
COLLOCATEBYFILESPEC: NO
COMMRESTARTDURATION: 60
COMMRESTARTINTERVAL: 15
COMPRESSALWAYS: YES
COMPRESSION: NO
COMPUTERNAME:
DATACENTER:
DATASTORE:
DATEFORMAT: 1
DEFAULTSERVER:
DFSBACKUPMNTPNT: YES
DIRMC: DEFAULT
DISABLENQR: NO
DISKBUFFSIZE: 32
DISKCACHELOCATION:
DEDUPLICATION: NO
DEDUPCACHEPATH: C:\Program Files\Tivoli\TSM\baclient
DEDUPCACHESIZE: 256
DOMAIN: \\nas\g$
DOMAIN.IMAGE:
DOMAIN.NAS:
DOMAIN.SNAPSHOT:
DOMAIN.VMFILE:
DOMAIN.VMFULL:
DOMNODE:
DONTLOAD: Unknown
DSMTRACELISTEN: NO
EDITOR: YES
EFSDECRYPT: NO
ENABLE8DOT3NAMESUPPORT: NO
ENABLEARCHIVERETENTIONPROTECTION: NO
ENABLECLIENTENCRYPTKEY: NO
ENABLEDEDUPCACHE: YES
ENABLELANFREE: NO
ENHANCEDAUDITLOGGING: YES
HSMENABLEIMMEDIATEMIGRATE: NO
ENCRYPTIONTYPE: AES128
ENCRYPTKEY: SAVE
ERRORLOGMAX: 0
ERRORLOGNAME: c:\program files\tivoli\tsm\baclient\dsmerror.log
ERRORLOGRETENTION: 30, D
ERRORPROG:
EVENTLOGGING: NO
FAILOVERDISABLED: NO
FASTQUERYBACKUP: YES
FBBRANCH:
FBCLIENTNAME:
FBPOLICYNAME:
FBREPOSLOCATION:
FBSERVER:
FBVOLUMENAME:
FOLLOWSYMBOLIC: NO
FRSPRIMARYRESTORE: NO
GROUPS:
GUITREEVIEWAFTERBACKUP: NO
HOST:
HSMDISABLEAUTOMIGDAEMONS: NO
HSMDISTRIBUTEDRECALL: YES
HSMBACKENDMODE: TSM
HSMEXTOBJIDATTR: NO
HSMGROUPEDMIGRATE: NO
HSMLOGEVENTFLAGS: NONE
HSMLOGFORMAT: TEXT
HSMLOGMAX: 0
HSMLOGNAME: c:\program files\tivoli\tsm\baclient\dsmhsm.log
HSMLOGRETENTION: N
HSMLOGSAMPLEINTERVAL: 3600
HSMMULTISERVER: NO
HSMREPARSETAG: 0
HSMMAXRECALLTAPEDRIVES: 5
ICATPASSWORD:
IMAGE: NO
IMAGEGAPSIZE: 32
INCRTHRESHOLD: 0
JOURNALPIPE: \\.\pipe\jnlSessionMgr
KERNELMESSAGES: YES
LANGUAGE: dscenu.txt
LANFREECOMMMETHOD: Named Pipe
LANFREESHMPORT: 1
LANFREESSL: NO
LANFREETCPPORT: 1500
LANFREETCPSERVERADDRESS: 127.0.0.1
LARGECOMMBUFFERS: NO
MAKESPARSEFILE: YES
MANAGEDSERVICES: WEBCLIENT
MAXCANDPROCS: 5
MAXCMDRETRIES: 2
MAXMIGRATORS: 5
MAXRECALLDAEMONS: 20
MAXRECONCILEPROC: 3
MAXTHRESHOLDPROC: 3
MBOBJREFRESHTHRESH: 50
MBPCTREFRESHTHRESH: 50
MEMORYEFFICIENTBACKUP: NO
MIGRATEENCRYPTKEY: NO
MIGFILEEXPIRATION: 7
MIGRATESERVER:
MINMIGFILESIZE: 0
MINRECALLDAEMONS: 3
METHOD: NONE
NAMEDPIPENAME: \\.\pipe\Server1
NASNODENAME:
NFSTIMEOUT: 0
NODENAME: DEPTSHARE
NOSNAPRESTORE: NO
NUMBERFORMAT: 1
OPTFILE: G:\tsm\dsm.opt
OPTIONFORMAT: STANDARD
OVERLAPRECALL: NO
OFFLOADNODENAME:
PASSWORDACCESS: GENERATE
PASSWORDDIR:
PERFMONTCPSERVERADDRESS:
PERFMONTCPPORT: 5129
PERFMONCOMMTIMEOUT: 30
POSTNSCHEDULECMD:
POSTSCHEDULECMD:
POSTSNAPSHOTCMD:
PRENSCHEDULECMD:
PRESCHEDULECMD:
PRESERVELASTACCESSDATE: NO
PRESNAPSHOTCMD:
PROCESSORUTILIZATION: 0
QUERYSCHEDPERIOD: 4
QUIET/VERBOSE: VERBOSE
QUOTESARELITERAL: NO
RECONCILEINTERVAL: 24
REPLACE: PROMPT
RESETARCHIVEATTRIBUTE: NO
RESOURCEUTILIZATION: 2
RESTORECHECKSTUBACCESS: YES
RESTOREMIGSTATE: YES
RETRYPERIOD: 20
RUNASSERVICE: NO
SCHEDCMDDISABLED: NO
SCHEDLOGMAX: 0
SCHEDLOGNAME: c:\program files\tivoli\tsm\baclient\dsmsched.log
SCHEDLOGRETENTION: 30, D
SCHEDMODE: POLLING
SCHEDRESTRETRDISABLED: NO
SCROLLLINES: 20
SCROLLPROMPT: NO
SERVERNAME: DSMSERV
SESSIONINITIATION: CLIENT
SHMPORT: 1
SHMQUEUENAME: \QUEUES\ADSM\DSMSERV
SKIPACL: NO
SKIPACLUPDATECHECK: NO
SKIPNTPERMISSIONS: NO
SKIPMISSINGSYSWFILES: YES
SKIPNTSECURITYCRC: NO
SNAPSHOTCACHELOCATION:
SNAPSHOTCACHESIZE: 1
SNAPSHOTFSIDLERETRIES: 10
SNAPSHOTFSIDLEWAIT: 2S,50MS; MINSET: 1
SNAPSHOTPOLICY: DEFAULT
SNAPSHOTPROVIDERFS: NONE
SNAPSHOTPROVIDERIMAGE: NONE
SRVOPTSETENCRYPTIONDISABLED: NO
SRVPREPOSTSCHEDDISABLED: NO
SRVPREPOSTSNAPDISABLED: NO
SSL: NO
SSLFIPSMODE: NO
SSLREQUIRED: DEFAULT
STAGINGDIRECTORY:
STREAMSEQ: 0
SUBDIR: NO
SUBFILEBACKUP: NO
SUBFILECACHEPATH:
SUBFILECACHESIZE: 10
SYSTEMSTATEBACKUPMETHOD: PROGRESSIVE
TAPEPROMPT: NO
TCPADMINPORT: 1500
TCPBUFFSIZE: 32768
TCPCLIENTADDRESS:
TCPCLIENTPORT: 1501
TCPNODELAY: YES
TCPPORT: 1500
TCPRECVDELAY: 0
TCPSENDDELAY: 0
TCPSERVERADDRESS: TSM-DATA.STKATE.EDU
TCPSENDBUFFSIZE: -1
TCPWINDOWSIZE: 64512
TESTFLAGS:
HARDLINK
TIMEFORMAT: 1
TRACEFILE:
TRACEFLAGS:
TRACEMAX: 0
TRACESEGSIZE: 0
TXNBYTELIMIT: 25600K
UPDATECTIME: NO
USEDIRECTORY: NO
USERS:
USEUNCNAMES: NO
VIRTUALMOUNTPOINT:
VIRTUALNODENAME: DEPTSHARE
VMBACKDIR:
VMBACKNODELETE: NO
VMBACKUPTYPE: FULLVM
VMBACKVCBTRANSPORT: RETRY
VMCHOST:
VMCPW:
VMCTLMC:
VMCUSER:
VMENABLETEMPLATEBACKUPS: NO
VMFULLTYPE: VSTOR
VMLIMITPERDATASTORE: 0
VMLIMITPERHOST: 0
VMLIST:
VMMAXPARALLEL: 1
VMMC:
VMNAME:
VMPROCESSVMWITHINDEPENDENT: NO
VMPROCESSVMWITHPRDM: NO
VMSKIPCTLCOMPRESSION: INVALID
VMTIMEOUT: 180
VMVSTORTRANSPORT:
VSSALTSTAGINGDIR:
VSSUSESYSTEMPROVIDER: NO
WILDCARDSARELITERAL: NO
tsm>
 

Looks like compression is set to NO, so I am still at a loss as to why the identify duplicates process hasn't found any extents. Any thoughts after looking through that, or any other information I can gather?
 

Hrmm, I might be dumb and it may just still be processing files. I will see if we still have an issue tomorrow.
 

If one can have different retention periods for different copy groups and domains in the same stgpool, that might be something I should consider.
Retention is based on the management class, which is part of a domain, not on the storage pool. Storage pools don't belong to any domain; they are available for anyone to use.

This may be a good time to upgrade to 7.1.7 and convert all your FILE devclass dedup pools to one large directory-container pool: https://www.ibm.com/support/knowledgecenter/SSGSG7_7.1.6/srv.admin/t_stgpool_convert.html
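
Roughly, the conversion described on that page looks like this (a sketch only; the container pool name and directory are hypothetical, and CONVERT STGPOOL requires the 7.1.7 server):

Code:
define stgpool CONTPOOL1 stgtype=directory
define stgpooldirectory CONTPOOL1 F:\CONTPOOL1
convert stgpool 18MONFD2 CONTPOOL1 maxprocess=4 duration=60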

Hrmm, I might be dumb and it may just still be processing files. I will see if we still have an issue tomorrow.
Check:
Code:
show dedupdeleteinfo
Look at the chunks processed vs. the chunks waiting in the queue.

You may also want to try the Perl script in this technote to get a detailed report of the current dedup stats: http://www-01.ibm.com/support/docview.wss?uid=swg21596944

Keep in mind that in order to find duplicates, duplicates must exist. If you have one node with one filesystem (even if it's a really large one), there's a good chance that most of the files are unique, so once broken down into chunks they are still unique. Once you start getting new versions of the same files, you start getting duplicates, but unless there's a high percentage of change daily, that still would not account for much. It's probably safe to assume that 80% or more of these files never or rarely change.
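
Once identify has had time to run, the pool's detailed query is the quickest way to see whether deduplication is actually paying off; a sketch, using the pool name from earlier in the thread:

Code:
query stgpool 18MONFD2 f=d

Look for the Duplicate Data Not Stored figure in the output.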

You should review the best practices too: https://ibm.biz/BdXx5S
 

Thanks for the additional info. It turns out the export hadn't actually started mounting tapes and exporting yet; for some reason it was just taking a very long time to complete the initial parts of the data export (about 300 MB worth). Thanks for all the information though, it really did help. I do think I will be moving to container pool storage soon, but I want to finish this migration first.
 