TSM and Windows Server 2012 Deduplication

FloXXI

ADSM.ORG Member
Hi,

we are running TSM server and client version 7.1. Recently I upgraded our file server from Windows Server 2003 to Server 2012. I had a look at the Windows Server 2012 deduplication feature and found it worth a try, because the evaluation tool said we could save about 35% of file space. Usually our daily incremental backup transfers about 50-100 GB of data to the TSM server. I activated deduplication two days ago, and the incremental backup has transferred between 1 and 2 TB on each of the last two days. :confused:

I cannot find any information from IBM about how TSM works with Windows Server 2012 deduplication. But Microsoft TechNet has the following information on deduplication, which I think applies to TSM:

In non-optimized backup and restore, the backup application does not use the Data Deduplication backup and restore API. Instead, the backup application opens the files and copies them without specifying the reparse point flag.

The optimized files are copied to the backup volume as normal files, not as optimized files. The conversion from optimized files to normal files is performed transparently in memory by Data Deduplication when the backup application copies the files. Restoring from such a backup store is a normal file-copy operation.

http://technet.microsoft.com/de-de/library/hh831600.aspx
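The behaviour TechNet describes can be sketched in a few lines. This is only a minimal illustration (not TSM's actual code, and the directory layout is hypothetical): a backup tool that is not dedup-aware simply opens and copies each file, so NTFS hands it the fully rehydrated content every time.

```python
import os
import shutil

def naive_backup(src_dir, dst_dir):
    """Non-optimized backup pass: plain open/copy of every file.

    Because nothing opens the reparse point itself (no
    FILE_FLAG_OPEN_REPARSE_POINT on Windows), the dedup filter driver
    transparently rehydrates optimized files, so the full content is
    read and transferred on every pass.
    """
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):
            # plain read/copy: the tool never sees the dedup chunks
            shutil.copy2(src, os.path.join(dst_dir, name))
```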

Does anyone know why TSM is transferring such a large amount of data after activating deduplication? Is it because the background deduplication job in Windows makes changes to the files? Will it decrease after the job has finished its first pass?

Thank You!
 
I don't believe the TSM 7.1 Windows client is Windows-deduplication aware (at this stage, anyway). I can't find much information myself, sorry about that.
 
I opened a request with IBM and the answer was quick and easy: TSM 7.1 does not support Microsoft's Data Deduplication backup and restore API, hence optimized backup is not possible. It is known that backups of deduplicated volumes take longer. Support for that feature has not been announced yet.

I still don't know why the amount of data transferred with each incremental backup has grown.
 
You could review what is being backed up, or turn on audit logging. Is it attempting to back up the data dedup metadata stored on Windows?
 
It's likely rehydrating the data while backing it up, exactly the same as if you copied a file from the dedup drive to another drive that is not deduped, or to another computer.
 
The problem is that after activating deduplication, TSM backs up files that were not modified by users. You can see it in the attached screenshot.

[Attachment: tsm.png]


Microsoft says that deduplication can process about 2 TB of data per volume per day. That is about what I see in my daily incremental. I will have to see if this gets better after deduplication has processed all the data for the first time.
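As a rough sanity check (the 2 TB/day figure is Microsoft's stated per-volume throughput; the 10 TB volume size is purely hypothetical), the first optimization pass over an already-full volume can take days, which matches the window during which the inflated incrementals were seen:

```python
# Rough estimate of how long the first dedup optimization pass takes.
# 2 TB/day is Microsoft's stated figure; 10 TB is a hypothetical volume.
volume_tb = 10
throughput_tb_per_day = 2
first_pass_days = volume_tb / throughput_tb_per_day
print(first_pass_days)  # -> 5.0
```

Until that first pass finishes, each night's incremental picks up whatever slice of the volume the dedup job touched that day.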
 

Attachments

  • tsm.png (37.9 KB)
That makes sense. TSM looks at more than just the timestamp. Here's what it compares:
Code:
***** A t r i b T r a c e *****
 fioCmpAttribs(): ignoring attributes: '--N A -------- -- -- -- (0x000000a0)'
 fioCmpAttribs(): File Size unchanged since backup
 fioCmpAttribs(): Compared FILETIME formated time stamps.
 fioCmpAttribs(): File Date/Time unchanged since backup
 fioCmpAttribs(): File Attributes unchanged since backup
 fioCmpAttribs(): File MSDfs unchanged since backup
 fioCmpAttribs(): File Security unchanged since backup
 fioCmpAttribs(): File Share unchanged since backup
 fioCmpAttribs(): File Security CRC unchanged since backup
 fioCmpAttribs(): Named Stream Size unchanged since backup
 fioCmpAttribs(): Backup Stream Size unchanged since backup
 fioCmpAttribs(): old attrib's data from build (IBM TSM 5.5.1.0)
 ***** A t r i b T r a c e E n d *****
 fioCmpAttribs(): returning ATTRIBS_EQUAL

The example above is for a file that has not changed, but if just one of those attributes changes, the file is a candidate for backup.
 
I imagine that this should settle down after the initial dedup pass is done.

The downside of this, too: if your dedup volume reaches the point where it holds more logical data than the physical size of the disk, then in the event of a DR you will not be able to restore all the data at once. You will have to restore until the volume is almost full, wait for dedup to free up space, then restore more data. You might have to repeat this a few times.
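That wave-by-wave restore can be sketched as a toy model. Everything here is hypothetical: the file sizes and volume capacity are invented, and the dedup job is crudely modelled as shrinking on-disk usage by the ~35% savings figure mentioned earlier in the thread.

```python
def restore_in_waves(file_sizes, capacity, dedup_ratio=0.65):
    """Toy model of a DR restore onto a dedup volume whose logical data
    exceeds physical capacity: restore until the volume is nearly full,
    let the dedup optimization job reclaim space (modelled as shrinking
    on-disk usage to dedup_ratio of its size), then repeat.
    Returns the number of restore waves needed."""
    pending = list(file_sizes)
    on_disk = 0.0
    waves = 0
    while pending:
        waves += 1
        # restore files until the next one would overflow the volume
        while pending and on_disk + pending[0] <= capacity:
            on_disk += pending.pop(0)
        # wait for the dedup job to free up space before the next wave
        on_disk *= dedup_ratio
    return waves

# Twelve 1 TB files onto a 5 TB volume with ~35% dedup savings:
print(restore_in_waves([1] * 12, capacity=5))  # -> 6
```

Six passes instead of one is exactly the kind of DR surprise worth planning for.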
 
How can I get that output of changed file properties?

In case of a DR, it is a good time to get a bigger and better storage system anyway. ;-)
 
Thank You, marclant!

I did some tests and looked at the output.


  • I created a new volume with some identical files.
  • I backed them up using tsm.
  • I activated deduplication and started a dedup job.
  • After that I did an incremental backup again.

Here is what happened to the files:

Code:
 ***** A t r i b  T r a c e *****
 fioCmpAttribs(): ignoring attributes: '--N A -------- -- -- --  (0x000000a0)'
 fioCmpAttribs(): File Size unchanged since backup
 fioCmpAttribs(): Compared FILETIME formated time stamps.
 fioCmpAttribs(): File Date/Time unchanged since backup
 fioCmpAttribs(): File Attributes changed since backup
 fioCmpAttribs():    Old Attribs: '----A -------- -- -- --  (0x00000020)'
 fioCmpAttribs():    New Attribs: '----A -------- XS XR --  (0x00000620)'
 fioCmpAttribs(): NTFS extended attribute data changed.
 fioCmpAttribs(): File MSDfs unchanged since backup
 fioCmpAttribs(): File Security unchanged since backup
 fioCmpAttribs(): File Share unchanged since backup
 fioCmpAttribs(): File Security CRC unchanged since backup
 fioCmpAttribs(): Named Stream Size unchanged since backup
 fioCmpAttribs(): Backup Stream Size unchanged since backup
 fioCmpAttribs(): GPFS attributes unchanged since backup
 fioCmpAttribs(): Hard Links Hash unchanged since backup
 fioCmpAttribs(): old attrib's data from build (IBM TSM 7.1.0.0)
 ***** A t r i b  T r a c e  E n d *****

Deduplication changes some file attributes. I ran a second dedup job and an incremental afterwards, and nothing was backed up. So hopefully the same will be true for my file server. :)
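For reference, the extra bits in the trace decode to standard Windows file-attribute flags (these constant values come from the Windows SDK headers; the helper function is just an illustration). 0x620 is the archive bit plus the sparse-file and reparse-point flags that dedup sets on optimized files:

```python
# Win32 file-attribute flag values, as defined in the Windows SDK headers:
FILE_ATTRIBUTE_ARCHIVE       = 0x00000020  # '----A' in the trace
FILE_ATTRIBUTE_SPARSE_FILE   = 0x00000200  # shown as 'XS'
FILE_ATTRIBUTE_REPARSE_POINT = 0x00000400  # shown as 'XR'

def looks_dedup_optimized(attrs):
    """A dedup-optimized file carries both the sparse and reparse bits."""
    needed = FILE_ATTRIBUTE_SPARSE_FILE | FILE_ATTRIBUTE_REPARSE_POINT
    return attrs & needed == needed

print(looks_dedup_optimized(0x00000020))  # before dedup -> False
print(looks_dedup_optimized(0x00000620))  # after dedup  -> True
```

So the one-time wave of backups is simply every file picking up those two new bits as the optimization job works through the volume.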
 
Now, the million dollar question. If you restore one of those files with the XS XR attributes, are they usable?
 
I tried a restore to a deduplicated and a non-deduplicated volume. The files are usable.
 
That's good to know. So the file is rehydrated during the backup, and TSM gets the full file, not chunks.
 
I opened a request with IBM and the answer was quick and easy: TSM 7.1 does not support Microsoft's Data Deduplication backup and restore API, hence optimized backup is not possible. It is known that backups of deduplicated volumes take longer. Support for that feature has not been announced yet.

I still don't know why the amount of data transferred with each incremental backup has grown.

I also opened a PMR because of the same behaviour: a 1 TB incremental backup every day...
Let's see...
 
I guess we have answered this question in this thread. By now, after deduplication has processed all files, my incremental backup has returned to normal values. But you have to consider that backups are slower after turning on deduplication; you have to check whether your backup window is large enough.
 
Code:
Normal File--> 1,073,717,248 \\srvsc050\d$\System Volume Information\Dedup\ChunkStore\{49BDEDF1-1252-4010-BC28-39234A65B521}.ddp\Data\0000001d.00000001.ccc [Sent]
Normal File--> 1,073,721,344 \\srvsc050\d$\System Volume Information\Dedup\ChunkStore\{49BDEDF1-1252-4010-BC28-39234A65B521}.ddp\Data\00000020.00000001.ccc [Sent]
Normal File--> 1,073,729,536 \\srvsc050\d$\System Volume Information\Dedup\ChunkStore\{49BDEDF1-1252-4010-BC28-39234A65B521}.ddp\Data\0000001e.00000001.ccc [Sent]

I will remove these objects from backup: exclude.dir 'd:\System Volume Information\Dedup'
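For anyone doing the same, that exclude belongs in the TSM client options file (dsm.opt on Windows). A sketch, assuming the D: drive layout from the log above:

```
* dsm.opt fragment: keep the Windows dedup chunk store out of the
* incremental backup; its chunks are useless fragments outside the volume.
exclude.dir 'd:\System Volume Information\Dedup'
```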
 
This is very interesting; we have not run into Windows 2012 deduplication yet in our environment. I would really like to know if TSM is working on a way to provide options to either back up only the hash table and chunks, or rehydrate on backup. I can see reasons why either would be desirable, depending on the environment.
 