TSM / Data Domain and wasted space

clepron (Active Newcomer, joined Feb 24, 2009, France / Nantes)
Hi all,

We are using TSM with a Data Domain, via a CIFS share (not VTL).

TSM creates 10GB and 20GB files to simulate tapes (30,000 files in our system).
The reclaim ratio is 90%.

We have discovered that there is a lot of wasted space on tapes where data have expired. These data are no longer accessible by TSM, but they still exist in the files, and so are still considered by the Data Domain as blocks to keep.

Is there a way to erase these data during the expire process, for example by overwriting them with zeros?
We could increase the reclaim ratio, but that does not totally solve the problem.

Thanks
Christophe
 
I am also using Data Domain to simulate tape, but over NFS rather than CIFS. I am not having your issue: after TSM runs reclamation, the created 'tapes' are deleted.

How did you define the device class for the 'tape' simulation? devtype=file?
 
Yes, I agree that tapes are deleted by reclamation, but before they reach the reclaim threshold, data are expired by TSM without really being deleted from the tapes.
So if your reclaim threshold is 90, data written on a tape are not erased until the tape is only 10% full (at that level the tape, and so the file, is deleted).
So up to 90% of the tape can still contain the old data, even though TSM is unable to access it, because it is logically deleted (it no longer exists in the TSM database).

Data Domain occupancy is then much greater than TSM occupancy.
 
Hi,

what about lowering the reclamation threshold to something like 60 (so the average utilization is about 70%)? With 90, your average utilization is only about 55% ...
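The arithmetic behind those averages: with a reclamation threshold of R percent reclaimable space, a volume is deleted once its utilization falls to (100 - R) percent, so volumes drift between 100% and (100 - R)% utilized. A quick sketch (a simplified model, assuming utilization decays uniformly between those two bounds):

```python
def avg_utilization(reclaim_threshold: float) -> float:
    """Average volume utilization (percent), assuming volumes decay
    uniformly from 100% utilized down to the reclamation point."""
    floor = 100.0 - reclaim_threshold  # utilization at which the volume is reclaimed
    return (100.0 + floor) / 2.0

print(avg_utilization(90))  # threshold 90 -> volumes average 55.0% utilized
print(avg_utilization(60))  # threshold 60 -> volumes average 70.0% utilized
```

In other words, the higher the reclamation threshold, the longer mostly-empty volumes sit on the share before being deleted.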

Harry
 
Yes, we are working on reducing the reclaim level, but it will make TSM and the Data Domain work harder during the reclaim process, so we have to be careful about any overload.

But it will not erase all the wasted space. It is just a step, not the solution.
I know there is a shred option in the reclaim process, but I don't know whether it can be used with this architecture.
http://publib.boulder.ibm.com/infoc...4/Administration/TSM54_shred_data/player.html
 
I have my reclaim set at 90% and have my DD and TSM utilization almost at par.

How big did you set the volumes for your devtype=file device class?
 
20GB for 'tapes' containing database backups (Oracle/SQL...).
10GB for 'tapes' containing file backups.
 
I am also using Data Domain for TSM, over NFS. I have a couple of comments on this issue. First, data that TSM has expired on a volume may still be valid data on the Data Domain array, because a deduplication pointer may still reference it. So the reclaim may not give back as much space as you would expect. Second, when you delete data on the array, I have been told that you will not recover that space until after the cleaning process runs.

I did some testing and after running reclamation I was actually using more space on the Data Domain side than I was prior to the reclaim. It was not until I ran filesystem clean that I got back any space.
 
Yes, you are right. The Data Domain has to run cleaning to recover space. You can check how much cleanable space there is on the Data Domain from the Enterprise Manager GUI or from the command-line interface.
On the command line, run "filesys show space" and check the column named "Cleanable GiB*" for the numbers.
 
The Data Domain cleaning process will remove data, right, but only blocks that are no longer referenced by any file on disk.
The data expired by TSM are still inside the volume files on disk, so these data are not cleaned/purged by the Data Domain. This is what I call 'wasted space'.

There are two ways to purge them:
- write zeros (for example) over all expired data, on all 'tapes' (files, in this case). This is what the IBM documentation calls shredding.
- reclaim tapes to consolidate the data and remove as many tapes/files as possible (deleting a tape really removes the expired data inside it, so those blocks can then be purged by the Data Domain cleaning process).

The second way is not optimal and uses more TSM and Data Domain CPU/IO. The threshold has to be chosen carefully.

The first way is... a hypothesis. Perhaps the shred option could do it, but I'm not sure it's possible.
It would be the optimal way, if it is possible, but it could use CPU too.

I'm very surprised that this problem is not described anywhere else.

PS: I'm not a TSM administrator, just a project manager trying to fix an occupancy issue...
 
I don't think what you are seeing is a problem to be solved, but functioning as designed. In the past when tape was most common, we would size the tape capacity to be 2x the estimated size of the data to be backed up. This was assuming that all tapes would be on average 50% full. The file devices in TSM are not unlike tapes. Data is logically expired, and the expired space cannot be reused until the active data is moved to a new volume and the old one scratched. You can't zero out the expired portions of these volumes, only the entire volume. Even if you could, that could be problematic if you were taking snapshots, every zero written would be a "change" and the original data would consume space in the snapshot.

That makes me think, are there snapshots on this system that are not being expired? That could certainly cause problems over time.
 
Yes, it's not a bug; it works like that.
But it's a pity to see a 50%-full file (tape) (5GB of TSM data in a 10GB tape) with a dedup occupancy greater than 5GB, isn't it?

And the Data Domain saying: "I'm great, I've reduced your file from 10GB to 6GB with dedup!"
- Wonderful, but there were only 5GB of usable data in it...

;-)

It's a joke, but it's a fact on several files/tapes...
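Putting numbers on the joke (the 10GB / 5GB / 6GB figures are the hypothetical example from the post above): the array's reported reduction looks fine, but measured against the data TSM can actually restore, the footprint has grown:

```python
volume_size_gb = 10.0  # size of the TSM 'tape' file on the share
live_data_gb = 5.0     # data TSM can still access on that volume
stored_gb = 6.0        # post-dedup footprint reported by the Data Domain

apparent_ratio = volume_size_gb / stored_gb  # what the array reports (~1.67x)
effective_ratio = live_data_gb / stored_gb   # ratio on usable data (~0.83x)

print(f"apparent dedup ratio:  {apparent_ratio:.2f}x")
print(f"effective dedup ratio: {effective_ratio:.2f}x")  # below 1: negative savings
```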
 
I use NFS for my DD and don't experience many problems. Remember that the "wasted space" is deduped; the waste is theoretical and not really in use. What I would suggest is to allow the DBs to back up their data directly to the CIFS or NFS share rather than using the TDP. It saves you licensing and also puts the ownership of the backup process back where it belongs: with the DBAs. The new firmware (due soon) is supposed to allow quotas, so you could give the DBAs a specific-sized filesystem and make them keep their directory from filling. If you're worried about mixing the data from TSM with the DB backups, make sure you create a separate MTree for the DBAs to act as their mount point.
 
To solve the issue of wasted space due to the DD still retaining data that TSM has released, just let
TSM create the file device volumes by itself. Do not assign volumes to the storage pool; TSM will create scratch volumes. Once a volume is empty (due to reclamation) it will be returned to scratch and the file will be deleted. Once the file is deleted, the DD cleaning process will reclaim the storage. Make sure that you do not create very large volumes, as this can cause backup sessions to time out while they are waiting for storage space.
The maximum number of volumes used by the stgpool is limited by the MAXSCRATCH parameter.
The size of the volumes is determined by the MAXCAPACITY value of the device class.
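The setup described above might look like this from the TSM administrative command line (dsmadmc). The device class, pool, and directory names here are invented for illustration; the parameters are the standard ones for a FILE device class:

```shell
# Device class backed by the Data Domain share; MAXCAPACITY sets the size
# of each volume file TSM creates.
define devclass DD_FILE devtype=file maxcapacity=20G mountlimit=20 directory=/mnt/datadomain

# Storage pool with no pre-defined volumes: MAXSCRATCH caps how many scratch
# volumes TSM may create. Empty volumes are deleted from disk, which lets the
# Data Domain cleaning process reclaim the space.
define stgpool DD_POOL DD_FILE maxscratch=500 reclaim=60
```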
 
We do as Hogmaster said, and TSM deletes the volume once it is reclaimed. I also use 100GB volume sizes to make it quicker and easier to reclaim volumes.
 
Hi,
thanks all.
We have increased the reclaim level and are waiting for the result.

Has anyone tried the shred option? It's specified that it can be used on 'random access disk'. Does it apply to volumes on CIFS Data Domain shares?

thanks
Christophe
 
Christophe,
To get data cleaned out of the Data Domain, the cleaning within the Data Domain needs to complete. By default it is scheduled for every Tuesday at 6am. When you run reclamation in TSM, and subsequently expiration, the data gets marked off, and when cleaning runs on the Data Domain, it will clean it out. It is advisable to make the Data Domain CIFS/NFS volume size in the devclass around 100GB, so you can reclaim easily. Think about it: if you have created a 1TB volume, you will wait much longer for it to reclaim, as it has to meet the set thresholds.
If you have set larger volumes, change the devclass to smaller volumes, and then move data off of those large volumes to get the space back.
Hope this helps.
Also, I hope you have used a "FILE" type device class for the Data Domain, so it acts as sequential access, and NOT "DISK" (which is random access); otherwise you will see big performance issues.
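Moving data off oversized volumes, as suggested above, uses the standard MOVE DATA command from dsmadmc. The volume file name and pool name here are invented examples:

```shell
# Drain a large volume so it empties and can be deleted by reclamation:
move data /mnt/datadomain/00000042.bfs

# Check remaining volume utilization in the pool afterwards:
query volume stgpool=DD_POOL
```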
 
I hate to post to such an old thread but this is the exact issue I'm seeing and in my case it is not a matter of DD needing to run cleaning. For one pool TSM knows of ~15000 volumes. When I browse the share on the DD there are ~30000. File size is 2GB so these orphaned files occupy 30TB or ~15TB deduped. A huge amount of space. I have cleaning run regularly and at this moment have 0.1GB cleanable. Most of the volumes/files TSM knows about are in the 80%-100% utilized range so reclamation isn't an issue.
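One way to confirm and list the orphans described above is to diff the volume names TSM's database knows about against a directory listing of the share. A sketch with synthetic data; in practice `tsm_volumes` would come from dsmadmc output (e.g. "select volume_name from volumes") and `on_disk` from `os.listdir()` on the mount point, and any deletion should only happen after careful manual verification:

```python
# Volume files TSM's database knows about (hypothetical names).
tsm_volumes = {"/dd/pool1/00000150.bfs", "/dd/pool1/00000151.bfs"}

# Volume files actually present on the Data Domain share
# (hypothetical; one orphan that TSM no longer tracks).
on_disk = {"/dd/pool1/00000150.bfs", "/dd/pool1/00000151.bfs",
           "/dd/pool1/00000093.bfs"}

# Files on disk that no longer exist in the TSM database are orphans:
orphans = sorted(on_disk - tsm_volumes)
print(orphans)  # candidates for deletion, after verification
```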

If the same issue were to occur in a tape library I would simply have TSM inventory the library then it would 'know' of all of the volumes.

Ugh.
 
Hi, I ran many tests, and in my case the best practice is to use a high threshold (80/90%):
TSM deletes volumes that are full of unusable data, and it's the Data Domain that reclaims the space.
 