Subject: Re: [ADSM-L] Seeking wisdom on dedupe..filepool file size client compression and reclaims
From: Stefan Folkerts <stefan.folkerts AT ITAA DOT NL>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Sun, 30 Aug 2009 08:34:47 +0200
Interesting ideas, and a simulator would be fun for this purpose.
You could be right, and your example does make sense in a way, but I
still wonder whether it works out in the real world.

Let's say you have normal data that expires (user files, etc.) and large
databases, some of which you keep for many months or even years.

If you use 200G volumes and a database fills such a volume to 60% or
more, that volume might not be reclaimed for a long time even after the
rest of its data has expired. That leaves roughly 80 GB of wasted space.

If you use 18G volumes and that database fills six volumes to 100% and a
seventh to 60%, only the expired 40% of that last volume, about 7.2 GB,
is wasted.
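
A quick back-of-the-envelope check of those two cases, as a minimal
Python sketch (the 60% fill level is the assumption from the example
above; the "waste" is the expired space pinned on the last, partly
database-filled volume):

# Expired-but-unreclaimable space on a volume whose remaining contents
# are long-lived database data filling it to fill_fraction of capacity.
def pinned_gb(volume_size_gb: float, fill_fraction: float) -> float:
    return volume_size_gb * (1.0 - fill_fraction)

print(pinned_gb(200, 0.60))  # 200G volume, 60% database -> 80.0 GB pinned
print(pinned_gb(18, 0.60))   # last 18G volume, 60% database -> 7.2 GB pinned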

Also, I don't have a clear idea of what the downside of small volumes
could be. Is there a disadvantage to having a few hundred volumes
instead of 30 large ones? I can't think of a real problem, apart from
filesystem performance if the number of volumes becomes insane, or a
slight TSM performance impact when you start the reclamation process or
run a query nodedata, something like that.

This remark: "Do a gedankenexperiment: split 100TB into 100G vols, and
into 10G vols. Then randomly expire data from them." is not how I think
real-world data works. Data doesn't expire randomly; parts of it do, but
large chunks of it don't (databases).
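
To put a rough number on that intuition, here is a small hypothetical
Python sketch (the object mix, the 120 GB database size, the 500 MB file
size and the 60% reclamation threshold are all made-up assumptions, not
measurements): it writes never-expiring database backups and short-lived
file data onto volumes of two sizes, expires only the file data, and
reports how much expired space ends up stuck on volumes that never reach
the threshold.

import random

def simulate(volume_gb, objects, threshold=0.60, seed=1):
    """objects: (size_gb, expires) pairs written sequentially; large
    objects span volumes."""
    rng = random.Random(seed)
    objs = objects[:]
    rng.shuffle(objs)

    volumes = []          # per volume: [expiring_gb, pinned_gb]
    free = 0.0
    for size, expires in objs:
        remaining = size
        while remaining > 0:
            if free <= 0:                      # current volume full: open a new one
                volumes.append([0.0, 0.0])
                free = volume_gb
            chunk = min(remaining, free)
            volumes[-1][0 if expires else 1] += chunk
            free -= chunk
            remaining -= chunk

    # After all the file data expires, a volume is reclaim-eligible once
    # its expired share reaches the threshold; below that, the expired
    # space stays stuck on the volume.
    eligible = sum(1 for exp, _ in volumes if exp / volume_gb >= threshold)
    stuck = sum(exp for exp, _ in volumes if exp / volume_gb < threshold)
    return len(volumes), eligible, stuck

# Hypothetical mix: two 120 GB databases that never expire plus 2 TB of
# 500 MB files that all expire.
mix = [(120.0, False), (120.0, False)] + [(0.5, True)] * 4000
for vol_gb in (200, 18):
    n, eligible, stuck = simulate(vol_gb, mix)
    print(f"{vol_gb:>3}G volumes: {eligible}/{n} eligible for reclaim, {stuck:.1f} GB stuck")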

Please prove me wrong; I'd love to learn new stuff! :)


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Allen S. Rout
Sent: Sunday, August 30, 2009 1:57 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] Seeking wisdom on dedupe..filepool file size
client compression and reclaims

>> On Sat, 29 Aug 2009 09:24:11 +0200, Stefan Folkerts <stefan.folkerts AT ITAA DOT NL> said:


> Now I am thinking: dedupe only occurs when you move data off the volumes
> or reclaim them, but 10G volumes might not get reclaimed for a LONG
> time, since they contain so little data that the chance of them getting
> reclaimed, and thus deduplicated, is relatively smaller than that
> happening on a 100G volume.

I think that, to a first approximation, the size of the volume is
irrelevant to the issues you're discussing here.

Do a gedankenexperiment: split 100TB into 100G vols, and into 10G
vols.  Then randomly expire data from them.

What you'll have is a bunch of volumes ranging from (say) 0% to 49%
reclaimable.  You will reclaim your _first_ volume a skosh sooner in
the 10G case.  But on average, you'll reclaim 500G of space in
about the same number of days.  Or, said differently: in a week you'll
reclaim about the same amount of space in each case.

I need to publish a simulator.
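
In that spirit, a minimal sketch of what such a simulator could look
like (Python; the pool size, 500 MB file size, uniform expiry dates over
180 days and the 60% reclamation threshold are all assumptions for
illustration, not Allen's tool or measured TSM behaviour):

import random

def reclaimed_over_time(volume_gb, total_tb=10, file_gb=0.5, horizon_days=180,
                        threshold=0.60, seed=7):
    """Give every file a random expiry day; each day, reclaim any volume
    whose expired share has reached the threshold.  Returns cumulative GB
    reclaimed per day.  (Live files moved off a reclaimed volume are
    simply dropped from the model.)"""
    rng = random.Random(seed)
    files_per_volume = int(volume_gb / file_gb)
    n_volumes = int(total_tb * 1024 / volume_gb)

    # Per volume: the expiry day of each file it holds.
    volumes = [[rng.uniform(0, horizon_days) for _ in range(files_per_volume)]
               for _ in range(n_volumes)]

    reclaimed = [0.0] * (horizon_days + 1)
    done = [False] * n_volumes
    for day in range(horizon_days + 1):
        for i, expiries in enumerate(volumes):
            if done[i]:
                continue
            expired = sum(1 for d in expiries if d <= day)
            if expired / files_per_volume >= threshold:
                done[i] = True
                reclaimed[day] += expired * file_gb   # net space recovered by reclaiming it
        if day > 0:
            reclaimed[day] += reclaimed[day - 1]      # cumulative total
    return reclaimed

for vol_gb in (100, 10):
    curve = reclaimed_over_time(vol_gb)
    print(f"{vol_gb:>3}G volumes: " +
          ", ".join(f"day {d}: {curve[d] / 1024:.2f} TB" for d in (90, 110, 120, 150)))

With uniform random expiry, the smaller volumes should cross the
threshold a bit sooner (their per-volume percentages are noisier), but
the cumulative curves end up close together, which is the point of the
thought experiment.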


So pick volume sizes that avoid being silly in any direction.

- Allen S. Rout
