ADSM-L

Re: [ADSM-L] Seeking wisdom on dedupe..filepool file size client compression and reclaims

2009-08-30 08:47:37
Subject: Re: [ADSM-L] Seeking wisdom on dedupe..filepool file size client compression and reclaims
From: "John D. Schneider" <john.schneider AT COMPUTERCOACHINGCOMMUNITY DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Sun, 30 Aug 2009 05:46:04 -0700
Greetings,
   The size of the volumes can be very relevant.  Any of us who has
managed a large virtual tape environment has run into this same issue.
If you have 10TB in your pool, it would be a temptation to define 100
volumes of 100GB each.  That might not create any problem if you have a
relatively small number of clients backing up at once and they don't
have enough data to overwhelm your pool, but it is not hard at all to
accidentally design your solution so that within a couple of weeks all
100 of your volumes are marked "full" and you can't back up any new
data, even though in reality each is only 50% live data and none can be
reused until they are reclaimed.  1000 volumes of 10GB each would have
worked better, because in practice the data doesn't expire uniformly
across the volumes; some of those smaller volumes would have been
reclaimed before the others.  And if you have client files that are
larger than your volume size, some of your volumes will have all their
data expire at once and never need to be reclaimed at all.
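To illustrate that last point, here is a rough sketch (not TSM itself; a toy model I am assuming, in which data expires in contiguous client-sized chunks across a 10TB pool, with the volume sizes from the scenario above). It counts volumes that become completely empty, and therefore reusable without any reclamation:

```python
import random

def empty_volumes(vol_gb, pool_gb=10_000, chunk_gb=20, n_chunks=250, seed=0):
    """Expire data in contiguous client-sized chunks and count the
    volumes that end up completely empty (reusable with no reclaim)."""
    rng = random.Random(seed)
    expired = [False] * pool_gb                 # 1GB granularity
    for _ in range(n_chunks):
        start = rng.randrange(pool_gb - chunk_gb)
        for gb in range(start, start + chunk_gb):
            expired[gb] = True
    n_vols = pool_gb // vol_gb
    return sum(
        all(expired[v * vol_gb:(v + 1) * vol_gb])
        for v in range(n_vols)
    )

for vol_gb in (100, 10):
    print(f"{vol_gb}GB volumes: {empty_volumes(vol_gb)} completely empty")
```

Because a 20GB expired chunk always swallows at least one whole 10GB volume but almost never lines up to cover 100 contiguous GB, the small-volume pool frees many volumes outright while the large-volume pool frees almost none.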

I have tried building virtual tape environments using various sizes, and
smaller is better, to a point.   We use 50GB volumes in our environment
because we have 60-70TB virtual tape libraries with 1600 clients.  It
would probably not hurt to use even smaller volumes, but volumes smaller
than 10GB start to hit the point of diminishing returns.

In a virtual tape environment you also have to think about how many
simultaneous tape mounts you have going on; with 100+ simultaneous
tape mounts you can run into problems, and you may have to bump up
LIBSHRTIMEOUT in dsmserv.opt if you are using library sharing.  But
in a FILE-type storage pool you don't have that concern.

Best Regards,
 
 John D. Schneider
 The Computer Coaching Community, LLC
 Office: (314) 635-5424 / Toll Free: (866) 796-9226
 Cell: (314) 750-8721
 
 
  -------- Original Message --------
Subject: Re: [ADSM-L] Seeking wisdom on dedupe..filepool file size
client compression and reclaims
From: "Allen S. Rout" <asr AT UFL DOT EDU>
Date: Sat, August 29, 2009 6:57 pm
To: ADSM-L AT VM.MARIST DOT EDU

>> On Sat, 29 Aug 2009 09:24:11 +0200, Stefan Folkerts <stefan.folkerts AT ITAA DOT NL> said:


> Now I am thinking: dedupe only occurs when you move the data on the
> volumes or reclaim them, but 10G volumes might not get reclaimed for
> a LONG time since they contain so little data; the chance of that
> getting reclaimed, and thus deduplicated, is relatively smaller than
> for a 100G volume.

I think that, to a first approximation, the size of the volume is
irrelevant to the issues you're discussing here.

Do a gedankenexperiment: split 100TB into 100G vols, and separately
into 10G vols. Then randomly expire data from both pools.

What you'll have is a bunch of volumes ranging from (say) 0% to 49%
reclaimable. You will reclaim your _first_ volume a touch sooner in
the 10G case. But on average you'll reclaim 500G of space in about
the same number of days. Said differently: in a week you'll reclaim
about the same amount of space in each case.

I need to publish a simulator.
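In the meantime, here is a minimal sketch of such a simulator (an assumption-laden toy, not TSM: equal-sized volumes, each volume's expired fraction drawn independently and uniformly between 0% and 60%, and a fixed 50% reclamation threshold):

```python
import random

def reclaimable_now(vol_gb, pool_gb=100_000, threshold=0.50, seed=7):
    """Draw a random expired fraction for each volume and total the
    space that reclamation could recover at the given threshold."""
    rng = random.Random(seed)
    n_vols = pool_gb // vol_gb
    gb = 0
    for _ in range(n_vols):
        expired = rng.uniform(0.0, 0.6)   # 0% to 60% of this volume expired
        if expired >= threshold:
            gb += vol_gb
    return n_vols, gb

for vol_gb in (100, 10):
    n, gb = reclaimable_now(vol_gb)
    print(f"{n:5d} x {vol_gb:3d}GB volumes -> ~{gb/1000:.1f}TB reclaimable")
```

Per-volume variance is larger with 100G volumes, so the first reclaimable volume shows up sooner in the 10G case, but the totals converge to about the same number, which is the point above.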


So pick volume sizes that avoid being silly in any direction.

- Allen S. Rout