Subject: Re: [ADSM-L] Seeking wisdom on dedupe..filepool file size client compression and reclaims
From: Bill Boyer <bjdboyer AT VERIZON DOT NET>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Sun, 30 Aug 2009 14:17:10 -0400
I do know from the beta program that two of the rules for files being skipped
during the de-dup (identify) process are:

1. Encrypted files are skipped.
2. Small files. I think < 4k was the rule.

There are some STE webcast events on De-dup that might list all the reasons.

And then there's the DSMSERV.OPT option DEDUPREQUIRESBACKUP YES|NO. The default 
is yes. So until you run a BA STGPOOL command against the data, the identify 
process won't de-dup the files/data.
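
For reference, that's just the server option plus an ordinary copy-pool backup;
the pool names below are only placeholders:

   In dsmserv.opt (YES is the default):
      DEDUPREQUIRESBACKUP YES

   Then back the primary pool up to a copy pool before expecting any space back:
      BACKUP STGPOOL filepool copypool MAXPROCESS=2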

If you're backing up directly to the de-dup stgpool, you might be better off 
running IDENTIFY DUPLICATES with NUMPR=0, letting the backups complete, then 
letting the BA STGPOOL complete, and only then setting NUMPR= to start the 
process. Until the data has been copypool'd you're just spinning your wheels. 
I don't know if I would trust the de-dup and re-dup processes completely 
enough to specify DEDUPREQUIRESBACKUP NO.
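
Roughly this sequence, in other words (pool name and process count are just
examples from memory, not gospel):

   IDENTIFY DUPLICATES filepool NUMPROCESS=0   (parked while backups run)
   BACKUP STGPOOL filepool copypool
   IDENTIFY DUPLICATES filepool NUMPROCESS=3   (now let identify chew on it)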

Bill Boyer
"Hang in there, retirement is only thirty years away!" - ??



-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of 
John D. Schneider
Sent: Sunday, August 30, 2009 9:13 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: Seeking wisdom on dedupe..filepool file size client compression 
and reclaims

I don't remember seeing this in writing anywhere.  Is this a fact, or a
supposition?  Does deduplication in TSM skip all data that has been
client compressed?  I could see how that might affect the deduplication
algorithm, but if that is true, the customer is left with an unfortunate
deal with the Devil, so to speak.

When we switched to client compression a couple years ago, it cut in
half the amount of data that our TSM servers had to absorb each night. 
That reduced memory and CPU consumption.  It reduced the amount of FC I/O
to the disk storage pools for backups and migration, and the amount of data
TSM had to process during
reclamation.  In short, by compressing at the client, we made the whole
solution more efficient, and enabled us to grow significantly (in terms
of nightly backup data) without having to add CPU, memory, network, or
FC adapters to our TSM servers.  The tradeoff, of course, is that each
individual client runs its backup more slowly because it has to compress its
data as it is backing up, but there are very few clients where that has
posed a problem.  Even our large (1-2TB) Oracle databases have the
performance to use client compression in the middle of the night without
wrecking the app.  That might not be true in all environments, of course, so
your mileage may vary.
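
(For anyone who wants to try it, it's just the client options in dsm.opt/dsm.sys,
going from memory here:

   COMPRESSION    YES
   COMPRESSALWAYS YES

COMPRESSALWAYS just avoids a resend if a file happens to grow when compressed.)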

But the real issue to me is, would giving up this highly efficient
configuration be worth it to get deduplication at the back end?  I guess
each customer will have to decide.  If you are trying to build a
disk-only or disk-mostly backup environment, and have plenty of CPU and
memory, and consider disk space real-estate your big bottleneck, it may
well be worth the tradeoff.

Best Regards,
 
 John D. Schneider
 The Computer Coaching Community, LLC
 Office: (314) 635-5424 / Toll Free: (866) 796-9226
 Cell: (314) 750-8721
 
 
  -------- Original Message --------
Subject: Re: [ADSM-L] Seeking wisdom on dedupe..filepool file size
client compression and reclaims
From: Grigori Solonovitch <G.Solonovitch AT BKME DOT COM>
Date: Sun, August 30, 2009 2:31 am
To: ADSM-L AT VM.MARIST DOT EDU

Hello Stefan,
I do not think you can use client compression and de-duplication
together.
De-duplication skips all data that was compressed by the TSM client or the
TSM server.
By the way, the de-duplication process does try to reduce files compressed
by the operating system (for example, *.Z, *.zip, AIX system images, etc.).
Regards,

Grigori G. Solonovitch

Senior Technical Architect

Information Technology
Bank of Kuwait and Middle East
http://www.bkme.com

Phone: (+965) 2231-2274 Mobile: (+965) 99798073 E-Mail:
G.Solonovitch AT bkme DOT com


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Stefan Folkerts
Sent: Saturday, August 29, 2009 10:24 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] Seeking wisdom on dedupe..filepool file size client
compression and reclaims

TSM gurus of the world,

I am toying around with a new TSM server we have and I am pondering some
options and would like your thoughts about them.
I am setting up a 6.1.2.0 TSM server with a filepool only, planning on
using deduplication.

When I set up a filepool I usually use a fairly small volume size, 10G or
maybe ~20G depending on the expected size of the TSM environment.
I do this because if a 100G volume is full and starts expiring, reclaim
won't occur for a while, and that leaves up to 49% (49GB) of the volume
space useless and wasted.
So I set up 10G volumes in our shop (very small server) and just accept
the fact that I have a lot of volumes; no problem, TSM can handle a lot
of volumes.
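
For what it's worth, the pool is defined more or less like this (device class,
pool name, path and limits are just examples, double-check the parameters
against the 6.1 docs):

   DEFINE DEVCLASS filedev DEVTYPE=FILE MAXCAPACITY=10G DIRECTORY=/tsmdata/filepool
   DEFINE STGPOOL filepool filedev MAXSCRATCH=500 DEDUPLICATE=YES RECLAIM=60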

Now I am thinking: dedupe only takes effect when you move data off the
volumes or reclaim them, but 10G volumes might not get reclaimed for a LONG
time since they contain so little data; the chance of one getting reclaimed
(and thus deduplicated) is relatively smaller than for a 100G volume.

As an example, I migrated all the data from our old 5.5 TSM server to
the new one using an export node command; once it was done I scripted a
move data for all the volumes and went from 0% to 20% dedupe savings in 8
hours.
If I had let TSM handle this it would have taken a LONG time to get
there.
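
The script was nothing fancy, roughly something like this (admin ID, password
and pool name are placeholders):

   dsmadmc -id=admin -password=secret -dataonly=yes \
     "select volume_name from volumes where stgpool_name='FILEPOOL'" | \
   while read vol; do
     # skip the blank lines dsmadmc sometimes prints
     [ -n "$vol" ] && dsmadmc -id=admin -password=secret "move data $vol wait=yes"
   done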

If I do a full Exchange backup I fill 10 volumes with data; identify
will mark data on them for deduplication, but it won't have any effect at
all since the data will expire before the volumes are reclaimed.
This full Exchange backup happens every week and is held for 1
month, which means the bulk of my data gets no benefit from deduplication
with this setup, or am I missing something here? :)

So I am thinking: with a 10G volume being filled above the reclaim
threshold so easily and therefore missing the dedupe action, what should one
do?
I would almost consider a query nodedata script that identifies the
Exchange node data and moves it around for some dedupe action.
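
Something along these lines, I mean (node and pool names are placeholders,
and the syntax is from memory, so check the command reference):

   QUERY NODEDATA exch01 STGPOOL=filepool
   MOVE NODEDATA exch01 FROMSTGPOOL=filepool TOSTGPOOL=filepool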

Also, on client compression: does anybody have any figures on how it affects
the effectiveness of deduplication?
Both are of interest in a filepool, so if deduplication works just as well
in combination with compression that would be great.

Regards,

Stefan
