ADSM-L

Re: [ADSM-L] Seeking wisdom on dedupe..filepool file size client compression and reclaims

2009-08-30 12:39:40
From: "John D. Schneider" <john.schneider AT COMPUTERCOACHINGCOMMUNITY DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Sun, 30 Aug 2009 09:38:08 -0700
Greetings,
     I already responded to a similar post today that was also about
client compression, and I don't want to waste listserv bandwidth beating
the same drum, but... I am going to.  If you are sick of this subject,
don't read further.
     What we mean by "best" is highly subjective.  Hardware compression
is "better" than software compression if by "better" you mean whichever
takes up the least space on tape.  But which one has the most impact on
the overall performance of the environment?  In TSM, data doesn't just
go to tape and stay there forever.  It gets moved around over and over.
     Take the example of a TSM server that absorbs 4TB of new client
data per day.  No client compression.  So in each 24 hour period, the
TSM server must:

1) Absorb and process 4TB into memory buffers from its network
interface, and write them to 4TB of disk storage pool.
2) Backup stgpool 4TB of disk storage pool to copy storage pool tape,
storing it in memory buffers along the way.  When it is stored, it only
takes up 2TB (2:1 compression typical).
3) Migrate 4TB of disk storage pool to tape, storing it in memory
buffers along the way.  When it is stored, it only takes up 2TB of tape
(2:1 compression typical).
4) Reclaim storage pool tapes.  Read some number of TB of compressed
tape, which the tape drives uncompress to twice as many TB before
returning it to the TSM server.  The server has to handle that many TB
worth of memory buffers, then write them back out to tape, where the
tape drive compresses the data back down to half as much tape.

All along the way the TSM server had to handle 4TB of data in network
buffers and memory buffers, and pass it through FC and network adapters
over and over again.  

But what about that same situation with client compression?

1) Absorb and process 2TB of client data into memory buffers, because
the 4TB of client data was compressed down to 2TB before it got to the
TSM server.
2) Only 2TB of memory and I/O is required to write it to disk storage
pool, and only 2TB of disk storage pool is required.
3) For the rest of that data's life expectancy, no matter how many times
it is migrated or reclaimed, the compressed form is used, so there is
half the overhead involved in processing it.
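
To put rough numbers on the two scenarios above, here is a minimal
Python sketch.  The names (DailyLoad, daily_load) are made up for
illustration, the 2:1 ratio is the "typical" figure used above, and the
sketch assumes that data already compressed on the client gains nothing
more from tape drive compression.  Reclamation is left out because its
volume depends on how many tapes are reclaimed.

```python
from dataclasses import dataclass


@dataclass
class DailyLoad:
    ingest_tb: float          # received from clients over the network
    backup_stgpool_tb: float  # copied from disk pool to copy pool tape
    migration_tb: float       # migrated from disk pool to primary tape
    tape_per_copy_tb: float   # tape consumed by each tape copy

    @property
    def through_server_tb(self) -> float:
        """Total TB passing through server buffers, excluding reclamation."""
        return self.ingest_tb + self.backup_stgpool_tb + self.migration_tb


def daily_load(client_data_tb: float, ratio: float = 2.0,
               client_compression: bool = False) -> DailyLoad:
    if client_compression:
        handled = client_data_tb / ratio
        on_tape = handled  # assumption: no further gain from drive compression
    else:
        handled = client_data_tb
        on_tape = client_data_tb / ratio
    return DailyLoad(handled, handled, handled, on_tape)


without = daily_load(4.0)
with_cc = daily_load(4.0, client_compression=True)
print(without.through_server_tb, with_cc.through_server_tb)  # 12.0 6.0
print(without.tape_per_copy_tb, with_cc.tape_per_copy_tb)    # 2.0 2.0
```

Under these assumptions the tape consumed is the same either way, but
the amount of data the server has to push through its buffers each day
is cut in half.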

*** RANT MODE ON ***  :-)
Those people who say client compression is only useful for slow networks
have never tried it both ways in a large TSM environment.  Two years ago
we had a 12 hour backup window, and backup stgpool and migration were
taking 10 hours, leaving only about 2 hours for reclamation.  We were
sucking through tapes like you wouldn't believe, because there was no
time to reclaim them.  

We switched to client compression, which took us a few weeks to push
everywhere, but once it was on all the clients, our backup
stgpool/migration cycle dropped from 10 hours to 5-6 hours, giving us 6
hours for reclamation.  In no time we were keeping up.  And our 12 hour
backup window was no problem either.  Some clients that backed up in 1
hour went to 2 hours for example, but what difference did that make?  We
moved some clients to a different schedule so they would start earlier
in the evening, to make sure they were done by 6am, but that was easy to
do.  There were a few clients that were too slow with compression turned
on, but that was maybe 6 out of 900 clients?  Something like that.  
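
As a back-of-the-envelope check on those hours, the sketch below
assumes (and it is only an assumption) that the backup stgpool and
migration cycle scales linearly with the amount of data moved.  The
function names are hypothetical; the 12, 10, and 2:1 figures are the
ones quoted in this message.

```python
def scaled_cycle_hours(cycle_hours, compression_ratio):
    """Cycle time if it shrinks in proportion to the data volume."""
    return cycle_hours / compression_ratio


def reclamation_hours(window_hours, cycle_hours):
    """Hours left in the window once backup stgpool/migration finish."""
    return max(window_hours - cycle_hours, 0)


before = reclamation_hours(12, 10)              # 2 hours left
predicted_cycle = scaled_cycle_hours(10, 2.0)   # 5.0 hours
after = reclamation_hours(12, predicted_cycle)  # 7.0 hours left
print(before, predicted_cycle, after)
```

The linear model predicts a 5 hour cycle, which is in the same range as
the 5-6 hours described above.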

I think client compression was a very good move for us.  We have been
growing at the rate of 60% per year for the last three years.  We
started out at 960 clients, about 4.5TB/day.  Today we back up ~1700
clients, and about 15TB per day.  If we wanted to turn client
compression off at this stage of the game, we would really have to beef
up the memory and I/O performance of our TSM servers, or they would be
totally buried.

*** RANT MODE OFF ***

 Best Regards,
 
 John D. Schneider
 The Computer Coaching Community, LLC
 Office: (314) 635-5424 / Toll Free: (866) 796-9226
 Cell: (314) 750-8721
 
 
  -------- Original Message --------
Subject: Re: [ADSM-L] Seeking wisdom on dedupe..filepool file size
client compression and reclaims
From: Roger Deschner <rogerd AT UIC DOT EDU>
Date: Sun, August 30, 2009 9:34 am
To: ADSM-L AT VM.MARIST DOT EDU

On Sat, 29 Aug 2009, Stefan Folkerts wrote:
>TSM gurus of the world,
>
>Also client compression, does anybody have any figures on how this
>affects the effectiveness of deduplication?
>Because these are both of interest in a filepool, if deduplication works
>just as well in combination with compression that would be great.

Client compression should greatly reduce or eliminate the possibility of
deduplication, whether by TSM or by hardware device such as Data Domain.
(BTW Client encryption effectively prevents deduplication.)
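
A toy Python illustration of why compression gets in the way of
deduplication.  It uses fixed-size 4096-byte chunks and SHA-256 hashes,
which is an assumption made for simplicity and not a description of how
TSM or Data Domain actually chunk data; the helper names and the sample
data are invented for the example.  Two versions of a "file" differ
only in their first 64 bytes, and the sketch counts how many chunks of
the second version already exist in the first, before and after zlib
compression.

```python
import hashlib
import random
import zlib

CHUNK = 4096  # hypothetical fixed dedupe chunk size


def chunk_hashes(data, size=CHUNK):
    """Hash every fixed-size chunk of data."""
    return [hashlib.sha256(data[i:i + size]).digest()
            for i in range(0, len(data), size)]


def shared_chunks(a, b):
    """Return (chunks of b already present in a, total chunks of b)."""
    seen = set(chunk_hashes(a))
    hashes_b = chunk_hashes(b)
    return sum(h in seen for h in hashes_b), len(hashes_b)


def sample_file(seed=0, words=50_000):
    """Compressible but non-repeating sample data (made up)."""
    rng = random.Random(seed)
    vocab = [b"backup", b"stgpool", b"tape", b"client",
             b"server", b"reclaim", b"migrate", b"volume"]
    return b" ".join(rng.choice(vocab) for _ in range(words))


monday = sample_file()
tuesday = b"\x00" * 64 + monday[64:]  # same file, first 64 bytes changed

raw = shared_chunks(monday, tuesday)
packed = shared_chunks(zlib.compress(monday), zlib.compress(tuesday))
print("uncompressed: %d of %d chunks already stored" % raw)
print("compressed:   %d of %d chunks already stored" % packed)
```

Uncompressed, every chunk except the first is a duplicate.  After
compression, a small change near the start of the file usually alters
the rest of the compressed stream, so few or no chunks line up.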

So you need to decide which strategy will save the most space - client
compression _or_ deduplication. There are tradeoffs here. Previous
studies in this area have compared client compression versus tape drive
hardware compression, and in those studies tape drive compression was
always the winner. But this is a new world, that I'm about to join too.
I'm anxious to see a comparison. Client compression has some severe
downsides, such as a noticeable lengthening of client backup times, and a
greater performance hit on client systems during backup. IBM only
recommends it in cases of limited network bandwidth.

The one option here that hasn't been explored is disk compression on the
TSM server in the file storagepool, as provided by the AIX OS for
instance. This would work with deduplication. If all it takes is a few
more CPU cycles in the TSM server, it might be worthwhile. Has anyone
studied this?

Roger Deschner University of Illinois at Chicago rogerd AT uic DOT edu
Academic Computing & Communications Center
==== "Research is what I'm doing when I don't know what I'm doing." ====
========================= -- Wernher von Braun =========================