ADSM-L

Re: [ADSM-L] How to Incorporate a CDL into TSM environment?

2007-06-08 02:33:52
From: Curtis Preston <cpreston AT GLASSHOUSE DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 8 Jun 2007 02:32:48 -0400
John,

After all the talks I've given promoting the use of VTLs with TSM and
other products, it's good to finally hear from someone who has been able
to actually DO it in multiple environments.  I concur with almost all of
your comments.  I do have questions about some of them, and then I have
some comments about the overall VTL industry.  It sounds like you've got
much more real-world experience with the TSM/VTL combination than I
have, so please take my comments as curiosity and/or request for
confirmation, not confrontation.

The first question that I have is what size environments have you been
able to implement these recommendations for?  Many of them strike me as
perfect for small-to-medium shops, but not easy to implement in large
shops.  (Our customers are some of the largest TSM shops in the world.)

>The second is a Redbook from IBM about their VTL solution.  There is a
>chapter in it specific to how a VTL can help in a TSM environment.

You do realize that both products are Falconstor underneath, right?  So
at this point, the only material difference between the two is the
hardware that IBM/EMC puts around it.  Sun uses it as well.

>1) Design the solution and size the CDL so that most or all Primary
>storage pools can fit on the CDL. 

I couldn't agree more.  The challenge that I've found is that most of our
large customers have been unable to justify the cost of VTLs that are
the same size as their tape libraries.  The advent of de-dupe is
changing all that, as a 200 TB tape library can be replaced by 10 TB of
disk.
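
As a rough sketch of that arithmetic (the function name is mine, and the
20:1 ratio is simply the figure implied by the 200 TB / 10 TB example
above, not something any product guarantees):

```python
def vtl_disk_needed_tb(tape_capacity_tb, dedupe_ratio):
    """Disk (in TB) needed to hold tape_capacity_tb of backups at dedupe_ratio:1."""
    if dedupe_ratio <= 0:
        raise ValueError("dedupe_ratio must be positive")
    return tape_capacity_tb / dedupe_ratio


# The example from the text: a 200 TB library at an assumed 20:1 ratio.
print(vtl_disk_needed_tb(200, 20))  # 10.0
```

Plug in your own measured ratio; as discussed next, it may be well below
20:1 for some workloads.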

Let me speak to this for a second.  De-dupe ratios definitely are an
area where your mileage will vary.  Shops doing TSM progressive
incrementals of filesystems will not get the same level of de-dupe as
shops that do frequent full backups, as full backups are where a lot of
the duplicated data comes from.  However, duplicated data also comes
from the same files
being placed in multiple places (emails, filesystems, multiple users
using the same doc but putting it in multiple places, etc.).  It also
comes from repeated incrementals of the same file that changes just a
little bit each day, such as a spreadsheet that someone updates every
day.  So TSM environments will still see plenty of de-dupe on their
filesystem backups, just not 20:1.  They will also see the same de-dupe
ratios as everybody else when they back up Oracle, DB2, Exchange, SQL
Server, etc., as TSM does the typical full/incremental backups there.
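
To illustrate how a mixed environment nets out, here is a small sketch
that blends per-workload ratios into one overall ratio.  The 5:1 figure
for filesystem data and the 100 TB sizes are placeholders I made up for
the example; only "less than 20:1 for filesystems" comes from the point
above.

```python
def blended_dedupe_ratio(workloads):
    """workloads: list of (logical_tb, dedupe_ratio) pairs.

    Returns total logical data divided by total data actually stored.
    """
    logical = sum(tb for tb, _ in workloads)
    stored = sum(tb / ratio for tb, ratio in workloads)
    return logical / stored


# Hypothetical mix: 100 TB of filesystem backups at an assumed 5:1,
# plus 100 TB of database backups at an assumed 20:1.
mix = [(100, 5), (100, 20)]
print(round(blended_dedupe_ratio(mix), 1))  # 8.0
```

Note that the blended ratio is pulled toward the lower number, because
the poorly de-duped data dominates what is actually stored.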

The bummer thing about de-dupe is that it's not available in most of the
major OEM VTLs.  I believe Sun is selling Falconstor's de-dupe, and HDS
is definitely selling Diligent's Protectier.  IBM hasn't let me know
what they're doing yet, and EMC is still saying they're going to write
their own.  HP's VTL (provided by SEPATON) doesn't yet offer their
de-dupe feature.  NetApp's VTL doesn't yet offer de-dupe.

That means that those same large shops that I'm saying should use
de-dupe won't use it because it's not available from their OEM.  (As I
said, you're fine if you use HDS or Sun, but not if you want to buy it
from EMC, HP, or IBM -- yet.)  Bummer.

I'm pretty bullish on de-dupe and I think it's ready for prime time as
long as everything is also on tape.  (Your copies you're creating for
offsite DR will do.)  It solves a lot of problems.  It reduces
acquisition cost (by a factor as much as 20:1) and reduces power and
cooling cost by the same factor.  And as long as everything is also on
tape for DR, you've got a risk mitigation copy in case you picked the
wrong de-dupe product and it completely goes toes-up on you.

So, I'd say that your idea is totally implementable in large shops if
they use de-dupe.  Otherwise, we're talking way too much disk when you
consider that most people have 10 GB on tape for every 1 GB they have on
disk.

>Direct the client backups directly to the virtual tapes, instead of
>going to disk storage pool.  

Again, I think this will work fine in many environments.  Our large TSM
customers have hundreds (or thousands) of clients backing up
simultaneously to their disk pools.  You can't define 500 or 1000
virtual tape drives, and you wouldn't want to if you could.  So these
customers would have an issue implementing your suggestion.

>This will save you hours of time in
>the schedule not having to migrate from disk to tape.  

Again, if you can do it, I agree.  Many people can't, unfortunately.

>There is no particular advantage to collocating storage pools in a
>virtual tape environment.  

I'm not sure I'm sold on this.  I'm not saying I disagree; I'm just not
sold.  My fellow consultants and I have discussed this ad nauseam.  My
experience is that mounting a virtual tape still takes a finite amount
of time, and when you multiply that finite amount by the number of
tape mounts you may experience in a completely uncollocated world, it
may add up to a significant amount of time.  (Again, this may be a much
bigger deal in larger environments.)
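
A back-of-the-envelope sketch of that multiplication follows.  The
per-mount times are placeholders, not measurements of any VTL, and the
sketch assumes the mounts happen one after another rather than in
parallel.

```python
def total_mount_overhead_minutes(num_mounts, seconds_per_mount):
    """Total time spent just mounting tapes, assuming serial mounts."""
    return num_mounts * seconds_per_mount / 60


# 200 mounts at a few assumed per-mount times (seconds).
for secs in (10, 30, 60):
    minutes = total_mount_overhead_minutes(200, secs)
    print(f"{secs:>3} s/mount -> {minutes:.0f} minutes of mounting")
```

Substitute a mount time measured on your own VTL before drawing any
conclusion from it.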

I would have to do a test on a couple of hundred tape mounts in TSM to
see how much time that would really take before I can come to a decision
here.  Have you done that?  I can tell you I did it in NetBackup and
NetWorker, and I was amazed at how long it took to mount a virtual tape.

>By turning off collocation, you can get better overall utilization 
>of the disk space in the CDL.

I'm really curious on this one.  It's not for the same reason as tape,
right? ...where you waste 380 GB of a 400 GB tape if you've got a 20 GB
client?  Most VTLs that I'm aware of only use up as much space as you
use.  IOW, if you have a 400 GB virtual tape but only send 20 GB to it,
you only use 20 GB of the VTL.
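
The difference can be sketched like this, assuming a VTL that only
allocates disk as data is written (true of the VTLs I'm aware of, but
check yours).  The function names are mine.

```python
def physical_tape_waste_gb(tape_capacity_gb, data_gb):
    """Capacity stranded on a physical tape dedicated to a single client."""
    return max(tape_capacity_gb - data_gb, 0)


def vtl_space_used_gb(tape_capacity_gb, data_gb):
    """Disk consumed by a virtual tape on a VTL that allocates on write."""
    return min(data_gb, tape_capacity_gb)


# The example from the text: a 20 GB client on a 400 GB tape.
print(physical_tape_waste_gb(400, 20))  # 380
print(vtl_space_used_gb(400, 20))       # 20
```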

Now the area where I can potentially see savings is that you don't
typically end up doing much reclamation on a collocated tape.  You don't
reclaim it because if it's not full, there's no point.  But if you
combine my first thought on this with this thought, you would end up
using more disk with collocated, non-reclaimed tapes.  Perhaps, then, you
should consider doing reclamation against those collocated tapes.

If I'm right and hundreds of tape mounts take hundreds of minutes, then
maybe restore performance should take precedence over tape utilization
-- just like in the real tape world.

>If your primary storage pool is on a CDL, set the
>reclamation threshold at 50% (or whatever you prefer) and leave it
>there.

I see two potential areas for concern in large environments.  The first
is that if you allow reclamation to run while backups are going on,
reclamation is reading your VTL disks while backups are writing to
them.  That creates disk contention that may hurt the performance of
your backups.

The second potential area for concern is contention for the TSM
database.  Backups create quite a bit of activity in the database, and
adding reclamation activity while backups are running will create
additional updates and queries, possibly causing you to hit a point
where the database can't keep up with all that activity, causing your
backups to slow down.

These two reasons are why we generally advise TSM environments to
disable expiration and reclamation while backups are going on.  We
advise them to follow the typical serial TSM schedule of backup to disk
pool, create DR copy of backup by copying disk pool to tape, migrate
disk pool to tape, TSM database backup, expiration, and reclamation.  (I
like to back up the TSM database before and after expiration/reclamation
if I can get away with it.)
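
For what it's worth, here is a small sketch of that serial ordering laid
out on a clock.  The order comes from the paragraph above; the
durations and the start time are invented placeholders, so treat the
output as an illustration of "nothing overlaps", not as a recommended
timetable.

```python
from datetime import datetime, timedelta

# (step name, assumed duration in hours): durations are hypothetical.
SERIAL_STEPS = [
    ("backup to disk pool", 8.0),
    ("copy disk pool to tape (DR copy)", 3.0),
    ("migrate disk pool to tape", 3.0),
    ("TSM database backup", 1.0),
    ("expiration", 2.0),
    ("reclamation", 4.0),
]


def build_schedule(start, steps):
    """Return (name, begin, end) tuples; each step begins when the previous ends."""
    schedule = []
    current = start
    for name, hours in steps:
        end = current + timedelta(hours=hours)
        schedule.append((name, current, end))
        current = end
    return schedule


for name, begin, end in build_schedule(datetime(2007, 6, 8, 20, 0), SERIAL_STEPS):
    print(f"{begin:%a %H:%M} - {end:%a %H:%M}  {name}")
```

If the steps don't fit in 24 hours with your real durations, that is
usually the first sign that something has to run concurrently, and then
the contention concerns above apply.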