ADSM-L

Re: [ADSM-L] Good or bad idea?...offsite disk.

2007-04-02 14:35:23
Subject: Re: [ADSM-L] Good or bad idea?...offsite disk.
From: Kelly Lipp <lipp AT STORSERVER DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Mon, 2 Apr 2007 12:34:55 -0600
OK, I did volunteer.  I'll take a crack at these issues... 


1) reusedelay.  In circumstances where I'm thinking of these FILE vols
   as replacing DISK vols, I don't want to have to hold on to that
   space for a few days before it becomes available again.  Just
   reusedelay=0, and view the data as ephemeral?

As with the other sequential device classes, one must be concerned
about when a volume is reclaimed and then reused relative to the
database backups: if a volume is reclaimed and rewritten after the
database backup you later restore, the restored database will still
expect the old data to be on it.  I believe that making sure all of
your daily processing completes, especially the backup stgpool
operations and the db backup, is the key to success.  When we have had
problems using large pools of files, it has been when we've needed to
restore the database and audit those volumes.  Careful planning and a
consistent process are the key.  If you do restore a database, make
sure that you audit all of the volumes with fix=yes.  This can be time
consuming...
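To make the reusedelay trade-off concrete, here is a toy Python model. All names are hypothetical and nothing here is a TSM interface. It assumes REUSEDELAY means a volume emptied by reclamation cannot be rewritten until that many days have passed, and it flags the volumes whose contents may no longer match a database restored from a given backup. It only illustrates the risk; it does not replace auditing all of the volumes after a restore.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Volume:
    # Hypothetical record of one FILE volume's reclamation history.
    name: str
    reclaimed_at: Optional[datetime]  # None = never emptied by reclamation


def possibly_overwritten(vols: List[Volume], db_backup_at: datetime,
                         now: datetime, reuse_delay_days: int) -> List[str]:
    """Return names of volumes that may no longer hold the data a database
    restored from the backup taken at db_backup_at expects to find.

    At risk: reclaimed after that backup (the restored database still points
    at the old data) AND past the reuse delay (TSM may have rewritten it).
    """
    delay = timedelta(days=reuse_delay_days)
    at_risk = []
    for v in vols:
        if v.reclaimed_at is None or v.reclaimed_at <= db_backup_at:
            continue  # the backed-up database already knows this volume's state
        if v.reclaimed_at + delay <= now:
            at_risk.append(v.name)  # eligible for reuse, so possibly overwritten
    return at_risk


backup = datetime(2007, 4, 1, 6, 0)
now = datetime(2007, 4, 2, 12, 0)
vols = [
    Volume("FILE0001", datetime(2007, 3, 30)),        # reclaimed before backup
    Volume("FILE0002", datetime(2007, 4, 1, 20, 0)),  # reclaimed after backup
    Volume("FILE0003", None),                         # never reclaimed
]
print(possibly_overwritten(vols, backup, now, reuse_delay_days=0))  # ['FILE0002']
print(possibly_overwritten(vols, backup, now, reuse_delay_days=2))  # []
```

With reusedelay=0, FILE0002 (reclaimed after the 06:00 backup) is already eligible for reuse; with a two-day delay, nothing reclaimed since that backup could have been overwritten yet.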

2) Performance management.  Currently I'm on Big ol' raids of 36G SSA,
   which can accept 30MB/s per raid.  I allocate a stripe across all
   my RAIDs to each DISK stgpool, which means I've got e.g. 9 or 11
   different performance contention domains.  This works really well:
   if all my servers are booming, I still have really nicely spread
   IO.

   If I go against FILE devclasses, especially pre-allocated, it seems
   like I lose that, and I could easily have all my servers trying to
   write to the same one or two LUNs.  How would you handle that?
   Just throw it all at the disk devices' cache and hope it's enough?

Clearly this is of great concern.  However, I believe that over time
this problem remedies itself.  Since sequential access volumes can be
reclaimed, one will find that when some process or client needs a
volume, it will get one more or less at random from across all the
storage, thus spreading the load.  You are correct, though, that at
first TSM will allocate volumes in numerical order, keeping the first
disk hot, then the second, and so on, until the volumes start to be
reclaimed.  Then the load will normalize across the drives.  In the long
run I don't think this is an issue: volumes will be available on all
drives and will be used by TSM more or less at random.
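The allocation argument can be illustrated with a toy Python model. The layout (a fixed number of pre-defined volumes per disk, numbered disk by disk) and the random pick standing in for post-reclamation allocation are assumptions for illustration, not a description of TSM internals; the function names are hypothetical.

```python
import random
from collections import Counter
from typing import List


def disk_of(vol: int, vols_per_disk: int) -> int:
    # Assumed layout: volumes 0..V-1 on disk 0, V..2V-1 on disk 1, and so on.
    return vol // vols_per_disk


def numerical_order(empty: List[int], writers: int) -> List[int]:
    # Young pool: the lowest-numbered empty volumes are handed out first.
    return sorted(empty)[:writers]


def scattered(empty: List[int], writers: int, rng: random.Random) -> List[int]:
    # Pool churned by reclamation: free volumes are modeled as being picked
    # more or less at random (a stand-in for the behavior described above).
    return rng.sample(empty, writers)


def load_per_disk(picked: List[int], vols_per_disk: int, disks: int) -> List[int]:
    # How many of the concurrent writers land on each disk.
    counts = Counter(disk_of(v, vols_per_disk) for v in picked)
    return [counts[d] for d in range(disks)]


disks, per_disk, writers = 10, 50, 20
empty = list(range(disks * per_disk))
rng = random.Random(2007)
print(load_per_disk(numerical_order(empty, writers), per_disk, disks))
# [20, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(load_per_disk(scattered(empty, writers, rng), per_disk, disks))
```

With 10 disks of 50 volumes each and 20 concurrent writers, numerical order puts every writer on disk 0, while the scattered pick spreads them over many disks.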

I think the key is to use pre-defined volumes rather than scratch
volumes.  This keeps each of these volumes contiguous on the disk and
increases overall performance.

Is there an optimal configuration for your storage for this use?  I'm
sure there is.  Is it the one you have now?  Maybe not, but it is
probably worth a try.  You can more or less easily switch it around
later if you are not convinced.  Typically RAID5 is death to smoochy in
TSM because of its write penalty.  But with high-end controllers with
lots of write cache, this is mitigated.  One could also front the file
device class pool with a disk device class pool; we've done that in
some cases.  Then use migration to limit the number of streams writing
into the file pool.  This takes the problem of many writers (like 100
simultaneous client backups) down to a couple.  Your SATA drives will
love you!
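A back-of-envelope Python sketch of that stream reduction, assuming writers are spread as evenly as possible across LUNs. The function names are hypothetical; migration_processes stands for however many migration processes the front DISK pool is allowed to run (in TSM, the pool's MIGPROCESS setting).

```python
import math


def streams_per_lun(writers: int, luns: int) -> int:
    # Busiest LUN's share of concurrent write streams, assuming the writers
    # are spread as evenly as possible across the LUNs.
    return math.ceil(writers / luns)


def file_pool_writers(client_sessions: int, front_disk_pool: bool,
                      migration_processes: int) -> int:
    # Without a DISK pool in front, every client session writes straight
    # into the FILE pool; with one, only the migration processes do.
    return migration_processes if front_disk_pool else client_sessions


direct = streams_per_lun(file_pool_writers(100, False, 2), luns=2)
fronted = streams_per_lun(file_pool_writers(100, True, 2), luns=2)
print(direct, fronted)  # 50 1
```

For 100 client sessions landing on the "one or two LUNs" from the question, each LUN interleaves 50 streams when clients write directly, versus a single sequential stream behind a DISK pool migrating with two processes.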

3) Free space consolidation.  The biggest reason I want to use FILEs
   is that I'd like to be able to deploy my 3TB of landing pad in such
   a way that any server, any FILE stgpool, can temporarily overflow
   without any OS-level reallocation nonsense; just changing
   maxscratch or some such.  I thought I might be able to do this with
   a library handing out the FILE vols; am I insane?

Back to my earlier point: I don't like scratch volumes, but one could
try it.  Then again, why have more than one pool at all?  If you do,
the scratch and space trigger mechanisms available for the file device
class might work well, but I have not experimented with them much.
When I did play with them, I struggled to understand the principles
involved and opted to pre-define.

A couple of other points about the file device class.  Caching isn't
supported.  Because it is sequential, TSM does not need to allocate
space the way it does in a disk device class pool, so it is faster.
It is also strategic to TSM's future: new features will be put there
first, and the disk device class might be dead somewhere along the
line.  Best feature: these volumes can be reclaimed, whereas disk
device class volumes cannot.

The beauty of the software we are using is that we can change this stuff
around fairly easily to see what we get.  If we don't like it, we can
change it back.  I'll be very curious to hear how your experimentation
goes.  We've had good luck with very large file device pools and have
built a number of disk-only STORServers (DR is done to tape, but all of
the online pools are disk).  Do let me know!

Thanks,



Kelly J. Lipp
VP Manufacturing & CTO
STORServer, Inc.
485-B Elkton Drive
Colorado Springs, CO 80907
719-266-8777
lipp AT storserver DOT com