ADSM-L

Fulls vs. incrementals for TSM DB

2006-02-16 11:57:15
Subject: Fulls vs. incrementals for TSM DB
From: "Allen S. Rout" <asr AT UFL DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 16 Feb 2006 11:56:49 -0500
I'm soliciting comments about my thinking re: possibly doing
incrementals of the TSM DB instead of some of the daily fulls I'm
doing now.  I'm braced for tomatoes, so fire away. :)


My big DB2 backup customers are going through the exercise of
measuring "How frequently do we really need to do fulls?".  I'm hoping
to convince them that daily fulls _plus_ some, in addition to
retaining all the logs, can be scaled back a bit.

But this made me think.  I've never done that same exercise with my
own DB backups: the TSM DB.

My infrastructure has more than 200G of TSM database running at the
moment, split up into 12 servers. Currently, I start my TSM DB backups
at a little after 0400, and have to struggle to get all the various
copies and migrations complete by the time the backup window opens in
the evening.  Now, I'm making an unreasonable number of copies at the
moment: one onsite copy and -THREE- offsite copies, two of which are
electronically vaulted.

While the percentage savings would be the same for any of us, my
absolute savings are starting to feel compelling.

My first response to scaling back was a shudder, but I'm trying to see
it logically.  If I trust the substrate, and do (say) an incremental
DB backup 2 out of 3 days, or some similar such...  How much exposure
am I _really_ adding?


If I go from daily fulls to every-Nth-day fulls, and run incrementals
in between:


+ I add some amount (how much?) to DB restoration in the event of a
  disaster.  Knee-jerk estimation is that the successive incremental
  application isn't going to be huge in relation to the full.  Perhaps
  linear with size?

+ I increase by some amount my exposure to media failure.  This seems
  negligible.  If I've got a reasonable number of extra copies of my
  DB backups, the chance of media failure is acceptably tiny.

+ I add exposure to several new code paths in the TSM server
  codebase.  A bug in incremental application would mean I'd have to
  revert to the full, possibly increasing my lost-data period.  That's
  probably negligible.


.... I couldn't come up with any more negatives.  Oh,

+ My TSM administrator will have the willies about it for a while.
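
To put a rough number on the media-failure item above, here is a
minimal sketch, not anything TSM itself provides.  It assumes a restore
needs the full plus every incremental taken since it, that each piece
exists on every copy, and that media failures are independent.  The 1%
per-volume failure rate is a made-up placeholder; the four copies are
the one onsite plus three offsite copies mentioned earlier.

```python
def chain_loss_probability(p_media, copies, pieces):
    """Chance that at least one piece of a restore chain is unreadable
    on every copy, assuming independent media failures."""
    piece_lost = p_media ** copies           # all copies of one piece fail
    return 1.0 - (1.0 - piece_lost) ** pieces


P_MEDIA = 0.01   # hypothetical per-volume failure rate (assumption)
COPIES = 4       # one onsite + three offsite copies

# Daily fulls: a restore needs exactly one piece.
daily_fulls_risk = chain_loss_probability(P_MEDIA, COPIES, pieces=1)

# Full every 3rd day: worst case is the full plus two incrementals.
every_third_risk = chain_loss_probability(P_MEDIA, COPIES, pieces=3)

print(f"daily fulls:     {daily_fulls_risk:.3e}")
print(f"full every 3rd:  {every_third_risk:.3e}")
```

Under these assumptions the exposure grows roughly in proportion to
the number of pieces in the chain, but from a very small base.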


Accepting those risks, I win:

+ Dramatically smaller use of backup landing pad and offsite
  generation resources.

+ Dramatically smaller use of primary tape.

+ More wall-clock time to do e.g. expiration and such.
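
The resource savings can be sketched the same way.  This assumes every
copy rewrites the whole backup, takes 200 GB as the total DB size (the
figure above is "more than 200G"), and guesses that an incremental is
10% of a full.  That last fraction is purely an assumption and would
need to be measured on a real server.

```python
def average_daily_volume_gb(db_gb, full_every_n_days, incr_fraction, copies):
    """Average GB written per day across all copies, over one cycle of
    one full followed by (N - 1) incrementals."""
    full_gb = db_gb
    incr_gb = (full_every_n_days - 1) * db_gb * incr_fraction
    return copies * (full_gb + incr_gb) / full_every_n_days


DB_GB = 200            # total TSM DB size, lower bound from the post
COPIES = 4             # one onsite + three offsite copies
INCR_FRACTION = 0.10   # assumed size of an incremental relative to a full

daily_fulls = average_daily_volume_gb(DB_GB, 1, INCR_FRACTION, COPIES)
every_third = average_daily_volume_gb(DB_GB, 3, INCR_FRACTION, COPIES)

print(f"daily fulls:    {daily_fulls:.0f} GB/day")
print(f"full every 3rd: {every_third:.0f} GB/day")
print(f"savings:        {100 * (1 - every_third / daily_fulls):.0f}%")
```

With those inputs this works out to about 800 GB a day for daily fulls
against about 320 GB a day for a full every third day.  The real figure
depends entirely on how large the incrementals turn out to be.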



Anybody see something I'm missing?


- Allen S. Rout
