Controlling restore time is key, and it is closely tied to the number of media needed for a restore. But limiting restore time does not mean we have to keep excess backup copies. With a disk-only solution we obviously don't care about the number of media. With tape, forever-incrementals suffer from excessive restore times if you do not consolidate media. Reprocessing incrementals into synthetic fulls is one approach, but it is inefficient because it moves entire backup images, not just the files that are still active. So it seems that using NBU with tape leads us to create excess copies of our files as part of a strategy to control restore times. I will save the TSM argument for another day, but the funny thing is that we seem happy to bolt on a deduplicating device to squeeze out the very copies that NBU introduced. Of course these devices are disk-based so as not to reintroduce long restore times, but the marketing focuses more on building negativity toward tape ("tape sucks, move on", etc.). The entire argument, of course, ignores the issue of measuring the real cost associated with excess backup copies, which led me to post my original question: how can we measure the amount of data in the backup system beyond what is needed for data protection?
I suspect the answer to my question is something like "50% of the backup volume is not technically needed to protect the data; it is there only to provide an upper bound on RTO." If that is the case, then a strong argument can be made for addressing it not with a deduplication device, but with a more sophisticated media consolidation strategy.
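To make that hypothesis concrete, here is a back-of-the-envelope sketch. Every number in it (dataset size, change rate, retention, and the assumption that two copies are enough for protection) is an illustrative assumption, not a measurement:

    # Illustrative-only estimate of how much backup volume exists solely to
    # bound restore time rather than to protect data. All figures are assumed.

    full_size_tb = 10.0     # size of one full backup image (assumed)
    weekly_change = 0.05    # fraction of data changed per week (assumed)
    fulls_retained = 4      # weekly fulls kept under a 4-week retention (assumed)
    copies_needed = 2       # copies considered enough for protection (assumed)

    incrementals_tb = fulls_retained * weekly_change * full_size_tb

    # Needed: the required copies of the data, plus the changed data the
    # incrementals legitimately capture.
    needed_tb = copies_needed * full_size_tb + incrementals_tb

    # Stored: every retained full, plus the same incrementals.
    stored_tb = fulls_retained * full_size_tb + incrementals_tb

    excess = 1 - needed_tb / stored_tb
    print(f"stored {stored_tb:.0f} TB, needed {needed_tb:.0f} TB, "
          f"excess {excess:.0%}")   # stored 42 TB, needed 22 TB, excess 48%

With those made-up numbers, roughly half the stored volume is there only to keep a restore down to a single recent image, which is the shape of the answer I am expecting.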
The phrase "unnecessary copies" reminded me to say that it's important to plan for restore time as well as backup time. While it may be "redundant" to have all your files in every backup, it sure saves a lot of time when you only have to restore the LAST backup.
That is to say, theoretically you could eliminate all sorts of redundancy by simply doing a full backup when you first install and then doing CumulativeIncrementals from that day forward. Two years later, when you finally crashed, you could restore the original full and the intervening CumulativeIncrementals to get back to where you were, but you'd likely have gotten there just as fast by reinstalling and rekeying all the missing data.
Most people strike the balance by doing Fulls once a week and either Incrementals or CumulativeIncrementals the rest of the week. Sometimes the size and type of data being backed up makes a difference as well. For example, we do daily backups (using BCV copies) of our 2+ TB Production Database.
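To put rough numbers on that trade-off, here is a small sketch. Strictly speaking, a restore from CumulativeIncrementals only needs the original full plus the latest cumulative image (which itself grows toward the size of a full); the sketch below models differential Incrementals, where every intervening image is needed. All figures are assumptions chosen for illustration:

    # Rough comparison (illustrative assumptions only) of weekly fulls plus
    # daily differential incrementals versus one full at install time and
    # incrementals forever: images a restore must read, and weekly tape volume.

    data_tb = 1.0        # protected data set size (assumed)
    daily_change = 0.02  # fraction of data changed per day (assumed)

    def weekly_fulls(days_since_full):
        """Weekly full + daily differential incrementals."""
        images_to_restore = 1 + days_since_full        # full + incrementals since it
        written_per_week_tb = data_tb + 6 * daily_change * data_tb
        return images_to_restore, written_per_week_tb

    def full_then_incrementals_forever(days_since_install):
        """One full at install, then daily differential incrementals forever."""
        images_to_restore = 1 + days_since_install     # full + every incremental ever
        written_per_week_tb = 7 * daily_change * data_tb
        return images_to_restore, written_per_week_tb

    print(weekly_fulls(days_since_full=6))                        # 7 images, ~1.1 TB/week
    print(full_then_incrementals_forever(days_since_install=730)) # 731 images, ~0.14 TB/week

The no-redundancy scheme writes far less each week, but a restore two years in has to mount and read hundreds of images, which is exactly the point above.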
From: veritas-bu-bounces AT mailman.eng.auburn DOT edu [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Michaels, Keith R
Sent: Tuesday, April 08, 2008 11:06 PM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] Measuring redundant backup data
It should be possible
to go through the catalog and determine how much redundancy is present based on
the schedules and retentions. For example, if the schedule calls for monthly
fulls and the same file appears in 12 consecutive full backups (without
appearing in any intervening incremental), then that's 10 unnecessary copies,
assuming 2 are needed for adequate protection. I know there's additional
duplication if the same file exists on two clients, but that's harder to measure
without comparing the data. I'm just interested in measuring the
unnecessary copies that were created as a result of multiple backups of the same
data.
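If anyone wants to experiment with that counting approach, here is a sketch of the idea. It is not a working NetBackup tool: it assumes you have already dumped the catalog into a simple in-memory form (per full image, the set of files it contains, plus the files seen in intervening incrementals), and the copies-needed threshold is just an assumption:

    from collections import defaultdict

    COPIES_NEEDED = 2   # copies considered adequate for protection (assumption)

    def count_unnecessary_copies(full_images, incremental_entries):
        """
        full_images: list of sets, one per full backup in chronological order,
            each holding (client, path, mtime) tuples found in that image.
        incremental_entries: set of (client, path, mtime) tuples that showed up
            in any intervening incremental, i.e. files that actually changed.
        Returns the number of file copies beyond COPIES_NEEDED that exist only
        because the same unchanged file lands in full after full.
        """
        copies = defaultdict(int)
        for image in full_images:
            for entry in image:
                if entry not in incremental_entries:   # unchanged between fulls
                    copies[entry] += 1
        return sum(max(0, n - COPIES_NEEDED) for n in copies.values())

    # A file present in 12 consecutive monthly fulls and never in an incremental
    # comes out as 10 unnecessary copies, matching the example above.
    fulls = [{("clientA", "/etc/hosts", 1199145600)} for _ in range(12)]
    print(count_unnecessary_copies(fulls, incremental_entries=set()))   # 10

Cross-client duplication (the same file on two clients) is deliberately not handled here since, as noted, that needs content comparison rather than catalog metadata.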
From: Ed Wilts [mailto:ewilts AT ewilts DOT org]
Sent: Tuesday, April 08, 2008 7:05 PM
To: Jeff Lightner
Cc: Michaels, Keith R; veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] Measuring redundant backup data
On Tue, Apr 8, 2008 at 5:57 PM, Jeff Lightner <jlightner AT water DOT com>
wrote:
I don't know a way to measure how much is "redundant"
easily. Maybe the much vaunted Aptare would have that - I'll wait for
their fan club to comment on that. :-)
Not a chance - Aptare just gets job status and doesn't
ever see the backup data.
So far it appears to us the deduplication devices are
living up to or exceeding expectations.
That's purely site-specific. With PureDisk backing up our remote sites, I think we're under 5:1, but we're still building up the generation count. When we pointed some of our larger main campus data at it, it wasn't even that high - nowhere near high enough to justify the cost.
Some vendors will let you eval a unit - that's the only way to know how well you're going to dedupe, because it is so client-specific. If you have a ton of application servers with mostly OS and little application data, you're going to de-dupe extremely well. If you have one file server full of TIFF data that never stays around very long, you won't de-dupe well at all.
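To illustrate why the ratio is so site-specific, here is a toy model. The formula and numbers are illustrative assumptions only, not how PureDisk (or any appliance) actually reports its ratio:

    # Toy model (assumptions only): dedupe ratio as logical data sent divided
    # by unique data stored, for repeated fulls of the same data set, where
    # each day's changes are treated as entirely new blocks.

    def dedupe_ratio(generations, daily_change_rate, dataset_tb=10.0):
        logical = generations * dataset_tb
        unique = dataset_tb + (generations - 1) * daily_change_rate * dataset_tb
        return logical / unique

    # Mostly-static OS data kept for many generations: the ratio climbs quickly.
    print(round(dedupe_ratio(generations=30, daily_change_rate=0.01), 1))   # ~23.3
    # High-churn, short-lived data with few generations: barely above 1.
    print(round(dedupe_ratio(generations=5, daily_change_rate=0.5), 1))     # ~1.7

The same target can look great or terrible depending on what you point at it and how long you keep it, which is why an on-site eval is the only reliable number.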
--
Ed Wilts, Mounds View, MN, USA
mailto:ewilts AT ewilts DOT org