Controlling restore time is key, and it is closely tied to the number of media needed for a restore. But limiting restore time does not mean we have to keep excess backup copies. With a disk-only solution we obviously don't care about the number of media. With tape, forever-incrementals suffer from excessive restore times if you do not consolidate media. Reprocessing incrementals into synthetic fulls is one approach, but it is inefficient because it moves entire backup images, not just the files that are still active. So it seems that using NBU with tape leads us to create excess copies of our files as part of a strategy to control restore times. I will save the TSM argument for another day, but the funny thing is that we seem happy to bolt on a deduplicating device to squeeze out the very copies that NBU introduced. Of course these devices are disk-based so as not to reintroduce long restore times, but the marketing focuses more on building negativity toward tape ("tape sucks, move on", etc.). The entire argument, of course, ignores the issue of measuring the real cost associated with excess backup copies, which led me to post my original question: how can we measure the amount of data in the backup system beyond what is needed for data protection?
I suspect the answer to my question is something like "50% of the backup volume is not technically needed to protect the data; it is there only to provide an upper bound on RTO." If that is the case, then a strong argument can be made for addressing it not with a deduplication device, but with a more sophisticated media consolidation strategy.
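To make that hypothesis concrete, here is a back-of-the-envelope sketch. Every number in it (dataset size, change rate, retention, and the assumption that two copies are enough for protection) is an illustrative assumption, not a measurement:

    # Illustrative-only estimate of how much backup volume exists solely to
    # bound restore time rather than to protect data. All figures are assumed.

    full_size_tb = 10.0     # size of one full backup image (assumed)
    weekly_change = 0.05    # fraction of data changed per week (assumed)
    fulls_retained = 4      # weekly fulls kept under a 4-week retention (assumed)
    copies_needed = 2       # copies considered enough for protection (assumed)

    incrementals_tb = fulls_retained * weekly_change * full_size_tb

    # Needed: the required copies of the data, plus the changed data the
    # incrementals legitimately capture.
    needed_tb = copies_needed * full_size_tb + incrementals_tb

    # Stored: every retained full, plus the same incrementals.
    stored_tb = fulls_retained * full_size_tb + incrementals_tb

    excess = 1 - needed_tb / stored_tb
    print(f"stored {stored_tb:.0f} TB, needed {needed_tb:.0f} TB, "
          f"excess {excess:.0%}")   # stored 42 TB, needed 22 TB, excess 48%

With those made-up numbers, roughly half the stored volume is there only to keep a restore down to a single recent image, which is the shape of the answer I am expecting.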
The phrase "unnecessary copies" reminded me to say that it's important to plan for restore time as well as backup time. While it may be "redundant" to have all your files in every backup, it sure saves a lot of time when you only have to restore the LAST backup.
That is to say, theoretically you could eliminate all sorts of redundancy by simply doing a full backup when you first install and then doing CumulativeIncrementals from that day forward. Two years later, when you finally crashed, you could restore the original full and the intervening CumulativeIncrementals to get back to where you were, but you'd likely have gotten there just as fast by reinstalling and rekeying all the missing data.
Most people strike the balance by doing Fulls once a week and either Incrementals or CumulativeIncrementals the rest of the week. Sometimes the size and type of data being backed up makes a difference as well. For example, we do daily backups (using BCV copies) of our 2+ TB Production Database.
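To put rough numbers on that trade-off, here is a small sketch. Strictly speaking, a restore from CumulativeIncrementals only needs the original full plus the latest cumulative image (which itself grows toward the size of a full); the sketch below models differential Incrementals, where every intervening image is needed. All figures are assumptions chosen for illustration:

    # Rough comparison (illustrative assumptions only) of weekly fulls plus
    # daily differential incrementals versus one full at install time and
    # incrementals forever: images a restore must read, and weekly tape volume.

    data_tb = 1.0        # protected data set size (assumed)
    daily_change = 0.02  # fraction of data changed per day (assumed)

    def weekly_fulls(days_since_full):
        """Weekly full + daily differential incrementals."""
        images_to_restore = 1 + days_since_full        # full + incrementals since it
        written_per_week_tb = data_tb + 6 * daily_change * data_tb
        return images_to_restore, written_per_week_tb

    def full_then_incrementals_forever(days_since_install):
        """One full at install, then daily differential incrementals forever."""
        images_to_restore = 1 + days_since_install     # full + every incremental ever
        written_per_week_tb = 7 * daily_change * data_tb
        return images_to_restore, written_per_week_tb

    print(weekly_fulls(days_since_full=6))                        # 7 images, ~1.1 TB/week
    print(full_then_incrementals_forever(days_since_install=730)) # 731 images, ~0.14 TB/week

The no-redundancy scheme writes far less each week, but a restore two years in has to mount and read hundreds of images, which is exactly the point above.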
From: veritas-bu-bounces AT mailman.eng.auburn DOT edu [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Michaels, Keith R
Sent: Tuesday, April 08, 2008 11:06 PM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] Measuring redundant backup data
It should be possible
to go through the catalog and determine how much redundancy is present based on
the schedules and retentions. For example, if the schedule calls for monthly
fulls and the same file appears in 12 consecutive full backups (without
appearing in any intervening incremental), then that's 10 unnecessary copies,
assuming 2 are needed for adequate protection. I know there's additional
duplication if the same file exists on two clients, but that's harder to measure
without comparing the data. I'm just interested in measuring the
unnecessary copies that were created as a result of multiple backups of the same
data.
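If anyone wants to experiment with that counting approach, here is a sketch of the idea. It is not a working NetBackup tool: it assumes you have already dumped the catalog into a simple in-memory form (per full image, the set of files it contains, plus the files seen in intervening incrementals), and the copies-needed threshold is just an assumption:

    from collections import defaultdict

    COPIES_NEEDED = 2   # copies considered adequate for protection (assumption)

    def count_unnecessary_copies(full_images, incremental_entries):
        """
        full_images: list of sets, one per full backup in chronological order,
            each holding (client, path, mtime) tuples found in that image.
        incremental_entries: set of (client, path, mtime) tuples that showed up
            in any intervening incremental, i.e. files that actually changed.
        Returns the number of file copies beyond COPIES_NEEDED that exist only
        because the same unchanged file lands in full after full.
        """
        copies = defaultdict(int)
        for image in full_images:
            for entry in image:
                if entry not in incremental_entries:   # unchanged between fulls
                    copies[entry] += 1
        return sum(max(0, n - COPIES_NEEDED) for n in copies.values())

    # A file present in 12 consecutive monthly fulls and never in an incremental
    # comes out as 10 unnecessary copies, matching the example above.
    fulls = [{("clientA", "/etc/hosts", 1199145600)} for _ in range(12)]
    print(count_unnecessary_copies(fulls, incremental_entries=set()))   # 10

Cross-client duplication (the same file on two clients) is deliberately not handled here since, as noted, that needs content comparison rather than catalog metadata.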
From: Ed Wilts [mailto:ewilts AT ewilts DOT org]
Sent: Tuesday, April 08, 2008 7:05 PM
To: Jeff Lightner
Cc: Michaels, Keith R; veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] Measuring redundant backup data
On Tue, Apr 8, 2008 at 5:57 PM, Jeff Lightner <jlightner AT water DOT com>
wrote:
I don't know a way to measure how much is "redundant"
easily. Maybe the much vaunted Aptare would have that - I'll wait for
their fan club to comment on that. :-)
Not a chance - Aptare just gets job status and doesn't
ever see the backup data.
So far it appears to us the deduplication devices are
living up to or exceeding expectations.
That's purely site-specific. With PureDisk backing up our remote sites, I think we're under 5:1, but we're still building up the generation count. When we pointed some of our larger main campus data at it, it wasn't even that high - nowhere near high enough to justify the cost.
Some vendors will let you eval a unit - that's the only way to know how well you're going to dedupe, because it is so client-specific. If you have a ton of application servers with mostly OS and little application data, you're going to de-dupe extremely well. If you have one file server full of TIFF data that never stays around very long, you won't de-dupe well at all.
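To illustrate why the ratio is so site-specific, here is a toy model. The formula and numbers are illustrative assumptions only, not how PureDisk (or any appliance) actually reports its ratio:

    # Toy model (assumptions only): dedupe ratio as logical data sent divided
    # by unique data stored, for repeated fulls of the same data set, where
    # each day's changes are treated as entirely new blocks.

    def dedupe_ratio(generations, daily_change_rate, dataset_tb=10.0):
        logical = generations * dataset_tb
        unique = dataset_tb + (generations - 1) * daily_change_rate * dataset_tb
        return logical / unique

    # Mostly-static OS data kept for many generations: the ratio climbs quickly.
    print(round(dedupe_ratio(generations=30, daily_change_rate=0.01), 1))   # ~23.3
    # High-churn, short-lived data with few generations: barely above 1.
    print(round(dedupe_ratio(generations=5, daily_change_rate=0.5), 1))     # ~1.7

The same target can look great or terrible depending on what you point at it and how long you keep it, which is why an on-site eval is the only reliable number.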
--
Ed Wilts, Mounds View, MN, USA
mailto:ewilts AT ewilts DOT org