Re: [ADSM-L] Data Deduplication

2007-08-31 20:39:16
Subject: Re: [ADSM-L] Data Deduplication
From: Curtis Preston <cpreston AT GLASSHOUSE DOT COM>
Date: Fri, 31 Aug 2007 20:33:44 -0400
I thought we DID address that in one of the posts.  (Maybe I'm getting
things confused with another thread I'm having on the same topic.)  

A properly designed de-duplication backup system should restore the data
at the same speed as, if not faster than, the backup, and the tests that
I've done with a few of them have all worked this way.  I believe it's
something you should test, but it appears that the designers thought of
this natural objection and designed around it.

I believe it has to do with the fact that restoring 100 random pieces to
create a single file means you get to read off of a bunch of spindles.
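
To make that concrete, here is a minimal Python sketch of the general idea.
It is only an illustration under my own assumptions, not how any particular
product works: the fixed 4 KB chunk size, the eight "spindles", the
hash-modulo placement, and all of the function names are hypothetical.

```python
import hashlib
from collections import Counter

CHUNK_SIZE = 4096    # hypothetical fixed chunk size, in bytes
NUM_SPINDLES = 8     # hypothetical number of disks behind the chunk store


def backup(data, store):
    """Split data into chunks, keep each unique chunk once, return the recipe.

    The recipe is the ordered list of chunk hashes needed to rebuild the data.
    """
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # a duplicate chunk is not stored again
        recipe.append(digest)
    return recipe


def spindle_for(digest):
    """Assumed placement policy: pick a spindle from the chunk hash."""
    return int(digest, 16) % NUM_SPINDLES


def restore(recipe, store):
    """Rebuild the original data by fetching chunks in recipe order."""
    return b"".join(store[digest] for digest in recipe)


def spindle_load(recipe):
    """Count how many chunk reads a restore would send to each spindle."""
    return Counter(spindle_for(digest) for digest in recipe)
```

Backing up the same data twice into one store adds recipe entries but no new
chunks, and spindle_load() shows the reads for a single restore spread over
several disks rather than queuing on one.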

I will say that there are speed differences between the de-dupe
appliances (VTLs) and de-dupe backup software.  De-dupe backup software
still restores fast enough for what it was designed for.  (You should be
able to fill a GbE pipe with such a restore.)  But they're not going to
restore at the 100s of MB/s that you can get out of one of the

W. Curtis Preston
Backup Blog @
VP Data Protection, GlassHouse Technologies 

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Richard Sims
Sent: Friday, August 31, 2007 3:13 PM
Subject: Re: [ADSM-L] Data Deduplication

On Aug 31, 2007, at 4:33 PM, Dave Mussulman wrote:

> ... Avamar said their software got
> 10-20% reduction on a backup of a stock Windows XP installation.  A
> single system, say it's the first one you added to your backup group.
> That's not two users with the same email attachments saved, or
> identical files across two systems - that's hashing files in the OS
> (I presume from headers in DLLs and such.) ...

I'm mildly amused that in all these postings on the subject, none has
addressed the corollary of the backups: restoral.  There are likely
some implications in the restoral of files backed up this way,
perhaps most particularly in system files; and restoral performance
is also something one would wonder about.  And there may be
situations where such a backup/restore regimen is to be avoided,
because of issues.  Perhaps those with experience in this area would
post what they've found.

    Richard Sims, at Boston University
