Re: [Networker] VTL or disk cabinet backup

2007-11-05 18:40:38
From: Curtis Preston <cpreston AT GLASSHOUSE DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Mon, 5 Nov 2007 18:35:01 -0500
Glad you liked my email.  Just trying to clear up the confusion.

OO>The pricing I've seen for data dedupe devices isn't very attractive 
OO>compared to the actual technical methods used. If we look at Avamar,
OO>which does this in software, the pricing is mentioned to be $17,000 
OO>per TB. 

You can't compare the pricing of Avamar, Puredisk, & Asigra (de-dupe
software products) with the pricing of de-dupe hardware (which is what
the question was about).  Avamar and the like are meant to replace (for
the clients that are backed up by them) EVERYTHING.  Instead of buying a
NetWorker server, NetWorker clients, library license, a tape library,
tapes, copying those tapes, and storing those tapes offsite, you buy
Avamar and replicate your backups.  It is a complete backup system that
can make onsite and offsite backups without creating a tape.  (You can
create tapes on the back end if you want using NetWorker or any other
backup software.)

OO>I can buy disk, that is shared as a LUN over FC for $1875 per TB

Dedupe disk ranges from 1/2 to 1/5 of that price, and that's using
higher-end disk than you're probably getting for $1875/TB.
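
As a rough sketch of that arithmetic: the snippet below reads "1/2 to 1/5
of the price" as a fraction of the quoted $1875/TB. That reading, and the
function name, are assumptions for illustration, not vendor pricing.

```python
# Back-of-the-envelope sketch; the only input is the figure quoted in this thread.
RAW_DISK_PER_TB = 1875.0  # quoted price of FC-attached disk, USD per TB


def dedupe_price_range(raw_price_per_tb, low_divisor=5, high_divisor=2):
    """Return (low, high) USD per TB if dedupe disk costs 1/5 to 1/2 of raw disk."""
    return raw_price_per_tb / low_divisor, raw_price_per_tb / high_divisor


low, high = dedupe_price_range(RAW_DISK_PER_TB)
print(f"dedupe disk: ${low:.2f} to ${high:.2f} per TB")
```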

OO>Also, from a personal standpoint, I think that native data deduplication
OO>will appear in the upcoming years in the underlying file systems. For
OO>instance, it wouldn't be so hard to include a compression module in ZFS
OO>that uses a data dedupe algorithm instead of regular gzip compression.

Deduping primary data doesn't make nearly as much sense as deduping
backups.  There's just not as much duplicated data.  NetApp has a
product that does this, but unless you're putting something like VMware
images on it, your dedupe ratio is very small.
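
A toy illustration of why backups dedupe so much better than primary data.
It uses naive fixed-size chunks and SHA-256 fingerprints, which is an
assumption made for brevity and not how any particular vendor does it. The
chunk size, the ten nightly fulls, and the random test data are all
hypothetical.

```python
import hashlib
import os

CHUNK = 4096  # fixed chunk size in bytes; an illustrative choice


def dedupe_ratio(streams, chunk_size=CHUNK):
    """Logical bytes written divided by bytes of unique chunks actually stored."""
    seen = set()
    logical = stored = 0
    for data in streams:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            logical += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:  # only new chunks consume disk
                seen.add(digest)
                stored += len(chunk)
    return logical / stored if stored else 1.0


# Primary data: a single copy of content with nothing repeated.
primary = [os.urandom(64 * CHUNK)]

# Backups: ten fulls of the same data, each with one chunk changed.
base = os.urandom(64 * CHUNK)
backups = []
for night in range(10):
    changed = bytearray(base)
    changed[night * CHUNK:(night + 1) * CHUNK] = os.urandom(CHUNK)
    backups.append(bytes(changed))

print(f"primary ratio: {dedupe_ratio(primary):.1f}:1")
print(f"backup ratio:  {dedupe_ratio(backups):.1f}:1")
```

The repeated fulls are what drive the ratio up; a single copy of primary
data has little for the fingerprint table to match against.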

OO>Also, it's sad that a fairly simple software solution can be overpriced in
OO>the way it is. Hopefully, the wave of court cases in the US will decrease
OO>as people figure out that patents for software are a bad idea.

Simple?  This isn't even CLOSE to simple.  Most of the vendors that
released dedupe products worked on them 3-5 years before releasing them.

OO>OK, I assume that you're talking about Data Domain here? 

Data Domain has both a NAS and VTL offering, but I was not referring
directly to Data Domain in my response.  I'm referring to the entire
industry.  If you want a dedupe device, you have two choices: NAS or
VTL.  Data Domain happens to do both.

OO>Data Domain isn't smart enough to understand the 
OO>underlying file system, so instead they need to share a file system 
OO>that they control over IP instead.

Going back to "this ain't easy": it's hard enough to do dedupe right
when you control the filesystem; I can't imagine how hard it would be if
you didn't control it.

OO>The method of using many virtual tape devices, and limiting parallelism
OO>to 1 on each device seems like a hack to me. You will have many LUNs to
OO>keep track of, and the number of devices on each storage node is also
OO>limited, thus limiting even this kind of parallelism.

I'd say it's a heck of a lot easier than managing
interleaving/multiplexing and all of the joys that go with that.

OO>If I can get +2X compression for free, disk for $1800 per TB

Where are you getting compression for free?

CP> I'd ask "why are you multiplexing to a disk device?"  There's no reason

OO>Because disk IO capacity > a single data stream IO (from most clients).

That doesn't mean you need to multiplex.  It just means you have to
handle multiple streams.
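
The distinction can be sketched with a toy model. This is not NetWorker's
on-media format; the function names, the block size, and the three clients
are hypothetical. Multiplexing interleaves blocks from several clients into
one sequential image, while handling multiple streams just means accepting
them concurrently and keeping each one contiguous.

```python
def multiplex(streams, block=4):
    """Interleave blocks from several client streams into one sequential image."""
    image = []
    offsets = {name: 0 for name in streams}
    while any(offsets[name] < len(streams[name]) for name in streams):
        for name, data in streams.items():
            start = offsets[name]
            if start < len(data):
                image.append((name, data[start:start + block]))
                offsets[name] = start + block
    return image


def restore_from_multiplexed(image, client):
    """Restore one client; also report how many blocks had to be scanned."""
    data = b"".join(payload for name, payload in image if name == client)
    return data, len(image)


def write_separate(streams):
    """Accept several streams at once, but keep each one contiguous on disk."""
    return {name: bytes(data) for name, data in streams.items()}


streams = {"clientA": b"A" * 12, "clientB": b"B" * 12, "clientC": b"C" * 12}

image = multiplex(streams)
restored, blocks_scanned = restore_from_multiplexed(image, "clientA")
separate = write_separate(streams)

print(f"multiplexed restore scanned {blocks_scanned} blocks for 3 blocks of data")
print(f"separate restore reads only clientA: {len(separate['clientA'])} bytes")
```

In the multiplexed case a restore has to wade through everyone else's
blocks; with separate streams the disk still stays busy, but each restore
touches only its own data.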

OO>I think the best solution from a NetWorker standpoint would be to use
OO>LUNs as raw disk devices. Then these volumes could be mounted on any
OO>storage node, as long as storage nodes have access to the underlying LUN.

I would agree that this would be great, but now you're talking about the
future if we're talking NetWorker.  (FWIW, NetBackup has this today.)
I'm talking about right now.

OO>How much I/O can a data domain box handle?

I'm going to change the question to "how much I/O can dedupe devices
handle?"  That answer ranges from 100 MB/s to several thousand MB/s
depending on vendor.
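
To see what that spread means for a backup window, here is a quick
calculation. The 10 TB full backup and the 2000 MB/s figure (standing in
for "several thousand") are hypothetical values chosen for illustration,
and decimal units are assumed (1 TB = 1,000,000 MB).

```python
def backup_hours(size_tb, rate_mb_per_s):
    """Hours needed to ingest size_tb at a sustained rate, using decimal units."""
    seconds = size_tb * 1_000_000 / rate_mb_per_s
    return seconds / 3600


for rate in (100, 2000):  # MB/s; low end from the thread, high end assumed
    print(f"10 TB at {rate} MB/s: {backup_hours(10, rate):.1f} hours")
```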

OO>And why is handling RAID volumes a high cost?

It's the difficulty in creating MULTIPLE RAID volumes, deciding how big
each of them should be, and assigning them to servers.  You will always
have LUNs that are too big and/or too small, creating waste and
management issues, respectively.  You also (today) can't share those
volumes between multiple backup servers.  A VTL can be shared between
multiple servers.
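
The sizing problem can be sketched as follows. The server names, the
demand figures, and the 5 TB LUN size are hypothetical; the point is only
that equal-sized LUNs strand capacity on some servers while starving
others, whereas one shared pool of the same total size fits the same
demand.

```python
def static_luns(lun_size_tb, demand_tb):
    """One fixed-size LUN per server: return (unused TB, unmet TB)."""
    wasted = sum(max(lun_size_tb - need, 0) for need in demand_tb.values())
    short = sum(max(need - lun_size_tb, 0) for need in demand_tb.values())
    return wasted, short


def shared_pool(pool_size_tb, demand_tb):
    """One pool shared by every server: return (unused TB, unmet TB)."""
    total = sum(demand_tb.values())
    return max(pool_size_tb - total, 0), max(total - pool_size_tb, 0)


demand = {"server1": 2, "server2": 9, "server3": 4}  # hypothetical TB needed

wasted, short = static_luns(5, demand)           # 3 LUNs x 5 TB = 15 TB total
pool_wasted, pool_short = shared_pool(15, demand)

print(f"static LUNs: {wasted} TB unused, {short} TB unmet")
print(f"shared pool: {pool_wasted} TB unused, {pool_short} TB unmet")
```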
