Re: [Networker] VTL or disk cabinet backup

2007-11-05 02:36:21
Subject: Re: [Networker] VTL or disk cabinet backup
From: Oscar Olsson <spam1 AT QBRANCH DOT SE>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Mon, 5 Nov 2007 08:34:04 +0100
First of all, great email. I do have some comments on parts of it, which
I quote and respond to below.

On 2007-11-04 23:23, Curtis Preston revealed:

CP> First let me say that for any environment backing up multiple terabytes,
CP> I believe that using a non-dedupe disk device as your primary storage
CP> for backups is at this point a waste of money.  Dedupe brings too much
CP> value to not use it in a disk target that's going to be used for
CP> anything other than staging.  (Note: I'm not talking about disk staging
CP> here.  I'm talking about using disk as your primary onsite storage
CP> device.)

Or does it? I haven't used any VTL or dedupe devices myself, so my
argument is purely theoretical, but...

The pricing I've seen for data dedupe devices isn't very attractive, given
how simple the underlying technique is. Take Avamar, which does dedupe in
software: the price mentioned is $17,000 per TB. That takes away much of
dedupe's appeal for me. I can buy disk, shared as a LUN over FC, on decent
hardware in a RAID-6 configuration for $1875 per TB, and that price keeps
dropping every day.
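To put the two quoted prices side by side, here is a small sketch of the break-even arithmetic. It treats both figures as the price of one TB of physical capacity, which is an assumption (the Avamar figure may well be quoted per TB of protected data), and the helper name is made up for illustration.

```python
AVAMAR_PER_TB = 17_000         # USD per TB, as quoted above
RAID6_FC_DISK_PER_TB = 1_875   # USD per TB, as quoted above


def break_even_ratio(dedupe_price_per_tb: float,
                     disk_price_per_tb: float,
                     disk_compression: float = 1.0) -> float:
    """Dedupe ratio at which both options cost the same per logical TB.

    Hypothetical helper. Assumes both prices buy one TB of physical capacity,
    so the cost per logical TB is dedupe_price / ratio for the dedupe option
    and disk_price / disk_compression for plain disk.
    """
    return dedupe_price_per_tb * disk_compression / disk_price_per_tb


print(round(break_even_ratio(AVAMAR_PER_TB, RAID6_FC_DISK_PER_TB), 2))       # 9.07
print(round(break_even_ratio(AVAMAR_PER_TB, RAID6_FC_DISK_PER_TB, 2.0), 2))  # 18.13
```

Under that assumption, dedupe has to reach roughly 9:1 just to match plain disk on price, or about 18:1 if plain disk gets 2:1 compression.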

Personally, I also think native data deduplication will appear in file
systems in the coming years. For instance, it wouldn't be very hard to add
a compression module to ZFS that uses a dedupe algorithm instead of regular
gzip compression. It's also sad that a fairly simple piece of software can
be priced this high. Hopefully, the wave of patent lawsuits in the US will
subside as people figure out that software patents are a bad idea.
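The basic idea of dedupe can be sketched in a few lines. Below is a toy fixed-block deduplicating store; the class name, the 4 KiB block size and the use of SHA-256 are all assumptions for illustration, and this is not how ZFS, Avamar or Data Domain actually implement it (a real system also needs reference counting, garbage collection and crash safety).

```python
import hashlib
import os


class DedupeStore:
    """Toy fixed-block deduplicating store (hypothetical, not any product's design).

    Each write is split into fixed-size blocks; a block is stored only if its
    SHA-256 digest has not been seen before. A backup is kept as a "recipe":
    the ordered list of digests needed to rebuild it.
    """

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.blocks = {}        # digest -> block bytes, one copy per unique block
        self.logical_bytes = 0  # bytes the backup software believes it wrote

    def write(self, data: bytes) -> list:
        recipe = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).digest()
            self.blocks.setdefault(digest, block)  # keep only the first copy
            recipe.append(digest)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: list) -> bytes:
        # Rebuild a backup image from its recipe of block digests.
        return b"".join(self.blocks[d] for d in recipe)

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())

    def ratio(self) -> float:
        # Logical bytes written divided by bytes physically kept.
        return self.logical_bytes / self.stored_bytes()


store = DedupeStore()
full_backup = os.urandom(1024 * 1024)  # 1 MiB of incompressible data
night1 = store.write(full_backup)
night2 = store.write(full_backup)      # second full backup of unchanged data
assert store.read(night2) == full_backup
print(store.stored_bytes(), store.ratio())
```

Writing the same unchanged 1 MiB twice keeps only one copy of each block, so 1 MiB is stored and the logical-to-stored ratio is 2.0; that is the kind of saving a file system could in principle provide natively.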

CP> I'm assuming, then, you're going to be buying a dedupe device.
CP> 
CP> >* VTLs can support data deduplication
CP> > - And so can raw disk with additional hardware.
CP> 
CP> Today (and for the foreseeable future) there are only two types of
CP> dedupe devices: VTL and NAS heads.  That means that if you want a block
CP> device that you can communicate to without using IP, then you're going
CP> to want VTL.  If you're OK with IP/NAS-level performance, then either
CP> NAS or VTL can satisfy your performance requirements.  Current data
CP> suggests that a dedupe VTL and dedupe NAS head add about the same cost
CP> to the disk, so then the question is: which brings more value?

OK, I assume you're talking about Data Domain here? I assume it works this
way because a Data Domain box can't understand a file system that a host
writes onto it, so it has to export a file system it controls over IP
instead. I'm guessing a stream to a tape device is easier to handle (no
file system!), but then again, is multiplexing supported to a single
virtual tape device?

Using many virtual tape devices and limiting parallelism to 1 on each
device seems like a hack to me. You end up with many LUNs to keep track
of, and the number of devices per storage node is also limited, which
caps even this kind of parallelism.

So, all in all, IMHO dedupe appliances and software are still too
expensive, and not elegant enough, to be worthwhile. If I can get 2X or
better compression for free, disk for about $1800 per TB, and the
read/write benefits of disk without emulating small volumes, why would I
pay the current price tag for a VTL? I'm still not convinced that TCO is
lower for VTL solutions, considering both the price tag and the
proprietary software/hardware involved. I'd guess the added complexity,
and the know-how needed for additional vendor-specific hardware, also add
to the cost.

CP> >However, data dedup loses 
CP> >its appeal as it can't handle several save streams to one device at
CP> >once.
CP> 
CP> I think you're saying that dedupe devices can't support multiplexed
CP> backups and maintain their dedupe ratio.  This is NOT true of all dedupe
CP> devices; it is true of some of them.  (Ask your vendor.)  In addition,
CP> I'd ask "why are you multiplexing to a disk device?"  There's no reason
CP> to do that in a NetWorker environment.  If you want to do 40 jobs
CP> simultaneously, create 40 virtual tape drives, not 10 virtual tape
CP> drives with 4 jobs each.

Because a disk device's I/O capacity exceeds what a single data stream
(from most clients) can deliver.
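Why multiplexing and dedupe can clash can be illustrated with a toy model: a device that dedupes on fixed 4 KiB boundaries, and a backup server that interleaves two client streams round-robin in 3000-byte chunks. Both sizes and the interleave scheme are hypothetical, not NetWorker's or any vendor's actual values. In this model, one stream per virtual drive dedupes perfectly on the second night, while the multiplexed image produces new blocks as soon as the interleave order changes (for example, because a different client starts sending first).

```python
import hashlib
import os

DEDUPE_BLOCK = 4096  # hypothetical fixed dedupe block size
MUX_CHUNK = 3000     # hypothetical multiplexing chunk size, not a NetWorker setting


def block_digests(image: bytes, size: int = DEDUPE_BLOCK) -> set:
    """Digests of the fixed-size blocks a naive dedupe device would see."""
    return {hashlib.sha256(image[i:i + size]).digest()
            for i in range(0, len(image), size)}


def multiplex(streams: list, chunk: int = MUX_CHUNK, start: int = 0) -> bytes:
    """Round-robin interleave streams in fixed chunks, beginning with streams[start]."""
    order = streams[start:] + streams[:start]
    offsets = [0] * len(order)
    out = bytearray()
    while any(off < len(s) for off, s in zip(offsets, order)):
        for i, s in enumerate(order):
            if offsets[i] < len(s):
                out += s[offsets[i]:offsets[i] + chunk]
                offsets[i] += chunk
    return bytes(out)


# Two clients whose data does not change between night 1 and night 2.
client_a, client_b = os.urandom(256 * 1024), os.urandom(256 * 1024)

# One stream per virtual tape drive: each client's image is identical both nights.
seen = block_digests(client_a) | block_digests(client_b)
new_per_drive = len((block_digests(client_a) | block_digests(client_b)) - seen)

# Multiplexed to one drive: client B happens to start first on night 2.
seen_mux = block_digests(multiplex([client_a, client_b], start=0))
new_mux = len(block_digests(multiplex([client_a, client_b], start=1)) - seen_mux)

print(f"new blocks on night 2, one stream per drive: {new_per_drive}")
print(f"new blocks on night 2, multiplexed:          {new_mux}")
```

If the interleave were identical every night, the multiplexed images would dedupe as well, and devices that use content-defined chunking or understand the backup format can cope better, which fits Curtis's "ask your vendor".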

CP> I would totally agree with you here.  FWIW, other backup products are
CP> working with some VTLs & NAS products to figure out how to do this
CP> without the drawbacks you mentioned.  (Already GA in NetBackup.)

I think the best solution from a NetWorker standpoint would be to use
LUNs as raw disk devices. These volumes could then be mounted on any
storage node, as long as that storage node has access to the underlying
LUN. Since NetWorker would then have direct block-level access, it could
support parallel reads and writes, just like regular AFTD devices, but
without needing a file system. Sure, to get the benefit of moving LUNs
between nodes automatically, a device would need to be fault tolerant
before it is presented to the OS, which rules out using a JBOD with
RAID-Z2 (ZFS again :P). But it would be nice to have as an option. Data
dedupe should be doable in both scenarios.


CP> Standard disk is MUCH harder to use in large, multi-backup-server
CP> environments because of all the provisioning issues.  You have to create
CP> and manage one or more RAID volumes per backup server/storage node, etc.
CP> You're always going to have volumes that are too big or too small,
CP> creating something else to manage.  You can't easily move backups from
CP> one backup server/storage node to another.  If you have different OSs,
CP> you can't even mount the RAID group/volume you made on one OS to another
CP> OS's backup server.  You get no dedupe or hardware compression.  You
CP> will have fragmentation issues if you use them as a permanent storage
CP> device (as opposed to disk staging.)

If you have several disk devices in one pool, then I guess NetWorker
would choose a volume that isn't full, right?

CP> NAS disk (dedupe or otherwise) doesn't meet the needs of very large
CP> environments either, as many of the servers that need to be backed up
CP> need LAN-free backups.  LAN-free backups mean using a block device, and
CP> today a block device means either standard disk or dedupe VTL.  I've
CP> already said what I thought about standard disk, so that leaves only
CP> VTL.

How much I/O can a Data Domain box handle?

CP> VTLs can easily be shared between multiple backup servers, storage nodes
CP> -- even applications that don't share -- without creating and managing
CP> individual RAID volumes for each server.  They have dedupe and hardware
CP> compression, and any good dedupe device has worked out the fragmentation
CP> issue as well.

And why is managing RAID volumes such a high cost?

In general, I feel that VTLs do have a point, although I'm still not 100% 
convinced regarding the price/performance ratio from a TCO perspective.

//Oscar
