I've been using DataDomain restorers, the 460 and 565 series, for almost three years now and
here's my opinion of them, good and bad.
Good:
DeDuplication -
DataDomain's de-duplication claims are accurate
and the deduplication performance is impressive. I'm seeing compression ratios
of 4:1 (ORACLE policy type backups) to 17:1 (general filesystem backups of
systems with both STANDARD and MS-WINDOWS-NT policy types). For hot catalog
backups I'm seeing a compression ratio of 77:1.
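To put those ratios in terms of disk space, here is a rough back-of-the-envelope sketch. It assumes a ratio of N:1 means N units of backup data end up as 1 unit stored on the restorer; the function name is made up for illustration and is not part of any DataDomain or NetBackup tool.

```python
def space_saved_percent(ratio):
    """Percentage of backup data that does not need disk at a given N:1 ratio.

    Assumption: N:1 means N units of data before deduplication are stored
    as 1 unit on the restorer.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return (1 - 1 / ratio) * 100


# Ratios quoted above: the low end (ORACLE policy type) and hot catalog backups.
for label, ratio in [("ORACLE policy backups", 4), ("hot catalog backups", 77)]:
    print(f"{label}: {ratio}:1 -> {space_saved_percent(ratio):.1f}% less disk")
```

Even the 4:1 case works out to three quarters of the disk saved, which is a big part of why a long retention period on disk becomes practical.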
Replication -
Replication also works well, which makes implementing a DR plan for your backup
system much easier. I write hot catalog backups and my DR info to a DSU on
my primary restorer once a day. Implementing a DR plan for NetBackup
becomes a lot easier with this kind of technology because it takes care of
replicating all of your backup data and your catalogs and DR info to your remote
site.
Performance -
Backup performance is very good and restores are wicked fast. Before I got my restorers
set up I was running a StorageTek L180 library with eight LTO2 tape drives 24x7.
I ran my primary backups and then duplicated them to tape and kept the
duplicated copies onsite with a three week retention period. I was very close to
running out of slots in the library for backup tapes and onsite duplication
and I was getting to a point where having even one drive go down was seriously
impacting my schedule. Installing two 460 series restorers, with two at the DR
site for replication solved this problem. I write primary backups to the
restorer with a one month retention period and then duplicate them to
tape with a longer retention period for offsite vaulting. There
are still performance issues related to the number of streams that the restorer
can handle: if you have streams open for reading data from the restorer for
duplication to tape then you can't have as many write streams open for backups.
These issues are minor, though. The most I've had to do is stop a running duplication
job during the backup window and then let duplication catch up once the backups
are done. Being able to restore from a DSU with a long retention period is
awesome, it's like having NetApp snapshot restores for all of your data. The
Oracle and Exchange administrators I work with love this. Installing the two
DataDomain restorers allowed me to hold off on upgrading my tape library for
eighteen months.
Field engineering support -
The DataDomain field engineers I have worked with are knowledgeable,
efficient and friendly. They really know the equipment and know the ins and
outs of NetBackup and how best to configure it for use with the DataDomain
equipment. DataDomain contracts out their routine technical support to Glasshouse
Technologies, who have been OK so far.
Ease of installation and configuration -
Configuring a restorer takes about 15 minutes. There is a
menu driven configuration utility at the CLI that runs you through all of the
steps and once that's done you mount the restorer filesystems as NFS volumes or
CIFS shares on your NetBackup master or media server, configure these
filesystems as disk storage units and start using the system. I have not used
the Open Storage Option yet but am looking forward to it. Really the hardest
part about configuring a restorer is getting it into a rack.
User Interface -
The GUI is very good and the CLI is superb. You can have multiple CLI sessions via
ssh and the CLI supports tab completion and command line history, and if you enter a
command without any arguments it will tell you what the possible arguments are.
There's a CS term to describe this but I don't know what it is, but as an
example if you want to see all of the arguments to the command "replication" you
type "replication" at the command line. One of the arguments for replication is
"show". If you want to see all of the arguments for "replication show" you type
"replication show" at the command line and it shows you "replication show
history", "replication show config", "replication show performance" and
"replication show stats". I'm a CLI guy and I love that I can quickly check on
the status of the system by connecting to it with ssh and running a handful of
commands instead of having to, as you do with a NetApp filer, connect with a web
interface and put up with a GUI because the CLI is crippled. The online
documentation in the CLI is also superb with the help system showing good and
relevant examples for each command. I've rarely had to RTFM with my
restorers.
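For anyone who wants to script that quick status check, here is a minimal sketch that runs the four "replication show" subcommands over ssh. It assumes the restorer accepts a command passed as an ssh argument (rather than only in an interactive session) and that key-based login is already set up. Neither is confirmed above, and the function names are made up for illustration.

```python
import subprocess

# Subcommands quoted in the post above. Their output format is not shown
# there, so this sketch simply captures whatever text comes back.
STATUS_COMMANDS = [
    "replication show config",
    "replication show stats",
    "replication show history",
    "replication show performance",
]


def build_ssh_command(host, ddos_command):
    """Return the argv list for running one restorer CLI command over ssh."""
    return ["ssh", host, ddos_command]


def run_status_checks(host, runner=subprocess.run):
    """Run each status command in turn and return {command: output text}.

    `runner` defaults to subprocess.run; pass a stand-in to try the logic
    without touching a real restorer.
    """
    results = {}
    for command in STATUS_COMMANDS:
        completed = runner(
            build_ssh_command(host, command),
            capture_output=True,
            text=True,
            timeout=60,
        )
        results[command] = completed.stdout
    return results
```

Calling run_status_checks with a login such as "sysadmin@restorer.example.com" (a placeholder, not a real account or host) would return a dictionary of the raw output, which is easy to drop into a morning status e-mail.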
Bad:
FLAMING RESTORERS OF DEATH! -
Last year we had one of our DDR460 restorers catch fire. Well, actually
it didn't catch fire. According to the DataDomain tech support people the
restorers are built from UL listed fire resistant materials, so what actually
happened is that the system midplane that the disk drives are connected to
developed a short circuit, heated to 950 degrees Celsius and melted. I found out
about this when I didn't get my morning status e-mail from the restorer in
question. I tried pinging it and getting on the console (it was the system at
our DR site) and while I was doing so my boss called and asked if I'd
checked the equipment at the DR site because he'd gotten a call from the folks
who manage it who said that the machine room was smoky and that it smelled like
a piece of electrical equipment had caught fire. It turned out that we were the
culprits and that one of our 460 series restorers had melted down. That
afternoon I got an e-mail from DataDomain with a technical support bulletin that
said "Oh by the way, if you have a 460 series restorer and the serial
number on the midplane is such and such please contact us so we
can schedule an engineer to come out and replace it because there's a
minor risk that the system could short circuit and let the magic smoke
out." DataDomain did replace the restorer and I was lucky as it was the
replication target at our DR site and not the primary that I stored my backup
images on, but I was really nervous until all of the system midplanes had been
replaced on our DDR460 restorers. Apparently this replacement wasn't enough, or
DataDomain wasn't comfortable with it as they issued another support bulletin
for the 460 series restorers and we had to have all of the midplane boards
replaced again this year. I've been a systems administrator for 20 years and
worked in a variety of environments with a whole bunch of different gear and
this was the first time that I'd ever had the magic smoke escape from a piece of
equipment, and let me tell you, that sucker was melted. The damage was contained
inside of the case but there was no way we could have salvaged any of the disks
even if we had wanted to. Again I'm glad that it was the one at the DR site,
which only contained replicated backup images and not one of my primary
restorers.
Flaky code -
DDOS, the Linux based operating system that the restorers run, is still very much a
work in progress. DataDomain releases a major upgrade containing bug fixes for
DDOS about every three months. You're pretty much stuck with installing
these upgrades as they often contain code fixes necessary to support new SATA
disk drive firmware revisions. The upgrade process is quick and easy enough
but it's still a PITA because if I upgrade a piece of equipment in the
backup system I need to test and document restoring backup images from
before the upgrade and backups and restores after the upgrade (and I would do
this even if it weren't part of the SOP for my backup system. It's not that I'm
paranoid, it's just that I firmly believe in Murphy's law).
The 460 and 565
series restorers use consumer grade Hitachi or Seagate SATA drives, no different
than what you would purchase from Fry's. A few weeks back I had a drive fail on
my 565 series restorer. DataDomain spotted the failed drive in the daily
autosupport and sent me a new drive without me having to do anything. The new
drive didn't work, it was the same part number and firmware revision as the
drive it replaced but it was from Seagate's Thailand facility, which has been
notorious as of late for shipping batches of bad drives. So I requested
another drive. The new drive came in, it was a Seagate, with a different
firmware revision and date code. I installed it and still didn't get any love
from the system. I called up DataDomain and said "what's up" and they told me
that they had discovered a bug in DDOS that prevented a failed drive from
being replaced if you had the letters "DDR" anywhere in the hostname of a
DataDomain restorer. My restorer hostnames all begin with "DDR". (What
was I thinking?) So in order to replace this drive I had to temporarily
change the hostname with the command "net set hostname", type "yes" when the
system said "Hey, changing your hostname affects replication at the source and
target and will require the use of the 'repl modify' command", unfail the drive
with the 'disk unfail' command and then change the hostname back to what it was
originally. Right after I got done with that I received an alert from the
restorer saying that the drive wasn't a qualified drive model. The drive's part
number is the same as the other drives, but the firmware revision is newer. Fun
times.
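If you want to check whether your own restorers would trip over the hostname bug, here is a tiny sketch. It assumes the match is case-sensitive on the literal string "DDR", which is how the bug was described to me but not something I have verified. The hostnames in the example are invented.

```python
def hostnames_at_risk(hostnames, trigger="DDR"):
    """Return the hostnames that contain the trigger string anywhere.

    Assumption: the bug matches the literal, case-sensitive string "DDR".
    """
    return [name for name in hostnames if trigger in name]


# Invented example names, not real systems.
example_hosts = ["DDR-primary01", "restorer-dr02", "backupDDRa"]
for name in hostnames_at_risk(example_hosts):
    print(f"{name}: a temporary rename may be needed before 'disk unfail'")
```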
I have had problems
similar to the one above with my restorers since the day I first powered one on.
One of my restorers will mark a drive as failed and half of the time when I call
it in I get some bored tech who tells me to remove the drive and then reseat it
and see if it still shows up as failed. I shut one of my systems down last month
(so I could have the midplane board replaced) and when it came back up it showed
as having two failed drives. I called DataDomain and they told me to power the
system off, reseat the drives and then power the system back on and use the
"disk unfail" command to unfail the disks. This is complete and total BS and it
angers me every time they tell me this. If I have a bad drive on a NetApp filer
or a Sun/STK RAID and call tech support for either NetApp or Sun they
don't tell me to reseat the drive, cycle the power, dance the hokey pokey
or put on my ruby slippers, click my heels together three times and type "disk
unfail, disk unfail, disk unfail" at the CLI, they just replace the bloody
drive, no questions asked. I have complained to DataDomain about this every time
it happens and have been told that the next release will fix the problems
with checking drive status, really, it will, and the check is in the mail and
DataDomain will respect me in the morning too.
I have to say that I am completely and totally gobsmacked by this latest bug. I cannot imagine any
reason why the system hostname should in any way, shape or form have anything to
do with the code for checking, changing and controlling drive status in the
RAID. I consider the bugs in DataDomain's disk status monitoring to be
a huge problem with their equipment and they give me pause and make me nervous
about the data I have stored on these systems. While DataDomain's restorers are
one of the less expensive de-duplication solutions on the market, the fact
remains that they're still expensive. DataDomain is claiming that they have an
enterprise grade solution, and they certainly have an enterprise grade price,
but this kind of thing is not enterprise grade reliability.
I hope this helps.
Jamie Jamison
Network Systems Administrator
ZymoGenetics, Seattle