Networker

Re: [Networker] Chronic NDMP backup problems

2004-01-06 14:04:25
Subject: Re: [Networker] Chronic NDMP backup problems
From: "Lemons, Terry" <lemons_terry AT EMC DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Tue, 6 Jan 2004 14:04:08 -0500
Hi Matthew

Backups that don't keep tape drives streaming continue to be a problem for
all of us, because tape drives simply weren't designed for data streams that
stop and start.  Disk drives, of course, were, which is why you might want
to consider a disk-based backup solution.

A major problem with this idea is that disk-based NDMP backup solutions
currently do not exist (if I'm wrong, I'd be glad to be corrected; last I
looked, the NDMP standard didn't support this).  So, traditional
backup-to-disk techniques won't work.

But, a virtual tape solution could be just what you want: you'd be writing
to disk, and so would have the high reliability and immunity to
stop-and-start data streams, but your backup software would still see the
familiar tape interface.  Plus, with some virtual tape library (VTL)
solutions, you'll also get RAID protection for your data, as well as a
method to help with creating physical tape media for your backups that
require offsite or long-term storage.

tl


-----Original Message-----
From: Matthew Huff [mailto:mhuff AT OX DOT COM]
Sent: Tuesday, January 06, 2004 1:20 PM
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Subject: [Networker] Chronic NDMP backup problems

We are having a chronic problem getting reliable backups of our NetApp NAS
filers using Legato. Since NDMP uses the integrated dump within the
NetApp, if there are any unrecoverable errors on the NAS-attached tape
drives, the entire dump is aborted. Since we have close to a terabyte
on each of our NetApps, we seem to be having continual problems getting
a level 0 (full) backup. Many times it will get 90% through and then
fail randomly on a SCSI tape write error.

We have contacted our tape drive vendor (Exabyte) and performed
diagnostics. Previous support cases have shown that the drives had
errors on the tape heads. Because dump begins with a filesystem inode
walk, there is a lot of "shoe-shining" of the tapes at the start of each
backup. This is the supposed cause of the high failure rate that we have
seen. We have already completely replaced all 4 of our tape drives.

The issue is how to solve the bigger problem: if any media or drive has
a failure, the whole backup is aborted. We are not able to break up the
saveset into smaller volumes, as this would require major rewrites to
our large application base. Our current thought is to replace the
Exabyte tape libraries with StorageTek LTO-2 based hardware. Given the
higher capacity, higher performance and reduced susceptibility to the
negative results of "shoe-shining", we should have better results.
However, we aren't sure, especially since we would have to spend a
considerable amount of cash just to find out whether this would reduce
the problem.
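
As a rough illustration of why the all-or-nothing abort hurts so much at
this size, here is a minimal Python sketch. It assumes each gigabyte
written has the same small, independent chance of hitting an
unrecoverable tape error. The error rate and the chunk count below are
hypothetical values chosen for the example, not measurements from this
site, and the split into chunks is shown only for comparison (the post
notes that splitting the saveset is not practical here).

```python
def completion_probability(size_gb: float, p_fail_per_gb: float) -> float:
    """Chance that one stream of size_gb finishes with no unrecoverable error,
    assuming every gigabyte fails independently with p_fail_per_gb."""
    return (1.0 - p_fail_per_gb) ** size_gb


def expected_attempts(size_gb: float, p_fail_per_gb: float) -> float:
    """Average number of runs needed for one clean pass when every failed
    run has to restart from the beginning."""
    return 1.0 / completion_probability(size_gb, p_fail_per_gb)


if __name__ == "__main__":
    SIZE_GB = 1000.0       # "close to a terabyte" per filer, from the post
    P_FAIL_PER_GB = 0.001  # hypothetical per-gigabyte error probability
    CHUNKS = 10            # hypothetical split, for comparison only

    whole = completion_probability(SIZE_GB, P_FAIL_PER_GB)
    piece = completion_probability(SIZE_GB / CHUNKS, P_FAIL_PER_GB)

    print(f"single stream completes: {whole:.1%}")
    print(f"one of {CHUNKS} chunks completes: {piece:.1%}")
    print(f"expected runs, single stream: "
          f"{expected_attempts(SIZE_GB, P_FAIL_PER_GB):.2f}")
    print(f"expected runs, per chunk: "
          f"{expected_attempts(SIZE_GB / CHUNKS, P_FAIL_PER_GB):.2f}")
```

Under these assumptions the single stream has to survive every gigabyte
in a row, so its success chance drops exponentially with size, while a
smaller unit only loses its own portion when an error occurs. The same
reasoning applies to any change that lowers the per-gigabyte error rate,
such as more reliable drives or a disk-backed target.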

I'm hoping for feedback from other users on any solutions they have
found.


Equipment
=========
Backup Software: Legato Networker v6.1.3
Backup Server: Sun E450, 4xCPU, 4GB RAM
Backup Server OS: Solaris 9

NAS FileServer: NetApp F820
NAS FileServer OS: ONTAP 6.3.3
Tape Unit Attached to NAS: Exabyte 215 Mammoth 2 Autoloader (15
Cartridge, 2 Mammoth 2 Drives)

----
Matthew Huff           | One Manhattanville Rd
Director of Operations | Purchase, NY 10577
OTA LLC                | Phone: 914-460-4039
mailto:mhuff AT ox DOT com    | Fax:   914-460-4139

--
Note: To sign off this list, send a "signoff networker" command via email
to listserv AT listmail.temple DOT edu or visit the list's Web site at
http://listmail.temple.edu/archives/networker.html where you can
also view and post messages to the list.
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
