[Veritas-bu] Fast backup to tape but slow backup to disk on NBU 5.1MP3

2005-08-15 22:48:10
From: tim.berger AT gmail DOT com (Tim Berger)
Date: Mon, 15 Aug 2005 19:48:10 -0700
Thanks for your thoughts on dealing with a failed DSSU.  I can't say
that would add to the pleasure of managing NetBackup.  I have 3
staging servers planned: 1 master and 2 media, each with 24 400GB
SATA drives.  Out of those 72 drives, I fully expect at least one
failure every 3 months or so.

Regarding read performance, I've learned that by cranking up the
readahead, I now see very high sequential read rates (244 MB/sec) from
a 5-disk RAID5. This is with the el4 (CentOS, actually) distribution
running the fc3 2.6.12 kernel.  This ought to feed LTO3 just fine. ;-)

E.g.: blockdev --setra 16384 /dev/sdb

3ware has this documented here:

http://www.3ware.com/KB/article.aspx?id=11050
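
For a sense of scale, here is a small Python sketch (not taken from
the 3ware article) that converts a --setra value into bytes.  It
assumes blockdev's documented unit of 512-byte sectors; the function
names are made up for illustration.

```python
# Rough helper for reasoning about blockdev readahead values.
# Assumption: --setra/--getra count 512-byte sectors, per the blockdev man page.
SECTOR_BYTES = 512


def readahead_bytes(sectors: int) -> int:
    """Bytes of readahead for a given --setra sector count."""
    return sectors * SECTOR_BYTES


def sectors_for_mib(mib: int) -> int:
    """--setra sector count that yields the requested readahead in MiB."""
    return mib * 1024 * 1024 // SECTOR_BYTES


if __name__ == "__main__":
    ra = readahead_bytes(16384)
    print(f"--setra 16384 is {ra // (1024 * 1024)} MiB of readahead")
```

So the setting above asks for 8 MiB of readahead per device, and
sectors_for_mib() goes the other way if you want to try other sizes.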

I've given up on the stock el4 kernel.  It's absolutely horrible at
I/O.  Without any readahead tuning, it was only getting just above
100MB/sec.

On 8/15/05, home Rob Worman <rob AT worman DOT org> wrote:
> What a fascinating thread - thanks for the interesting test results,
> Tim!
> 
> As for "what the netbackup fallout might be on a dssu loss", here's my
> crack at it:
> 
> Losing a disk storage unit is equivalent to a destroyed tape, just on a
> much larger scale :-(
> 
> Depending on your configuration and the state of the disk storage unit
> at the time of the loss, there are two potential scenarios that you'll
> be in:
> 
> For a given image...
> (1) the lost disk storage unit had the only copy of the image
>   or
> (2) the lost disk storage unit had the original copy of the image, but
> there are additional copies that also exist.
> 
> The immediate symptom of either of these scenarios would look
> similar... you would be able to browse these images for a restore, but
> once the restore is launched you're going to see lots of errors.
> (probably status code 174 or 85 failures)
> 
> The immediate resolution of either of these would also be similar -
> expire the invalid copy of the data from the catalog with bpexpdate
> -backupid -copy.
> 
> The difference between (1) and (2) lies in the end result of that
> bpexpdate.  You'll either lose the entire image because you had to
> bpexpdate your only copy from the catalog, or else your secondary copy
> will automatically ascend to primary and your restore attempts will
> work again.
> 
> HTH
> rob
> 
> 
> On Aug 11, 2005, at 6:25 PM, Tim Berger wrote:
> 
> > Matt, writing multiple concurrent streams to the same set of disks may
> > be hurting performance.  One at a time may yield better results.
> >
> > I'm in the process of building out some staging servers myself for nbu
> > 5.1 - been doing a bunch of bonnie++ benchmarks with various configs
> > for Linux using a sata 3ware controller.
> >
> > On fedora core 3 (I know it's not supported):
> >
> > Raid5, 5 disks I got ~30MB/sec writes & 187MB/sec reads.  Raid 50 with
> > striping over 3 4-disk raid5's got 49MB/sec writes, 120 MB/sec reads.
> > For raid0, w/10 disks, got a nice 158 MB/sec writes, and 190MB/sec
> > reads.
> >
> > I'm partial to raid5 for high availability even with poor write
> > performance..  I need to stream to lto3, which tops out at 180 MB/sec.
> > If I went with raid0 and lost a disk, then a media server would take a
> > dive, backups would fail, and I'd have to figure out what data failed
> > to make it off to tape.  I'm not sure how I'd reconcile a lost dssu
> > with netbackup.  If I wanted to use the dssu's for doing synthetic
> > fulls, then that further complicates things if a staging unit is lost.
> >
> > Any thoughts on what the netbackup fallout might be on a dssu loss?
> >
> > Even though it's not supported yet, I was thinking of trying out
> > redhat enterprise linux 4, but I'm seeing really horrible disk
> > performance (eg. 100MB/sec reads for raid5 vs the 187MB/sec on fc3).
> >
> > Maybe I should try out the supported rhel3 distribution. ;-)  I
> > don't have high hopes of that improving performance at the moment.
> >
> > On 8/10/05, Ed Wilts <ewilts AT ewilts DOT org> wrote:
> >> On Wed, Aug 10, 2005 at 12:43:39PM -0400, Matt Clausen wrote:
> >>> Yet when I do a backup to disk, I see decent performance
> >>> on one stream (about 8,000KB/s or so) but the other streams will
> >>> drop to around 300-500KB/s.
> >>>
> >>> NUMBER_DATA_BUFFERS = 16
> >>> NUMBER_DATA_BUFFERS_DISK = 16
> >>>
> >>> SIZE_DATA_BUFFERS = 262144
> >>> SIZE_DATA_BUFFERS_DISK = 1048576
> >>>
> >>> and I see this performance on both the master server disk pool AND a
> >>> media server disk pool. The master server is a VxVM concat volume
> >>> set of 3x73GB 10,000RPM disks and the media server is an external
> >>> raid 5 volume of 16x250GB SATA disks.
> >>
> >> I don't believe you're going to get good performance on a 16 member
> >> RAID5 set of SATA disk.  You should get better with a pair of 8 member
> >> raid sets, but SATA is not fast disk and large raid 5 sets kill you on
> >> write performance.  If you're stuck with the SATA drives, configure
> >> them as 3 4+1 RAID5 sets and use the 16th member as a hot spare.
> >> You'll have 3TB of disk staging instead of about 3.8TB but it will
> >> perform a lot better.
> >>
> >> --
> >> Ed Wilts, Mounds View, MN, USA
> >> mailto:ewilts AT ewilts DOT org
> >> _______________________________________________
> >> Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
> >> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
> >>
> >
> >
> > --
> > -Tim
> >
> 
> 


-- 
-Tim