Veritas-bu

Re: [Veritas-bu] Architectural question (staging)

From: "Peters, Devon C" <Peters.Devon AT con-way DOT com>
To: "veritas-bu AT mailman.eng.auburn DOT edu" <veritas-bu AT mailman.eng.auburn DOT edu>
Date: Thu, 27 May 2010 15:17:31 -0700
Just gonna throw in my $.02...
 
We use DSSUs in our primary datacenter, set up on 12 x 9TB filesystems using VxFS (on Linux).  Each filesystem is made from a 9+1 RAID group configured as a single LUN (1TB SATA drives).  We try to keep at least 7 days’ worth of data on the filesystems.
 
We allow 40 concurrent jobs to each filesystem, and regularly see 200-300 MB/s write throughput to a single filesystem – the disks rarely reach even 90% busy.  Using VxFS, we’ve never seen an issue with fragmentation – we get the same performance today as we did when they were implemented 2+ years ago, and we have never run a VxFS defrag.
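 
(Side note: if you ever do want to check or fix VxFS extent fragmentation, I believe the report and the defrag are both done with fsadm; the mount point below is just an example, so adjust for your own layout and double-check the flags on your platform:
    fsadm -F vxfs -E /dssu01      # report extent fragmentation
    fsadm -F vxfs -e /dssu01      # defragment extents, if it ever comes to that
We’ve simply never needed the second one.)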
 
For destaging, we use a single LTO3 or LTO4 tape drive for each filesystem.  If the data is mostly large backup images, then the filesystems keep up with the tape drives without issue – filesystem reads will be around 100-200 MB/s and the disks don’t exceed 60% utilization.
 
When the backup images are mostly small incrementals, the destaging performance is quite a bit lower – usually in the 50-80 MB/s range.  I’ve assumed it’s due to some sort of tape drive repositioning/backhitch/etc. for each image, but I don’t really know.  Since the total amount of data to be destaged is usually not very large, it doesn’t really matter to us; we still meet our windows for getting images destaged.
 
Worth noting is that we also force disk write I/Os to be 256k, using SIZE_DATA_BUFFERS_DISK.  I don’t recall what the default size is, but I do know that in benchmarking the system before deployment we found that this size provided noticeably better throughput.  Also, when destaging I believe the tape buffer size is forced to the same size that is configured for the disk (or maybe it’s the other way around, where disk read I/Os get forced to the same size as the tape I/Os).  Anyhow, if you’re seeing slow destaging performance and the disks aren’t 100% busy, be sure you look at the buffer sizes configured and the size of the actual I/Os being issued (a rough sketch of what we set is below).
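 
In case it saves someone a search, this is roughly what that tuning looks like on a Unix/Linux media server (the 262144 value is just our 256k setting, not a recommendation, and the paths are from memory, so verify them on your own system first):
    # 256k I/O size for reads/writes to disk storage units
    echo 262144 > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS_DISK
    # tape buffer size, kept consistent with the disk setting
    echo 262144 > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
To see the I/O size actually hitting the disks, something like "iostat -x 5" works; avgrq-sz is reported in 512-byte sectors, so a value of 512 there means 256k I/Os.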
 
-devon
 
----------------------------------------------------------------------
Date: Thu, 6 May 2010 06:56:51 -0400
From: Travis Kelley <rhatguy AT gmail DOT com>
Subject: Re: [Veritas-bu] Architectural question (staging)
To: "Martin, Jonathan" <JMARTI05 AT intersil DOT com>, Victor Engle
        <victor.engle AT gmail DOT com>, veritas-bu AT mailman.eng.auburn DOT edu
Message-ID:
        <r2t87c0fb331005060356g6981d33aha005570fd3c53f41 AT mail.gmail DOT com>
Content-Type: text/plain; charset=ISO-8859-1
 
I agree with Martin here on them "working" in some cases.  I have an
EMC Clariion with 45 1TB SATA disks and I can tell you it screams.  I
routinely see over 600 MB/s out of the array while destaging.  Sure, I
have a larger and potentially "smarter" array than some, but to say
they don't ever work is wrong.
 
One other point in regard to fragmentation.  If you are truly using
the disks as a cache and aren't in need of the additional restore
performance they provide, then as soon as destaging is done you can
just expire all of the images on disk (a rough example follows this
paragraph).  Once you have them on tape, you may not "need" them on
disk anymore anyway.  If you are able to do this somewhat regularly
(as often as you determine is necessary to keep performance up),
fragmentation becomes a non-issue.  In my case fragmentation has never
been an issue anyway, because of the extremely wide striping.  But if
it's an issue, as long as you can clean down the disk every once in a
while, the problem goes away.
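 
For the "expire it from disk" piece, the command line version is
roughly the following (the backup ID and copy number are made up, and
the syntax is from memory, so check the bpexpdate man page first):
 
    # expire copy 1 (the disk copy in this example) of one image, effective now
    bpexpdate -backupid myclient_1273058211 -copy 1 -d 0
 
Scripting that across everything already destaged is left as an
exercise.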
 
Also, images are interleaved on the disk in the sense that they are
not contiguous on the disk from a block perspective, but the image
files are not "multiplexed" as they would be on tape.  Every backup
image has at least one file all its own.
 
Hope that helps.
Travis
 
On 5/5/10, Martin, Jonathan <JMARTI05 AT intersil DOT com> wrote:
> I'd hate not to disagree with someone as grumpy and disagreeable as Ed.
> Personally, I wouldn't take advice on this matter from someone who
> "worked with disk staging units for at least a year" and "gave up."
> (Also, I think Ed is a wet blanket.) I had this thing figured out 4
> years ago when we first implemented DSSUs in production. I may not be
> the biggest NBU shop on the planet, but I back up more than 50TB a week
> using this method exclusively, so I can tell you that it does work.
>
>
>
> As far as "interleaving" goes, there is most certainly interleaving at
> the file system level when you run multiple streams to a DSSU. How Ed
> can say there is no interleaving and then tell you to watch your disk
> fragmentation is beyond me. Fragmentation = disk interleaving as far as
> I am concerned. The point is that the files are non-contiguous.
>
>
>
> Here's my proof.
>
> [screenshot: DiskView fragment map for the first image file]
>
> This is a snippet of a utility called DiskView from SysInternals /
> Microsoft. The yellow bits are the actual 1K fragments of data on disk
> for the image file above. The little red dots indicate the beginning
> and end of file fragments. There are 64 little yellow dots between the
> red dots, indicating my 64K clusters.
>
> [screenshot: DiskView fragment map of the same disk region for a second image file]
>
> Here's that same section of disk, different image file. These two
> streams ran simultaneously last night (along with 6 others) and I can
> guarantee you that the top image wrote faster, and will destage to tape
> faster than the image below.
>
>
>
> Why? Imagine you are bpduplicate.exe requesting the first file back to
> write to tape. Compared to the 2nd image, you are going to get a lot
> more reading done and a lot less seeking as your head(s) cross the disk
> to pick up fragments. Or so goes my theory.  There is a utility
> available from Dell that will show the amount of time spent reading /
> writing versus seeking per disk, but I didn't have the time to acquire
> it and test.
>
>
>
> Now, I know there are variables here. As I stated before, one of the
> big improvements to my speed was using a 64K cluster size. Last time I
> checked this wasn't available in Unix/Linux. Then again, ext2/3 file
> systems also like to leave "space" between their writes to account for
> file growth, which may help (but I doubt it). I intended to test this
> several years back, but my management put the kibosh on Linux media
> servers. The RAID controller, simultaneous read/write, spindle count,
> and disk type also add a lot of variability.
>
>
>
> I haven't tested any of this on a SAN volume, only on direct-attached
> storage. I don't think there is much to be gained by taking a 6TB LUN
> and partitioning it at the OS or breaking it into multiple LUNs at the
> SAN. After partitioning, the entire DSSU is still on the same RAID
> group / set, which ultimately controls your performance. If you could
> take your 6TB LUN and break it into 3 x 2TB RAID groups / LUNs then I
> think that would help. I've actually considered breaking my 14-disk
> RAID5s into 14 single disks for performance testing (single stream
> each), but that's an entirely different management nightmare (14 DSSUs
> per media server, etc.). A single SATA disk can drive LTO3, assuming
> the data is all nicely lined up.  The minute that head has to go
> seeking, you are in a world of hurt.
>
>
>
> Again, I would start with a single stream to that 6TB DSSU and see
> what you get both writing to the DSSU and destaging to tape. Whatever
> performance you get out of that configuration is your best-case
> scenario. Multiple streams or creating multiple partitions will only
> drag your numbers down. The crux of the issue (at least for me) is
> balancing the number of streams I need to run to get my backups to the
> DSSU within my windows, versus the destaging speed I need to get that
> data off to tape on time. (A quick way to get the read half of that
> baseline is sketched below.)
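>
> For the destage-side number, a quick and dirty test is just to pull
> one existing image file off the DSSU and throw it away (the path is
> made up; point it at one of your own image files):
>
>     dd if=/dssu1/images/some_image_file of=/dev/null bs=256k
>
> Compare that to what bpduplicate actually gets to tape; the gap is
> your seek / interleaving penalty.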
>
>
>
> Good luck,
>
>
>
> -Jonathan
>
>
>
>
>
> From: veritas-bu-bounces AT mailman.eng.auburn DOT edu
> Sent: Wednesday, May 05, 2010 4:06 PM
> To: Victor Engle
> Cc: veritas-bu AT mailman.eng.auburn DOT edu
> Subject: Re: [Veritas-bu] Architectural question (staging)
>
>
>
> On Wed, May 5, 2010 at 2:57 PM, Victor Engle <victor.engle AT gmail DOT com>
> wrote:
>
> So my question is how best to configure the DSSUs with the goal of
> optimized de-staging. I will have 6TB to configure as desired on the
> backup server. If I understand correctly, the more concurrent streams
> allowed to the DSSUs, the slower the de-staging because of interleaved
> backup streams.
>
>
> The DSSU consists of a set of files, with each file belonging to a
> backup image, and you define the maximum size of each file within an
> image.  There is no "interleaving".  When you destage, one image at a
> time goes to tape.
>
> Watch your fragment sizes and watch your disk file system
> fragmentation...
>
>    .../Ed
>
>
>
> Ed Wilts, RHCE, BCFP, BCSD, SCSP, SCSE
>
>
>
 
 
_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu