Hey all,
I'm working as a colleague of Bart in troubleshooting the issue that we're
having. First off, let me say thanks for the replies so far. However, I would
like to add another observation.

I note that the people reporting good speeds are seeing roughly 100 MB/s
destaging speed (including Mark Phillips, i.e. 300 GB/h = ~85 MB/s). We are
getting those speeds as well. In fact, on a freshly installed Solaris box we
reach 107 MB/s at best, depending on what we are destaging. However, when we
take a backup directly from our staging disk to tape, we get 210 MB/s. So we
are seeing a performance drop of 50% to 60% when comparing the two. I notice
this degradation on both the Wintel (NTFS) and the Solaris/SPARC (ZFS)
platforms.
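
Just to put numbers on that (a quick back-of-the-envelope in Python, taking
1 GB as 1024 MB):

    # rough throughput comparison from the figures above
    destage_mark = 300 * 1024 / 3600.0   # Mark's 300 GB/h  -> ~85 MB/s
    destage_ours = 107.0                 # our best destaging speed (MB/s)
    direct_tape  = 210.0                 # same staging disk straight to tape (MB/s)
    print("300 GB/h = %.0f MB/s" % destage_mark)
    print("drop vs direct-to-tape: %.0f%%" % (100 * (1 - destage_ours / direct_tape)))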

I cannot explain that difference in performance unless it is caused by
NetBackup itself. However, even after opening a support call with Symantec and
playing around with buffer and fragment sizes for two weeks now, we are still
far from closing that speed gap.

So, is anyone getting the same speeds when comparing destaging speed to a
backup taken directly to tape from the disk used for staging? Or do you just
have to live with the performance hit when destaging?
Martin, Jonathan wrote:
> I'm with Mark Phillips on this one. We use the direct attached storage on
> our media servers because I can get 3x the storage for the price of a SAN
> shelf. I just deployed a new Dell R710 Master / Media with 2 x Dell MD1220s
> for less than $30,000. (Each MD1220 is driving an LTO3 drive at 100+ MB/sec
> as I write this.) I think the last SAN shelf we purchased for our Hitachi was
> $20,000. The good news about running a SAN is that you're likely to have
> more disk metrics available (from the SAN, not Windows) to troubleshoot your
> issues. We run Hitachi ourselves, but I'm not familiar with whatever
> modifications HP makes.
>
> Generally speaking I do not change the default NetBackup settings unless I'm
> having a performance issue. The Dell R710 I deployed last week (6.5.6
> Master/Media on Windows 2003 Std R2 x86) is stock / out of the box with zero
> buffer / touch files configured. It drives LTO3 just fine.
>
> My two largest Media/Masters (not the one above) only have the following
> touch files, but they are the exception.
> NUMBER_DATA_BUFFERS 64
> NUMBER_DATA_BUFFERS_DISK 64
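
For anyone following along: as far as I know those are plain touch files
holding a single number, under /usr/openv/netbackup/db/config/ on UNIX and
<install_path>\NetBackup\db\config\ on Windows. A throwaway way to set them
(Python just for illustration):

    # write the NetBackup buffer touch files (UNIX path shown; adjust for Windows)
    for name, value in [("NUMBER_DATA_BUFFERS", 64), ("NUMBER_DATA_BUFFERS_DISK", 64)]:
        with open("/usr/openv/netbackup/db/config/" + name, "w") as f:
            f.write("%d\n" % value)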
>
> From a storage perspective, I've got all disks in a Dell MD1000 enclosure
> configured in a single 15 disk RAID-5. My raid stripe size is generally 64K,
> although I've played with 128K. My read policy is set to Adaptive Read Ahead,
> my write policy is Write Back and my disk cache policy is disabled.
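
Side note on the stripe math, assuming that 64K is the per-disk stripe
element: with 15 disks in RAID-5 there are 14 data disks, so a full stripe is
14 x 64K = 896K. Quick check:

    # full-stripe size for a 15-disk RAID-5 (assumes a 64K per-disk stripe element)
    disks, parity_disks, element_kb = 15, 1, 64
    print("full stripe = %d KB" % ((disks - parity_disks) * element_kb))   # 896 KB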
>
> From a windows perspective, my disks are GPT formatted with 64K block sizes
> (matches the stripe size above.) You may want to consider partition alignment
> based on your SAN manufacturer's specifications. 64K is the most common in my
> experience, but Microsoft also recommends 1024K offset, which accounts for
> 32K, 64K, 256K and 512K offset requirements.
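
For checking that alignment, something like this does the job (a Python
sketch; the offset comes from e.g. "wmic partition get Name, StartingOffset"
on the Windows side, and the 1048576 below is just an example value):

    # check whether a partition offset is aligned to the RAID stripe element
    starting_offset = 1048576      # bytes, as reported for the partition in question
    stripe_element  = 64 * 1024    # 64K, matching the stripe size above
    if starting_offset % stripe_element == 0:
        print("aligned (offset = %d KB)" % (starting_offset // 1024))
    else:
        print("misaligned by %d bytes" % (starting_offset % stripe_element))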
>
> The SAN is going to add a layer of complexity to this. Whoever manages your
> SAN will create Raid Groups, then assign you luns from those raid groups.
> Much like a normal Raid array, performance is "capped" at the raid group. The
> difference between a raid array and a raid group is that your SAN guy can
> carve up that Raid Group and assign it to 4 different servers, essentially
> spreading your performance around. If you are using SATA disks you definitely
> want a single server with a single lun on a single raid group, or performance
> will suffer. You might also have the SAN guy disable options like LUSE luns.
>
> To troubleshoot, fire up perfmon and add the Physical Disk \ Avg. Disk
> Sec/Read counter for the specific disk you are testing. If you are seeing
> large spikes > .050 (50ms) then you are seeing your SATA disks struggle to
> find and return the data you are looking for. (for reference, my 100MB/sec
> SATA arrays show <10ms spikes, my 35MB/sec SATA arrays have spikes > 100ms.)
> You can also look at the individual disk queues if you have access to the
> SAN's metrics. If you are testing single stream write and single stream read
> to tape, then I am guessing that SAN congestion is your bottleneck.
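
Side note: for picking those spikes out of a longer capture, a throwaway
script over a CSV export of that counter works well enough (sketch below; the
filename and the assumption that the counter sits in the second column are
mine):

    # flag latency spikes in a perfmon CSV export of "Avg. Disk sec/Read"
    # (assumes column 0 is the timestamp and column 1 is the counter)
    import csv

    with open("disk_latency.csv") as f:
        for row in list(csv.reader(f))[1:]:           # skip the header row
            if len(row) < 2 or not row[1].strip():    # perfmon leaves blanks for missed samples
                continue
            latency = float(row[1])
            if latency > 0.050:                       # > 50 ms threshold from above
                print("%s  %.0f ms" % (row[0], latency * 1000))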
>
> Good luck!
>
> -Jonathan
>