Hey all,
I'm working as a colleague of Bart in troubleshooting the issue that we're
having. First off, let me say thanks for the replies so far. However, I would
like to add another observation.

I note that the people reporting good speeds are seeing roughly 100 MB/s
destaging speed (including Mark Phillips, i.e. 300 GB/h = ~85 MB/s). We are
getting those speeds as well. In fact, on a freshly installed Solaris box we
reach 107 MB/s at best, depending on what we are destaging. However, when we
take a backup directly from our staging disk to tape, we get 210 MB/s. So we
are seeing a performance drop of 50% to 60% when comparing the two. I notice
this degradation on both the Wintel (NTFS) and the Solaris/SPARC (ZFS)
platforms.
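
Just to put numbers on that (a quick back-of-the-envelope in Python, taking
1 GB as 1024 MB):

    # rough throughput comparison from the figures above
    destage_mark = 300 * 1024 / 3600.0   # Mark's 300 GB/h  -> ~85 MB/s
    destage_ours = 107.0                 # our best destaging speed (MB/s)
    direct_tape  = 210.0                 # same staging disk straight to tape (MB/s)
    print("300 GB/h = %.0f MB/s" % destage_mark)
    print("drop vs direct-to-tape: %.0f%%" % (100 * (1 - destage_ours / direct_tape)))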

I cannot explain that difference in performance unless it is caused by
NetBackup itself. However, even after opening a support call with Symantec and
playing around with buffer and fragment sizes for two weeks now, we are still
far from closing that speed gap.

So, is anyone getting the same speeds when comparing destaging speed to a
backup taken directly to tape from the disk used for staging? Or do you just
have to live with the performance hit when destaging?
Martin, Jonathan wrote:
> I'm with Mark Phillips on this one. We use the direct attached storage on
> our media servers because I can get 3x the storage for the price of a SAN
> shelf. I just deployed a new Dell R710 Master / Media with 2 x Dell MD1220s
> for less than $30,000. (Each MD1220 is driving an LTO3 drive at 100+ MB/sec
> as I write this.) I think the last SAN shelf we purchased for our Hitachi was
> $20,000. The good news about running a SAN is that you're likely to have
> more disk metrics available (from the SAN, not Windows) to troubleshoot your
> issues. We run Hitachi ourselves, but I'm not familiar with whatever
> modifications HP makes.
>
> Generally speaking I do not change the default NetBackup settings unless I'm
> having a performance issue. The Dell R710 I deployed last week (6.5.6
> Master/Media on Windows 2003 Std R2 x86) is stock / out of the box with zero
> buffer / touch files configured. It drives LTO3 just fine.
>
> My two largest Media/Masters (not the one above) only have the following
> touch files, but they are the exception.
> NUMBER_DATA_BUFFERS 64
> NUMBER_DATA_BUFFERS_DISK 64
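
For anyone following along: as far as I know those are plain touch files
holding a single number, under /usr/openv/netbackup/db/config/ on UNIX and
<install_path>\NetBackup\db\config\ on Windows. A throwaway way to set them
(Python just for illustration):

    # write the NetBackup buffer touch files (UNIX path shown; adjust for Windows)
    for name, value in [("NUMBER_DATA_BUFFERS", 64), ("NUMBER_DATA_BUFFERS_DISK", 64)]:
        with open("/usr/openv/netbackup/db/config/" + name, "w") as f:
            f.write("%d\n" % value)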
>
> From a storage perspective, I've got all disks in a Dell MD1000 enclosure
> configured in a single 15 disk RAID-5. My raid stripe size is generally 64K,
> although I've played with 128K. My read policy is set to Adaptive Read Ahead,
> my write policy is Write Back and my disk cache policy is disabled.
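
Side note on the stripe math, assuming that 64K is the per-disk stripe
element: with 15 disks in RAID-5 there are 14 data disks, so a full stripe is
14 x 64K = 896K. Quick check:

    # full-stripe size for a 15-disk RAID-5 (assumes a 64K per-disk stripe element)
    disks, parity_disks, element_kb = 15, 1, 64
    print("full stripe = %d KB" % ((disks - parity_disks) * element_kb))   # 896 KB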
>
> From a windows perspective, my disks are GPT formatted with 64K block sizes
> (matches the stripe size above.) You may want to consider partition alignment
> based on your SAN manufacturer's specifications. 64K is the most common in my
> experience, but Microsoft also recommends 1024K offset, which accounts for
> 32K, 64K, 256K and 512K offset requirements.
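
For checking that alignment, something like this does the job (a Python
sketch; the offset comes from e.g. "wmic partition get Name, StartingOffset"
on the Windows side, and the 1048576 below is just an example value):

    # check whether a partition offset is aligned to the RAID stripe element
    starting_offset = 1048576      # bytes, as reported for the partition in question
    stripe_element  = 64 * 1024    # 64K, matching the stripe size above
    if starting_offset % stripe_element == 0:
        print("aligned (offset = %d KB)" % (starting_offset // 1024))
    else:
        print("misaligned by %d bytes" % (starting_offset % stripe_element))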
>
> The SAN is going to add a layer of complexity to this. Whoever manages your
> SAN will create Raid Groups, then assign you luns from those raid groups.
> Much like a normal Raid array, performance is "capped" at the raid group. The
> difference between a raid array and a raid group is that your SAN guy can
> carve up that Raid Group and assign it to 4 different servers, essentially
> spreading your performance around. If you are using SATA disks you definitely
> want a single server with a single lun on a single raid group, or performance
> will suffer. You might also have the SAN guy disable options like LUSE luns.
>
> To troubleshoot, fire up perfmon and add the Physical Disk \ Avg. Disk
> Sec/Read counter for the specific disk you are testing. If you are seeing
> large spikes > .050 (50ms) then you are seeing your SATA disks struggle to
> find and return the data you are looking for. (for reference, my 100MB/sec
> SATA arrays show <10ms spikes, my 35MB/sec SATA arrays have spikes > 100ms.)
> You can also look at the individual disk queues if you have access to the
> SAN's metrics. If you are testing single stream write and single stream read
> to tape, then I am guessing that SAN congestion is your bottleneck.
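
Side note: for picking those spikes out of a longer capture, a throwaway
script over a CSV export of that counter works well enough (sketch below; the
filename and the assumption that the counter sits in the second column are
mine):

    # flag latency spikes in a perfmon CSV export of "Avg. Disk sec/Read"
    # (assumes column 0 is the timestamp and column 1 is the counter)
    import csv

    with open("disk_latency.csv") as f:
        for row in list(csv.reader(f))[1:]:           # skip the header row
            if len(row) < 2 or not row[1].strip():    # perfmon leaves blanks for missed samples
                continue
            latency = float(row[1])
            if latency > 0.050:                       # > 50 ms threshold from above
                print("%s  %.0f ms" % (row[0], latency * 1000))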
>
> Good luck!
>
> -Jonathan
>