Veritas-bu

[Veritas-bu] Monitoring performance at the buffer level

From: ewilts AT ewilts DOT org (Ed Wilts)
Date: Wed, 11 Aug 2004 12:46:50 -0500
On Tue, Aug 10, 2004 at 03:59:35PM -0600, Mark.Donaldson AT cexp DOT com wrote:
> Here's a quick and dirty script that sweeps the bptm logs on a media server
> for a supplied policy name and reports the "fill_buffer, waiting on empty
> buffer" and "write_backup, waiting on full buffer" statistics.
> 
> Output looks like this:
> 
> >policy_perf Hot_PRD
> ## Gathering data..........Done.
> ## Write to buffer waiting on available buffer:
> Min: 0  Avg: 356  Max: 5877 with 285 samples
> 
> ## Write to tape waiting on full buffer:
> Min: 0  Avg: 43373  Max: 290583 with 7 samples
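Mark's script itself wasn't posted, but the idea is simple enough to sketch.
This is a minimal stand-in, not his actual code: it assumes each matching
bptm line ends in a numeric delay count, and the exact message text varies
by NetBackup version, so adjust the grep patterns and the field to match
what your own bptm logs show.

```shell
#!/bin/sh
# Sketch: sweep a bptm log for buffer-wait lines, report min/avg/max.
# Log path and message wording are assumptions -- verify against your logs.
LOG=${1:-/usr/openv/netbackup/logs/bptm/log.$(date +%m%d%y)}

summarize() {
    # stdin: one numeric delay value per line
    awk 'NR==1 {min=$1; max=$1}
         {sum += $1; if ($1 < min) min = $1; if ($1 > max) max = $1}
         END {if (NR) printf "Min: %d  Avg: %d  Max: %d with %d samples\n",
                             min, sum/NR, max, NR
              else print "No samples."}'
}

if [ -r "$LOG" ]; then
    echo "## Write to buffer waiting on available buffer:"
    grep "waiting on empty buffer" "$LOG" | awk '{print $NF}' | summarize
    echo
    echo "## Write to tape waiting on full buffer:"
    grep "waiting on full buffer" "$LOG" | awk '{print $NF}' | summarize
fi
```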

I've added a section to optionally pass in a date so I can go back
through previous days' logs; here's a sample:

[root@osiris ewilts]# ./perf.sh osiris-vpn 081004
Using /usr/openv/netbackup/logs/bptm/log.081004
## Gathering data.................................................Done.
## Write to buffer waiting on available buffer:
Min: 0  Avg: 1216  Max: 42479 with 150 samples

## Write to tape waiting on full buffer:
Min: 317  Avg: 34382  Max: 157723 with 48 samples
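The date handling amounts to picking which log.MMDDYY file to read.
A sketch of that part (the helper name pick_log is mine, not from the
actual perf.sh):

```shell
# With no date argument, use today's bptm log; otherwise use the
# supplied MMDDYY suffix, matching the "./perf.sh osiris-vpn 081004" run.
pick_log() {
    if [ -n "$1" ]; then
        suffix=$1                  # e.g. 081004
    else
        suffix=$(date +%m%d%y)     # today's log
    fi
    echo "/usr/openv/netbackup/logs/bptm/log.$suffix"
}

echo "Using $(pick_log 081004)"
# prints: Using /usr/openv/netbackup/logs/bptm/log.081004
```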
 
> If the Write to Buffer is waiting for an available empty buffer a whole
> bunch, then perhaps you should increase your buffer count.  If your tape
> writing process is waiting on a full buffer a lot, then you're starving
> your tape drives and you should find a way to increase the delivery of
> client data to your media server or increase your multiplexing factor.
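For reference, NetBackup reads the data-buffer count and size from touch
files on the media server.  The values below are only examples -- the
defaults of this era were small, SIZE_DATA_BUFFERS must be a size your
tape drives actually handle, and changes only affect newly started
backups -- so check the tuning guide for your version before copying them:

```shell
# Media-server touch files controlling the shared data buffers
# (example values only; verify against your NetBackup tuning guide).
echo 32     > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS
echo 262144 > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
```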

So what's a "whole bunch"?  Is what I'm seeing an issue I should deal
with?  Don't things like incrementals really slow down the tape
processing?

Can it be broken down by host instead of by policy?  Having multiple
hosts per policy would make it difficult to target a system to fix.
There's also the minor issue of not knowing which hosts or policies even
have buffer messages in bptm.  The script is an excellent start though.
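One possible per-host angle, since bptm prefixes every line with its
process id in brackets: collect the pids whose lines mention a given
client, then pull only those pids' wait lines.  Whether the client name
appears on bptm's startup lines is an assumption about the log format
(verify against your own logs), and both helper names here are mine:

```shell
# client_pids: list the bptm pids whose log lines mention a client.
# $1 = client name, $2 = bptm log file
client_pids() {
    grep -w "$1" "$2" 2>/dev/null |
        sed -n 's/^[^[]*\[\([0-9]*\)\].*/\1/p' | sort -u
}

# host_waits: show the buffer-wait lines belonging to one client's pids.
host_waits() {
    for pid in $(client_pids "$1" "$2"); do
        grep "\[$pid\]" "$2" | grep "waiting on"
    done
}
```

Something like `host_waits osiris-vpn /usr/openv/netbackup/logs/bptm/log.081004`
could then feed the same min/avg/max arithmetic the policy version uses.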

My overall issue is that although we have GigE connections between many
hosts and the media servers, and are trying to drive 8 SDLT220 drives in
an L700, we almost never exceed 11 MB/s of traffic coming into the media
servers.  It's like there's a cap there that we just haven't been able to
remove.

Thanks,
        .../Ed

-- 
Ed Wilts, Mounds View, MN, USA
mailto:ewilts AT ewilts DOT org