Subject: [Veritas-bu] "Waited for full buffer" messages
From: larry.kingery AT veritas DOT com (Larry Kingery)
Date: Wed, 17 Jul 2002 17:32:17 -0400 (EDT)
You may want to refer to the Troubleshooting Guide, Appx A, "Backup
and Archive Processes" (there's a picture of the processes a couple
pages in).  Notice the shared memory between the bptm
processes.  This is where the data buffers are implemented.

The bptm child process will collect data and prepare it to be written
to tape, then write that data into a buffer.  The bptm parent then
reads from the buffer(s) and writes to tape.  The main reason for this
is that it allows you to be both reading and writing data at about the
same time.  It also provides some buffer space to compensate for
variable input or output speeds, nondeterministic process scheduling,
etc.  

Any individual buffer may be written to or read from, but not both,
at any point in time.  That's one reason why there's more than one
buffer (though the picture only shows one block of shared memory, it's
actually used as multiple data buffers) - so you can be reading and
writing at the same time.

The "waited for" messages in the bptm log simply tell us how many
times one of the bptm processes had to stop and wait for the other
side to catch up.  In your case, the parent had to wait a lot of times
for the child (which never really had to wait for the parent).  Since
the parent could empty the buffers much faster than the child could
fill them, it spent a lot of time "waiting for full buffers".  This
tells us that your tape drives could probably go faster if only the
input could keep up.
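
If a sketch helps, here's the general shape of the thing in C.  This
is *not* NetBackup source - every name and number is made up for the
illustration - it's just the classic filler/drainer arrangement, with
each side counting how many times it had to stop and wait for the
other:

/*
 * Rough sketch only, not NetBackup source.  One thread ("child")
 * fills a ring of shared buffers, another ("parent") drains them,
 * and each side counts how often it had to stop and wait.
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define NBUFS   16                  /* number of data buffers        */
#define BUFSZ   65536               /* size of each buffer (64K)     */
#define NBLOCKS 1000                /* how much fake data to move    */

static char bufs[NBUFS][BUFSZ];     /* the shared data buffers       */
static int  full[NBUFS];            /* 1 = holds data waiting to be
                                       written out, 0 = empty        */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv   = PTHREAD_COND_INITIALIZER;

static long waited_for_empty;       /* times the filler had to wait  */
static long waited_for_full;        /* times the drainer had to wait */

/* "bptm child": collects data and fills buffers */
static void *filler(void *arg)
{
    (void)arg;
    int next = 0;
    for (int i = 0; i < NBLOCKS; i++) {
        pthread_mutex_lock(&lock);
        while (full[next]) {            /* no empty buffer - wait    */
            waited_for_empty++;
            pthread_cond_wait(&cv, &lock);
        }
        pthread_mutex_unlock(&lock);

        usleep(100);                    /* pretend the input (disk or
                                           network) is the slow part */
        memset(bufs[next], 'x', BUFSZ); /* "read" a block of data    */

        pthread_mutex_lock(&lock);
        full[next] = 1;                 /* hand the buffer over      */
        pthread_cond_broadcast(&cv);
        pthread_mutex_unlock(&lock);
        next = (next + 1) % NBUFS;
    }
    return NULL;
}

/* "bptm parent": drains buffers and writes them to the drive */
static void *drainer(void *arg)
{
    (void)arg;
    int next = 0;
    for (int i = 0; i < NBLOCKS; i++) {
        pthread_mutex_lock(&lock);
        while (!full[next]) {           /* no full buffer - wait     */
            waited_for_full++;
            pthread_cond_wait(&cv, &lock);
        }
        pthread_mutex_unlock(&lock);

        /* pretend to write bufs[next] to tape here */

        pthread_mutex_lock(&lock);
        full[next] = 0;                 /* buffer is empty again     */
        pthread_cond_broadcast(&cv);
        pthread_mutex_unlock(&lock);
        next = (next + 1) % NBUFS;
    }
    return NULL;
}

int main(void)
{
    pthread_t f, d;
    pthread_create(&f, NULL, filler, NULL);
    pthread_create(&d, NULL, drainer, NULL);
    pthread_join(f, NULL);
    pthread_join(d, NULL);
    printf("waited for empty buffer %ld times\n", waited_for_empty);
    printf("waited for full buffer %ld times\n", waited_for_full);
    return 0;
}

With the slow input simulated above you'd see a large "waited for full
buffer" count and a near-zero "waited for empty buffer" count - the
same pattern as your logs - and no amount of fiddling with NBUFS or
BUFSZ makes the filler any faster.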

Now, a common misconception is that these messages mean that you need
to do something to the buffers.  This is simply not true.  These
messages tell us that one side is faster than the other - not why.
Adjusting the buffers *might* help, and might *not*.  For example, if
I have a 20MB/s tape drive trying to back up data that's coming across
a single 100Mb/s network (which tops out around 12.5MB/s, and usually
delivers less in practice), *nothing* I do to the buffers is going to
magically make the network capable of keeping up with the drive - I'm
going to see a lot of waits for full buffers.

That's not to say that one can't improve performance by adjusting
buffers.  But you can't look at the messages about waits - which only
tell you which side is faster than the other - and conclude that in
every case tuning buffers is what's going to improve throughput.  (In
your case, I doubt tuning buffers will do much; what you need to
figure out is how to get the data off disk or across the net [or
whatever] faster.)
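
(For the record, if you do want to experiment: on a UNIX media server
the usual knobs are touch files under /usr/openv/netbackup/db/config,
each containing a single number - check the tuning documentation for
your platform and NetBackup version for the exact names and limits.
The defaults you mentioned would look something like:

   SIZE_DATA_BUFFERS      65536     (size of each buffer, in bytes)
   NUMBER_DATA_BUFFERS    16        (how many buffers)

Again, changing those only helps when the buffers themselves are the
bottleneck.)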

Another common mistake people make is to start looking too closely at
these messages and lose sight of the actual goal.  The goal is not
necessarily to reduce the number of events, or to make the empty waits
equal the full waits, or anything else - it's to maximize
performance.[1]

L

[1] Okay, one can come up with exceptions.  For example, if you were
happy with your performance you might wish to then reduce the number
of waits for full buffers to reduce the wear on your drives, IF you
could do it without impacting the overall objective of meeting your
business requirements.


Ty King writes:
> I've been making attempts to improve the throughput of my backups as I think
> there's definitely some room to do so considering our hardware
> configuration.  In the BPTM logs I'm looking at the "waited for full buffer"
> entries with numbers ranging from 40,000 for small jobs to 700,000+ times
> for a 400GB backup on our file server.  This seems like way too much.  I've
> compared it with the "waited for empty buffer" numbers which are generally
> below 100.  It seems to me there should be a better middle ground.
> 

[text deleted]
 
> I've been toying with the buffer size and number.  Most people seem to
> recommend using 64K buffers, but I've also tried reducing the buffer size as
> that seems to be the suggestion when getting a large number of "waited for
> full buffer" events.  Most references state that decreasing the size and
> number of buffers would help reduce the number of events, but I've seen
> little to no difference in doing so.  I'm now back to running with the
> default 16 buffers at 64K.  Could there be something else hanging this up?
> 
> Just looking for someone with some real experiences with this instead of
> what a tech doc tells me.
> 
> 
> Thanks,
> Ty King
> _______________________________________________
> Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu

-- 
Larry Kingery 
               Why is "abbreviation" such a long word?
