Posted to dev@httpd.apache.org by Greg Stein <gs...@lyra.org> on 2001/01/18 03:05:09 UTC

Re: transient buckets

On Wed, Jan 17, 2001 at 04:46:37PM -0800, rbb@covalent.net wrote:
>...
> > Transient buckets were created so that we can defer the copying as long as
> > possible. If the bucket hits the network stack, then it could even go out
> > over the network without a copy(!).
> 
> The problem here, is that transient buckets tend to be incredibly small,
> so in practice this just doesn't work.  Instead of causing a zero copy,
> what we end up doing is allocating a lot of very small buckets that get
> copied at least twice; once the first time they are set-aside, and again
> when we coalesce.

setaside and coalesce are mutually exclusive. We do one or the other.

We call setaside() and shove the brigade to the side. Or we call coalesce
into another brigade (and any setaside calls on that one will be a noop).
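
For readers following along, here is a minimal sketch of the deferred copy
being discussed. It is not the APR implementation: struct bucket,
bucket_transient(), bucket_setaside() and bucket_destroy() are hypothetical
names made up for illustration. It only shows the idea that a transient
bucket borrows the caller's memory, that setaside is the point where the
copy happens, and that a second setaside is a no-op.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified bucket; NOT the real APR structure. */
struct bucket {
    const char *data;   /* borrowed pointer, or private copy if owned */
    size_t      len;
    int         owned;  /* 1 once the data has been copied to the heap */
};

/* Wrap caller-owned memory without copying it. */
struct bucket bucket_transient(const char *data, size_t len)
{
    struct bucket b;
    b.data = data;
    b.len = len;
    b.owned = 0;
    return b;
}

/* Make the bucket safe to keep past the caller's scope.
 * This is where the (single) copy happens. Calling it again on a
 * bucket that already owns its data does nothing.
 * Returns 0 on success, -1 if the allocation fails. */
int bucket_setaside(struct bucket *b)
{
    char *copy;

    assert(b != NULL);
    if (b->owned)
        return 0;                       /* already private: no-op */

    copy = malloc(b->len ? b->len : 1); /* avoid a zero-size malloc */
    if (copy == NULL)
        return -1;
    if (b->len > 0)
        memcpy(copy, b->data, b->len);

    b->data = copy;
    b->owned = 1;
    return 0;
}

/* Release the private copy, if there is one. */
void bucket_destroy(struct bucket *b)
{
    if (b->owned)
        free((void *)b->data);
    b->data = NULL;
    b->len = 0;
    b->owned = 0;
}
```

If the bucket reaches the network before anyone calls bucket_setaside(),
no copy is made at all; that is the zero-copy case described above.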

> Oh, make that three times, because we also tend to
> allocate before we actually call an ap_r* function.

There is nothing we can do about calls "above" the ap_r* or bucket creation
functions. What the handler does is out of our control. We can only
establish zero-copy once the content hits our API. That said, we can add
more APIs to assist in preventing copies (e.g. file and pipe buckets).

> > In this particular case, I believe that we can avoid the allocation of the
> > brigade and bucket structures. Each ap_r* call would not have an allocation
> > unless/until a setaside occurs. If the lower level did a copy (into an
> > existing buffer) rather than a setaside, then we wouldn't have any
> > additional allocations.
> 
> The problem with this that I see, is that it isn't a general solution.  It
> is a solution to a VERY specific problem.

And the "VERY specific problem" is ap_r*'s poor allocation performance. That
is what we're seeking to solve.

Second priority is a general solution for other APR users. If one can be
used for the other, then great. Personally, I'd rather focus on Apache first
and see what we can do there. In the future, if another solution presents
itself within APR, which can solve Apache's problem, then cool... we can use
it.

> It happens that the case we
> have seen so far is one of a lot of transient buckets in a row that could
> be copied into a single location.  In reality, the problem is that we
> don't do any buffering at all.

We should be coalescing small brigades / buckets just before we hit the
CHUNK filter. Always. We should never apply chunk headers to 1-byte
brigades.

Solving the coalesce-before-chunk problem will also happen to solve the
ap_r* problem.
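
To put rough numbers on the 1-byte case: in HTTP/1.1 chunked encoding each
chunk is framed as the length in hex, CRLF, the data, CRLF. The sketch
below just does that arithmetic. chunk_overhead() and wire_bytes() are
hypothetical helper names, not httpd functions, and the sketch assumes no
chunk extensions and does not count the final zero-length chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Framing bytes added around ONE chunk of payload_len bytes:
 * the hex digits of the length, CRLF before the data, CRLF after. */
size_t chunk_overhead(size_t payload_len)
{
    size_t digits = 0;
    size_t n;

    assert(payload_len > 0);  /* a zero-length chunk is the terminator */
    for (n = payload_len; n > 0; n /= 16)
        digits++;
    return digits + 4;        /* two CRLF pairs */
}

/* Bytes on the wire when `total` payload bytes go out as chunks of at
 * most `per_chunk` bytes each (terminating chunk not counted). */
size_t wire_bytes(size_t total, size_t per_chunk)
{
    size_t sent = 0;
    size_t wire = 0;

    assert(per_chunk > 0);
    while (sent < total) {
        size_t left = total - sent;
        size_t n = left < per_chunk ? left : per_chunk;
        wire += n + chunk_overhead(n);
        sent += n;
    }
    return wire;
}
```

With these definitions, wire_bytes(100, 1) is 600 while wire_bytes(100, 100)
is 106: chunking 1-byte brigades sends five framing bytes for every byte of
content, which is why the coalescing has to happen before the chunk filter.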

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: transient buckets

Posted by rb...@covalent.net.

> > The problem here, is that transient buckets tend to be incredibly small,
> > so in practice this just doesn't work.  Instead of causing a zero copy,
> > what we end up doing is allocating a lot of very small buckets that get
> > copied at least twice; once the first time they are set-aside, and again
> > when we coalesce.
> 
> setaside and coalesce are mutually exclusive. We do one or the other.

No, currently we do both.

> > Oh, make that three times, because we also tend to
> > allocate before we actually call an ap_r* function.
> 
> There is nothing we can do about calls "above" the ap_r* or bucket creation
> functions. What the handler does is out of our control. We can only
> establish zero-copy once the content hits our API. That said, we can add
> more APIs to assist in preventing copies (e.g. file and pipe buckets).

I disagree.  Currently, I know for a fact that we took a lot of calls to
ap_bputs and made them calls to apr_pstrdup inside a bucket_create
call.  We did this because it was the only way to get strings of any real
size into the bucket.  With my patch from earlier, we could do a single
copy directly into the buffer, and not require multiple copies.  BTW, I
know this for a fact, because I was the person who made the changes.

> > > In this particular case, I believe that we can avoid the allocation of the
> > > brigade and bucket structures. Each ap_r* call would not have an allocation
> > > unless/until a setaside occurs. If the lower level did a copy (into an
> > > existing buffer) rather than a setaside, then we wouldn't have any
> > > additional allocations.
> > 
> > The problem with this that I see, is that it isn't a general solution.  It
> > is a solution to a VERY specific problem.
> 
> And the "VERY specific problem" is ap_r*'s poor allocation performance. That
> is what we're seeking to solve.

No, we are seeking to solve a generic problem with the bucket_brigade
mechanism.  Namely, that it encourages people to create very small
buckets, especially when used in conjunction with a much older API, namely
the ap_r* functions.

As Dean said, it is very unlikely that other programs will not require
this buffer.  Designing for Apache without thinking about the other
programs is very short-sighted IMHO.

> Second priority is a general solution for other APR users. If one can be
> used for the other, then great. Personally, I'd rather focus on Apache first
> and see what we can do there. In the future, if another solution presents
> itself within APR, which can solve Apache's problem, then cool... we can use
> it.

But we have a single solution that works for both.  It has one major
problem, in that it doesn't work well with the old API.  Why are we trying
to make it work well with the old API?  I just don't see that need.  The
old API works on its own; if you want to use any part of the new API,
then you have to switch to the new API completely.  How many times has MS
gotten in trouble because it tries to make two completely different APIs
work together?  Why are we making the same mistake?

> > It happens that the case we
> > have seen so far is one of a lot of transient buckets in a row that could
> > be copied into a single location.  In reality, the problem is that we
> > don't do any buffering at all.
> 
> We should be coalescing small brigades / buckets just before we hit the
> CHUNK filter. Always. We should never apply chunk headers to 1-byte
> brigades.
> 
> Solving the coalesce-before-chunk problem will also happen to solve the
> ap_r* problem.

Coalescing before the chunk filter won't solve the problem of allocating a
huge number of buckets that just get freed almost immediately.  The
coalesce filter is also just the wrong idea in general.  There is no
reason to wait to do the copy, and if we never call the coalesce filter,
then we end up sending very small buckets.  Let's step back and look at
your design again.

Let's suppose I create a handler that creates HEAP buckets of 5 bytes, and
in between each heap bucket, I call ap_send_size which behind the covers
calls ap_rputs.  A new bucket type won't solve that problem, the problem
is with the API itself.  We need to allow programmers an easy way to write
into buckets without requiring that they create a new bucket each time
they call the write function.  Doing the coalescing up at the top, before the
bucket is ever created, stops us from allocating buckets that we just
end up freeing, and it works.
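
A minimal sketch of what coalescing at the top could look like, for
illustration only. This is not the patch referred to in this thread:
struct wbuf, wbuf_init(), wbuf_write() and wbuf_flush() are hypothetical
names, and the 4096-byte buffer size is an arbitrary assumption. Small
writes are copied once into a buffer, and a bucket is only "created"
(here, just counted) when the buffer fills or is flushed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WBUF_SIZE 4096  /* arbitrary illustrative size */

/* Hypothetical per-request output buffer; not an APR or httpd type. */
struct wbuf {
    char   data[WBUF_SIZE];
    size_t used;          /* bytes currently buffered */
    size_t buckets_made;  /* buckets we would have allocated so far */
};

void wbuf_init(struct wbuf *w)
{
    w->used = 0;
    w->buckets_made = 0;
}

/* Hand everything buffered so far down the stack as ONE bucket.
 * A real server would create and pass a bucket here; this sketch
 * only counts it. Flushing an empty buffer creates nothing. */
void wbuf_flush(struct wbuf *w)
{
    if (w->used > 0) {
        w->buckets_made++;
        w->used = 0;
    }
}

/* Copy a write into the buffer, flushing each time it fills up. */
void wbuf_write(struct wbuf *w, const char *src, size_t len)
{
    assert(src != NULL || len == 0);
    while (len > 0) {
        size_t room = WBUF_SIZE - w->used;
        size_t n = len < room ? len : room;

        memcpy(w->data + w->used, src, n);
        w->used += n;
        src += n;
        len -= n;

        if (w->used == WBUF_SIZE)
            wbuf_flush(w);
    }
}
```

Under these assumptions, a handler that makes 1000 five-byte writes ends up
with two buckets (one when the buffer fills at 4096 bytes, one on the final
flush) instead of 1000, and each byte is copied exactly once.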

However, I am sick of arguing theories.  Please provide a patch so that we
can see how well each method works.  I have provided a patch, and a stack
trace as an example of what can be accomplished.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------