You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Ryan Bloom <rb...@covalent.net> on 2002/07/04 00:26:10 UTC

RE: conserving file descriptors vs. ap_save_brigade()

How big a problem is this really?  Most of the time, the content-length
filter isn't supposed to actually buffer the brigade.  It should only be
doing the buffering if we are doing a keepalive request and we can't do
chunking.

My own opinion is that if we are in that situation, we are potentially
best served by just turning off keepalives.

Ryan

----------------------------------------------
Ryan Bloom                  rbb@covalent.net
645 Howard St.              rbb@apache.org
San Francisco, CA 

> -----Original Message-----
> From: Brian Pane [mailto:bpane@pacbell.net]
> Sent: Thursday, July 04, 2002 11:44 AM
> To: dev@httpd.apache.org
> Subject: conserving file descriptors vs. ap_save_brigade()
> 
> I'm working on a fix to keep file buckets from being mmaped when
> they're set aside.  (The motivation for this is to eliminate the
> bogus mmap+memcpy+munmap that happens when a client requests a
> file smaller than 8KB over a keep-alive connection.)
> 
> The biggest problem I've found is that the scalability of the
> content-length filter depends on a side-effect of the current
> mmap-on-file-setaside semantics.
> 
> Consider the case of an SSI file that includes 10 non-SSI html
> files of 500 bytes each.  As content streams through the output
> filters, the content-length filter sets aside each bucket until
> it sees EOS.  Currently, the setaside of each of the 10 file
> buckets turns the file bucket into an mmap bucket.  The good
> news is that this keeps us from running out of file descriptors
> if a threaded MPM is handling a hundred requests like this at
> once.  The bad news is that the mmap causes a performance
> degradation.
> 
> The solutions I've thought of so far are:
> 
> 1. Change the content-length filter to send its saved brigade
>    on to the next filter if it contains more than one file
>    bucket.  (This basically would match the semantics of
>    the core output filter, which starts writing to the socket
>    if it finds any files at all in the brigade.)
>       Pros: We can eliminate the mmap for file bucket setaside
>         and not have to worry about running out of file descriptors.
>       Cons: We lose the ability to compute the content length
>         in this example.  (I'm so accustomed to web servers not
>         being able to report a content-length on SSI requests
>         that this wouldn't bother me, though.)
> 
> 2. Create two variants of ap_save_brigade(): one that does a
>    setaside of the buckets for use in core_output_filter(), and
>    another that does a read of the buckets (to force the mmap
>    of file buckets) for use in ap_content_length_filter().
>       Pros: Allows us to eliminate the mmap on small keepalive
>         requests, and doesn't limit our ability to report a
>         content-length on SSI requests.
>       Cons: We'll still have a performance problem for a lot of
>         SSI requests due to the mmap in the content-length filter.
> 
> 
> I like option 1 the best, unless anyone can think of a third option
> that would work better.
> 
> --Brian
> 



Re: Removing C-L filter when C-L is known was Re: conserving file descriptors vs. ap_save_brigade()

Posted by Brian Pane <bp...@pacbell.net>.
Justin Erenkrantz wrote:

>On Sat, Jul 06, 2002 at 12:09:40PM -0700, Ryan Bloom wrote:
>  
>
>>Pay attention to what that message says please.  It says "The filter
>>should always be in the stack, and it should always collect
>>information."  It doesn't say "The filter should always be touching the
>>C-L for the response."  We use the data collected by the c-L filter in
>>other places, so we should always try to compute the C-L in the filter.
>>However, we should NEVER buffer unless we absolutely have to.
>>    
>>
>
>What data is it collecting?  r->bytes_sent?  We already know what
>that value will be as it will be identical to the C-L.
>
>To make it concrete as to what I have been suggesting in
>ap_content_length_filter:
>
>if (!ctx) { /* first time through */
>    char *c;
>    c = apr_table_get(r->headers_out, "Content-Length");
>

This won't help us much: in all the cases where the ap_save_brigade()
in the C-L filter is a performance problem (e.g., the pipe and SSI
examples), we're not going to have a C-L in r->headers_out.

And this change will slow down the server, by adding another call to
apr_table_get(), which we already know is a performance problem.

--Brian



Removing C-L filter when C-L is known was Re: conserving file descriptors vs. ap_save_brigade()

Posted by Justin Erenkrantz <je...@apache.org>.
On Sat, Jul 06, 2002 at 12:09:40PM -0700, Ryan Bloom wrote:
> Pay attention to what that message says please.  It says "The filter
> should always be in the stack, and it should always collect
> information."  It doesn't say "The filter should always be touching the
> C-L for the response."  We use the data collected by the c-L filter in
> other places, so we should always try to compute the C-L in the filter.
> However, we should NEVER buffer unless we absolutely have to.

What data is it collecting?  r->bytes_sent?  We already know what
that value will be as it will be identical to the C-L.

To make it concrete as to what I have been suggesting in
ap_content_length_filter:

if (!ctx) { /* first time through */
    char *c;
    c = apr_table_get(r->headers_out, "Content-Length");
    if (c) { /* Known content length. */
        /* Can we rely on r->clength being valid?  Perhaps because of
         * ap_set_content_length's implementation.  If not, we'd have
         * to convert the c value to an off_t.
	 */
        r->bytes_sent += r->clength;
        ap_remove_output_filter(f);
        return ap_pass_brigade(f->next, b); 
    }
    f->ctx = ctx = apr_pcalloc(r->pool, sizeof(struct content_length_ctx));
...rest of ap_content_length_filter unchanged...

Does this make it any clearer?  The side effect r->bytes_sent is
still maintained and the common case becomes a lot faster and we
don't bother to read any of the buckets.  -- justin

RE: conserving file descriptors vs. ap_save_brigade()

Posted by Brian Pane <bp...@pacbell.net>.
On Sat, 2002-07-06 at 12:09, Ryan Bloom wrote:
> > From: Justin Erenkrantz [mailto:jerenkrantz@apache.org]
> > 
> > On Sat, Jul 06, 2002 at 08:25:18AM -0700, Ryan Bloom wrote:
> > > > From: Justin Erenkrantz [mailto:jerenkrantz@apache.org]
> > > > Of course, in the common case of a static file with no filters, we
> > > > already know the content-length (default handler sets it).
> > > >
> > > > IIRC, I've brought up skipping the C-L filter when we already know
> > > > the C-L (as defined by r->headers_out), but that was not met with
> > > > approval.
> > >
> > > Who didn't approve that?  I was under the impression that we did
> skip
> > > the C-L filter if we already had a C-L, and it was the filters
> > > responsibility to remove the C-L if it was changing it.
> > 
> > <Pi...@shell.ntrnet.net>
> > 
> > I don't know who might have said that.  =)  -- Justin
> 
> Pay attention to what that message says please.  It says "The filter
> should always be in the stack, and it should always collect
> information."  It doesn't say "The filter should always be touching the
> C-L for the response."  We use the data collected by the c-L filter in
> other places, so we should always try to compute the C-L in the filter.
> However, we should NEVER buffer unless we absolutely have to.

Agreed.  But what are the circumstances in which we absolutely
have to buffer?

I suppose keepalive requests for clients that don't support
chunked encoding are the canonical absolutely-must-buffer case.
But even there, I'd rather force a "Connection: close" in
cases where the buffering is going to cause performance problems
or cause us to run out of file descriptors.

--Brian



RE: conserving file descriptors vs. ap_save_brigade()

Posted by Brian Pane <bp...@pacbell.net>.
On Sat, 2002-07-06 at 12:09, Ryan Bloom wrote:
> > From: Justin Erenkrantz [mailto:jerenkrantz@apache.org]
> > 
> > On Sat, Jul 06, 2002 at 08:25:18AM -0700, Ryan Bloom wrote:
> > > > From: Justin Erenkrantz [mailto:jerenkrantz@apache.org]
> > > > Of course, in the common case of a static file with no filters, we
> > > > already know the content-length (default handler sets it).
> > > >
> > > > IIRC, I've brought up skipping the C-L filter when we already know
> > > > the C-L (as defined by r->headers_out), but that was not met with
> > > > approval.
> > >
> > > Who didn't approve that?  I was under the impression that we did
> skip
> > > the C-L filter if we already had a C-L, and it was the filters
> > > responsibility to remove the C-L if it was changing it.
> > 
> > <Pi...@shell.ntrnet.net>
> > 
> > I don't know who might have said that.  =)  -- Justin
> 
> Pay attention to what that message says please.  It says "The filter
> should always be in the stack, and it should always collect
> information."  It doesn't say "The filter should always be touching the
> C-L for the response."  We use the data collected by the c-L filter in
> other places, so we should always try to compute the C-L in the filter.
> However, we should NEVER buffer unless we absolutely have to.

The C-L filter is actually too early to be computing r->bytes_sent.

The problem is pipe buckets.  If the output brigade contains a
pipe bucket, or anything else for which the length is not yet
known, the C-L filter does an apr_bucket_read().  That's bad.
We should leave file and pipe buckets unread until the core
output filter, so that we don't lose the opportunity to send
them using sendfile.

Is there anything that happens after the C-L filter but before
the core output filter that requires a final value for r->bytes_sent?

--Brian



RE: conserving file descriptors vs. ap_save_brigade()

Posted by Ryan Bloom <rb...@covalent.net>.
> From: Justin Erenkrantz [mailto:jerenkrantz@apache.org]
> 
> On Sat, Jul 06, 2002 at 08:25:18AM -0700, Ryan Bloom wrote:
> > > From: Justin Erenkrantz [mailto:jerenkrantz@apache.org]
> > > Of course, in the common case of a static file with no filters, we
> > > already know the content-length (default handler sets it).
> > >
> > > IIRC, I've brought up skipping the C-L filter when we already know
> > > the C-L (as defined by r->headers_out), but that was not met with
> > > approval.
> >
> > Who didn't approve that?  I was under the impression that we did
skip
> > the C-L filter if we already had a C-L, and it was the filters
> > responsibility to remove the C-L if it was changing it.
> 
> <Pi...@shell.ntrnet.net>
> 
> I don't know who might have said that.  =)  -- Justin

Pay attention to what that message says please.  It says "The filter
should always be in the stack, and it should always collect
information."  It doesn't say "The filter should always be touching the
C-L for the response."  We use the data collected by the c-L filter in
other places, so we should always try to compute the C-L in the filter.
However, we should NEVER buffer unless we absolutely have to.

Ryan


Re: conserving file descriptors vs. ap_save_brigade()

Posted by Justin Erenkrantz <je...@apache.org>.
On Sat, Jul 06, 2002 at 08:25:18AM -0700, Ryan Bloom wrote:
> > From: Justin Erenkrantz [mailto:jerenkrantz@apache.org]
> > Of course, in the common case of a static file with no filters, we
> > already know the content-length (default handler sets it).
> > 
> > IIRC, I've brought up skipping the C-L filter when we already know
> > the C-L (as defined by r->headers_out), but that was not met with
> > approval.
> 
> Who didn't approve that?  I was under the impression that we did skip
> the C-L filter if we already had a C-L, and it was the filters
> responsibility to remove the C-L if it was changing it.

<Pi...@shell.ntrnet.net>

I don't know who might have said that.  =)  -- justin

RE: conserving file descriptors vs. ap_save_brigade()

Posted by Ryan Bloom <rb...@covalent.net>.
> From: Justin Erenkrantz [mailto:jerenkrantz@apache.org]
> 
> On Sat, Jul 06, 2002 at 01:03:42AM -0700, Brian Pane wrote:
> > As it's currently implemented, the C-L filter is trying to compute
> > the C-L on everything by default.  It only gives up in a few cases:
> 
> Of course, in the common case of a static file with no filters, we
> already know the content-length (default handler sets it).
> 
> IIRC, I've brought up skipping the C-L filter when we already know
> the C-L (as defined by r->headers_out), but that was not met with
> approval.

Who didn't approve that?  I was under the impression that we did skip
the C-L filter if we already had a C-L, and it was the filters
responsibility to remove the C-L if it was changing it.

Ryan



Re: conserving file descriptors vs. ap_save_brigade()

Posted by Justin Erenkrantz <je...@apache.org>.
On Sat, Jul 06, 2002 at 01:03:42AM -0700, Brian Pane wrote:
> As it's currently implemented, the C-L filter is trying to compute
> the C-L on everything by default.  It only gives up in a few cases:

Of course, in the common case of a static file with no filters, we
already know the content-length (default handler sets it).

IIRC, I've brought up skipping the C-L filter when we already know
the C-L (as defined by r->headers_out), but that was not met with
approval.

Perhaps, we should reconsider this?  -- justin

RE: conserving file descriptors vs. ap_save_brigade()

Posted by Brian Pane <bp...@pacbell.net>.
On Wed, 2002-07-03 at 20:07, Ryan Bloom wrote:
> > From: Brian Pane [mailto:bpane@pacbell.net]
> > 
> > On Wed, 2002-07-03 at 15:26, Ryan Bloom wrote:
> > > How big a problem is this really?  Most of the time, the
> content-length
> > > filter isn't supposed to actually buffer the brigade.  It should
> only be
> > > doing the buffering if we are doing a keepalive request and we can't
> do
> > > chunking.
> > 
> > I'm seeing the buffering in non-keepalive tests, though.
> 
> Then we have a bug.  The C-L filter should only be trying to compute the
> C-L if we MUST have one on the response, and we don't have to have one
> in the non-keepalive case.

As it's currently implemented, the C-L filter is trying to compute
the C-L on everything by default.  It only gives up in a few cases:

  * More than 32KB of output buffered, and either
  * A flush bucket in the brigade
  * EAGAIN when trying to read from a pipe bucket

And in all of these cases, it it also requires either a non-keepalive
request or a client that supports chunked encoding in order to give up
on computing the length.

So if we have a long stream of small file buckets from an SSI request,
we'll never hit any of the conditions that cause the C-L filter to give
up trying to compute the C-L.

--Brian



RE: conserving file descriptors vs. ap_save_brigade()

Posted by Ryan Bloom <rb...@covalent.net>.
> From: Brian Pane [mailto:bpane@pacbell.net]
> 
> On Wed, 2002-07-03 at 15:26, Ryan Bloom wrote:
> > How big a problem is this really?  Most of the time, the
content-length
> > filter isn't supposed to actually buffer the brigade.  It should
only be
> > doing the buffering if we are doing a keepalive request and we can't
do
> > chunking.
> 
> I'm seeing the buffering in non-keepalive tests, though.

Then we have a bug.  The C-L filter should only be trying to compute the
C-L if we MUST have one on the response, and we don't have to have one
in the non-keepalive case.

Ryan



RE: conserving file descriptors vs. ap_save_brigade()

Posted by Brian Pane <bp...@pacbell.net>.
On Wed, 2002-07-03 at 15:26, Ryan Bloom wrote:
> How big a problem is this really?  Most of the time, the content-length
> filter isn't supposed to actually buffer the brigade.  It should only be
> doing the buffering if we are doing a keepalive request and we can't do
> chunking.

I'm seeing the buffering in non-keepalive tests, though.

--Brian