You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Bill Stoddard <bi...@wstoddard.com> on 2003/01/06 19:49:59 UTC
HTTP Input header filter
Just a heads up in case anyone else is interested or is comtemplating working on
this...
I am rewriting much of the code called by ap_read_request to handle HTTP
headers. Much of the function in rgetline_core, read_request_headers and
get_mime_headers_core is being reimplemented in ap_http_headers_input_filter.
My goals for the rewrite are to eliminate at least 1000 instructions from the
mainline code path, improve readability and maintainability of the code and to
make doing async/non-blocking network reads a bit easier to implement. I hope
to have something for review by the end of the week.
Bill
Re: HTTP Input header filter
Posted by Brian Pane <br...@cnet.com>.
Joe Schaefer wrote:
>Greg Ames <gr...@apache.org> writes:
>
>
>
>>Joe Schaefer wrote:
>>
>>
>
>[...]
>
>
>
>>>Just looking over the 2.0
>>>source, it looks to me like r->headers_in is an empty
>>>apr_table prior to the apr_table_overlap() call at the
>>>end of get_mime_headers_core().
>>>
>>>
>>>
>>I'll take your word for it.
>>
>>
>
>Please treat the following comments with the appropriate
>level of contempt, since at this moment apreq_tables are
>still vaporware...
>
>I've written a table implementation for apreq-2 that
>superimposes an array of binary trees over the table's
>array entries. It allows for "dead entries" to appear in
>the array, so that "restacking" the table entries
>(and reindexing the superimposed tree nodes) wouldn't
>be necessary when table entries are unset or merged.
>
>The apr_table_overlap call in get_mime_headers_core()
>effectively merges all duplicate input headers, and it
>uses red-black trees to do so. If apr_tables were able
>to tolerate "dead entries", it might be possible to port
>some of the apreq_table implementation to apr (assuming
>the apreq_table approach proves to be more efficient).
>
Definitely sounds interesting, if it proves faster than
the current APR tables. One other thing that might speed
up apr_table_overlap is to eliminate the red-black trees.
I added them back when the tables themselves had no
indices. Now that each apr_table_t contains an internal
index (a crude one, but fairly effective in practice),
we might be able to accelerate the overlap operation by
using that index in place of the red-black trees.
Brian
Re: HTTP Input header filter
Posted by Joe Schaefer <jo...@sunstarsys.com>.
Greg Ames <gr...@apache.org> writes:
> Joe Schaefer wrote:
[...]
> > Just looking over the 2.0
> > source, it looks to me like r->headers_in is an empty
> > apr_table prior to the apr_table_overlap() call at the
> > end of get_mime_headers_core().
> >
>
> I'll take your word for it.
Please treat the following comments with the appropriate
level of contempt, since at this moment apreq_tables are
still vaporware...
I've written a table implementation for apreq-2 that
superimposes an array of binary trees over the table's
array entries. It allows for "dead entries" to appear in
the array, so that "restacking" the table entries
(and reindexing the superimposed tree nodes) wouldn't
be necessary when table entries are unset or merged.
The apr_table_overlap call in get_mime_headers_core()
effectively merges all duplicate input headers, and it
uses red-black trees to do so. If apr_tables were able
to tolerate "dead entries", it might be possible to port
some of the apreq_table implementation to apr (assuming
the apreq_table approach proves to be more efficient).
--
Joe Schaefer
Re: HTTP Input header filter
Posted by Greg Ames <gr...@apache.org>.
Joe Schaefer wrote:
> Can r->headers_in ever contain any entries prior to the
> get_mime_headers_core() call?
no (barring weird bugs of course).
> Just looking over the 2.0
> source, it looks to me like r->headers_in is an empty
> apr_table prior to the apr_table_overlap() call at the
> end of get_mime_headers_core().
>
I'll take your word for it.
Greg
Re: HTTP Input header filter
Posted by Joe Schaefer <jo...@sunstarsys.com>.
"Bill Stoddard" <bi...@wstoddard.com> writes:
> I am rewriting much of the code called by ap_read_request
> to handle HTTP headers. Much of the function in rgetline_core,
> read_request_headers and get_mime_headers_core is being
> reimplemented in ap_http_headers_input_filter.
Can r->headers_in ever contain any entries prior to the
get_mime_headers_core() call? Just looking over the 2.0
source, it looks to me like r->headers_in is an empty
apr_table prior to the apr_table_overlap() call at the
end of get_mime_headers_core().
--
Joe Schaefer
Re: HTTP Input header filter
Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Thursday, February 6, 2003 9:07 AM -0500 Bill Stoddard
<bi...@wstoddard.com> wrote:
> ap_http_filter logic in the header parser filter, create a new API
> to push bytes back to the core_input_filter when the header parser
> filter reads too many bytes and variations.
Eww.
Regardless, I'll wait to see some code, but be aware that I might
have some objections to the patch if you go your stated route.
Perhaps it won't be as bad as you are making it out to be. =) --
justin
Re: HTTP Input header filter
Posted by Bill Stoddard <bi...@wstoddard.com>.
Justin Erenkrantz wrote:
> --On Wednesday, February 5, 2003 4:32 PM -0500 Bill Stoddard
> <bi...@wstoddard.com> wrote:
>
>> 1. Installing this filter for the duration of a connection. It is
>> still a protocol filter, but it lasts for the duration of the
>> connection. In order to handle pipelined connections, an
>
>
> Hmm. I'm wondering if we're mis-understanding something here
> My understanding based on this comment is that this filter will actually
> read the request line as well as the headers and body. But, that's not
> what ap_http_filter is for - it only deals with reading the body of the
> request. Why are we altering that? (Why not two filters?)
Because a header parsing filter needs to know when one request ends and
another request begins. ap_http_filter knows how to do this. An http
header filter would have to replicate much of this logic. There are
other mechanisms we can use to deal with this: replicate ap_http_filter
logic in the header parser filter, create a new API to push bytes back
to the core_input_filter when the header parser filter reads too many
bytes and variations.
>
> If you do mean that all of the HTTP logic will be moving into this
> filter, a potential concern with this will be how to deal with protocol
> upgrades in the middle of a connection.
>
> I know that there are plans to utilize the Upgrade header via a
> waka-like protocol or a TLS upgrade. So, we should be able to replace
> the HTTP protocol handling on a per-request basis. Making
> ap_http_filter span connections may make that unnecessarily hard.
Humm...
> ap_http_filter was previously a connection filter, but it became way too
> complicated as it tried to handle what state it was in. The code became
> impossible to deal with, so it was refactored.
The filter will remain a 'protocol' filter, but it will stay installed
the duration of the connection. The new ap_http_filter will not be
appreciably more complex than the current one (I hope :-)
>
> I guess I'd like to see justification why the core component of a
> request should not be request-centric. -- justin
Two words, pipelined requests.
I'll try to get my thoughts translated into working code which should
help us explore some of the tradeoffs.
Bill
Re: HTTP Input header filter
Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Wednesday, February 5, 2003 4:32 PM -0500 Bill Stoddard
<bi...@wstoddard.com> wrote:
> 1. Installing this filter for the duration of a connection. It is
> still a protocol filter, but it lasts for the duration of the
> connection. In order to handle pipelined connections, an
Hmm. I'm wondering if we're mis-understanding something here.
My understanding based on this comment is that this filter will
actually read the request line as well as the headers and body. But,
that's not what ap_http_filter is for - it only deals with reading
the body of the request. Why are we altering that? (Why not two
filters?)
If you do mean that all of the HTTP logic will be moving into this
filter, a potential concern with this will be how to deal with
protocol upgrades in the middle of a connection.
I know that there are plans to utilize the Upgrade header via a
waka-like protocol or a TLS upgrade. So, we should be able to
replace the HTTP protocol handling on a per-request basis. Making
ap_http_filter span connections may make that unnecessarily hard.
ap_http_filter was previously a connection filter, but it became way
too complicated as it tried to handle what state it was in. The code
became impossible to deal with, so it was refactored.
I guess I'd like to see justification why the core component of a
request should not be request-centric. -- justin
Re: HTTP Input header filter
Posted by Bill Stoddard <bi...@wstoddard.com>.
Bill Stoddard wrote:
> Brian Pane wrote:
>
>> Bill Stoddard wrote:
>>
>>> Just a heads up in case anyone else is interested or is comtemplating
>>> working on
>>> this...
>>>
>>> I am rewriting much of the code called by ap_read_request to handle HTTP
>>> headers. Much of the function in rgetline_core, read_request_headers and
>>> get_mime_headers_core is being reimplemented
Another update.. finding a bit of time to work on this. My latest
efforts are focused on reimplementing ap_http_filter in http_protocol.c.
I am modifying the ap_http_filter as follows:
1. Installing this filter for the duration of a connection. It is still
a protocol filter, but it lasts for the duration of the connection. In
order to handle pipelined connections, an http_header parsing filter
must have the smarts to identify the boundary between pipelined
requests. ap_header_filter has the smarts, so it made sense to draft it
for parsing http headers rather than attempting to recreate all the
logic in a standalone http header parsing filter.
2. ap_http_filter will be 100% state driven
3. It will have the ability to setaside brigades on request boundaries
and maintain the setaside brigade in the filter context (hung off the
connection pool)
My initial implementation of a standalone http header parsing filter
trimmed about 2600 instructions from an http transaction. Consolidating
the header parsing function into ap_http_filter should save a few more
instructions as it is installed once per connection rather than per request.
No promises on when I will have something reviewable...
Bill
Re: HTTP Input header filter
Posted by Bill Stoddard <bi...@wstoddard.com>.
Brian Pane wrote:
> Bill Stoddard wrote:
>
>> Just a heads up in case anyone else is interested or is comtemplating
>> working on
>> this...
>>
>> I am rewriting much of the code called by ap_read_request to handle HTTP
>> headers. Much of the function in rgetline_core, read_request_headers and
>> get_mime_headers_core is being reimplemented in
>> ap_http_headers_input_filter.
>> My goals for the rewrite are to eliminate at least 1000 instructions
>> from the
>> mainline code path, improve readability and maintainability of the
>> code and to
>> make doing async/non-blocking network reads a bit easier to
>> implement. I hope
>> to have something for review by the end of the week.
>>
>> Bill
>>
>>
>
> +1! According to the last batch of profile data I looked at, this part
> of the code is a prime candidate for optimization. I'm happy to help
> test/review the changes when they're ready.
>
> Brian
Just to let you know I have not forgotten about this. I managed to spend
some time on it today and I'm making some progress. January is a really
busy month for me. Februrary will be much better for writing code.
Bill
Re: HTTP Input header filter
Posted by Brian Pane <br...@cnet.com>.
Bill Stoddard wrote:
>Just a heads up in case anyone else is interested or is comtemplating working on
>this...
>
>I am rewriting much of the code called by ap_read_request to handle HTTP
>headers. Much of the function in rgetline_core, read_request_headers and
>get_mime_headers_core is being reimplemented in ap_http_headers_input_filter.
>My goals for the rewrite are to eliminate at least 1000 instructions from the
>mainline code path, improve readability and maintainability of the code and to
>make doing async/non-blocking network reads a bit easier to implement. I hope
>to have something for review by the end of the week.
>
>Bill
>
>
+1! According to the last batch of profile data I looked at, this part
of the code is a prime candidate for optimization. I'm happy to help
test/review the changes when they're ready.
Brian