Posted to dev@httpd.apache.org by Bill Stoddard <bi...@wstoddard.com> on 2003/01/06 19:49:59 UTC

HTTP Input header filter

Just a heads up in case anyone else is interested or is contemplating working on
this...

I am rewriting much of the code called by ap_read_request to handle HTTP
headers. Much of the function in rgetline_core, read_request_headers and
get_mime_headers_core is being reimplemented in ap_http_headers_input_filter.
My goals for the rewrite are to eliminate at least 1000 instructions from the
mainline code path, improve readability and maintainability of the code and to
make doing async/non-blocking network reads a bit easier to implement.  I hope
to have something for review by the end of the week.

Bill

Re: HTTP Input header filter

Posted by Brian Pane <br...@cnet.com>.

Joe Schaefer wrote:

>Greg Ames <gr...@apache.org> writes:
>
>>Joe Schaefer wrote:
>
>[...]
>
>>>Just looking over the 2.0 
>>>source, it looks to me like r->headers_in is an empty 
>>>apr_table prior to the apr_table_overlap() call at the 
>>>end of get_mime_headers_core().
>>
>>I'll take your word for it.
>
>Please treat the following comments with the appropriate 
>level of contempt, since at this moment apreq_tables are 
>still vaporware...
>
>I've written a table implementation for apreq-2 that 
>superimposes an array of binary trees over the table's
>array entries.  It allows for "dead entries" to appear in 
>the array, so that "restacking" the table entries 
>(and reindexing the superimposed tree nodes) wouldn't
>be necessary when table entries are unset or merged.
>
>The apr_table_overlap call in get_mime_headers_core()
>effectively merges all duplicate input headers, and it 
>uses red-black trees to do so.  If apr_tables were able
>to tolerate "dead entries", it might be possible to port 
>some of the apreq_table implementation to apr (assuming 
>the apreq_table approach proves to be more efficient).
>

Definitely sounds interesting, if it proves faster than
the current APR tables.  One other thing that might speed
up apr_table_overlap is to eliminate the red-black trees.
I added them back when the tables themselves had no
indices.  Now that each apr_table_t contains an internal
index (a crude one, but fairly effective in practice),
we might be able to accelerate the overlap operation by
using that index in place of the red-black trees.
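
To illustrate the kind of index meant here, a minimal sketch follows. It is
not the actual APR code: the sketch_* names, the fixed-size array and the
32-bucket layout are assumptions made for the example. Each key is bucketed
by its first byte, and the table remembers the first and last array slot
seen for each bucket, so a lookup scans only that range.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_ENTRIES 64
#define SKETCH_BUCKETS 32   /* hypothetical bucket count, must be a power of 2 */

typedef struct {
    const char *key;
    const char *val;
} sketch_entry;

typedef struct {
    sketch_entry elts[SKETCH_MAX_ENTRIES];
    int nelts;
    int first[SKETCH_BUCKETS];  /* first slot used by each bucket, -1 if none */
    int last[SKETCH_BUCKETS];   /* last slot used by each bucket, -1 if none */
} sketch_table;

/* Masking the first byte with 0x1f puts 'A' and 'a' in the same bucket. */
int sketch_bucket(const char *key)
{
    return (unsigned char)key[0] & (SKETCH_BUCKETS - 1);
}

/* Case-insensitive key comparison; returns 1 when the keys match. */
int sketch_keys_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

void sketch_init(sketch_table *t)
{
    int i;
    t->nelts = 0;
    for (i = 0; i < SKETCH_BUCKETS; i++) {
        t->first[i] = -1;
        t->last[i] = -1;
    }
}

/* Append an entry and widen the bucket's [first, last] range to cover it. */
int sketch_add(sketch_table *t, const char *key, const char *val)
{
    int b;
    if (t->nelts >= SKETCH_MAX_ENTRIES)
        return -1;
    b = sketch_bucket(key);
    t->elts[t->nelts].key = key;
    t->elts[t->nelts].val = val;
    if (t->first[b] < 0)
        t->first[b] = t->nelts;
    t->last[b] = t->nelts;
    t->nelts++;
    return 0;
}

/* Return the first value stored under key, scanning only the bucket range. */
const char *sketch_get(const sketch_table *t, const char *key)
{
    int b = sketch_bucket(key);
    int i;
    if (t->first[b] < 0)
        return NULL;            /* no key in this bucket was ever added */
    for (i = t->first[b]; i <= t->last[b]; i++) {
        if (sketch_keys_equal(t->elts[i].key, key))
            return t->elts[i].val;
    }
    return NULL;
}
```

An overlap operation built on such an index would only need to compare
entries whose keys fall in the same bucket range, which is the saving
being speculated about above.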

Brian

Re: HTTP Input header filter

Posted by Joe Schaefer <jo...@sunstarsys.com>.

Greg Ames <gr...@apache.org> writes:

> Joe Schaefer wrote:

[...]

> > Just looking over the 2.0 
> > source, it looks to me like r->headers_in is an empty 
> > apr_table prior to the apr_table_overlap() call at the 
> > end of get_mime_headers_core().
> > 
> 
> I'll take your word for it.

Please treat the following comments with the appropriate 
level of contempt, since at this moment apreq_tables are 
still vaporware...

I've written a table implementation for apreq-2 that 
superimposes an array of binary trees over the table's
array entries.  It allows for "dead entries" to appear in 
the array, so that "restacking" the table entries 
(and reindexing the superimposed tree nodes) wouldn't
be necessary when table entries are unset or merged.

The apr_table_overlap call in get_mime_headers_core()
effectively merges all duplicate input headers, and it 
uses red-black trees to do so.  If apr_tables were able
to tolerate "dead entries", it might be possible to port 
some of the apreq_table implementation to apr (assuming 
the apreq_table approach proves to be more efficient).
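
A rough sketch of the "dead entries" idea, for illustration only. This is
neither the apreq_table code nor apr_table_overlap: the de_* names and the
fixed-size buffers are made up for the example, and a plain quadratic scan
stands in for the tree lookups. The point is just that merging marks a
duplicate dead in place, so no other entry changes its array index.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define DE_MAX 32
#define DE_VAL_LEN 256

typedef struct {
    const char *key;
    char val[DE_VAL_LEN];
    int dead;   /* nonzero: entry is ignored but still occupies its slot */
} de_entry;

typedef struct {
    de_entry elts[DE_MAX];
    int nelts;
} de_table;

/* Case-insensitive key comparison; returns 1 when the keys match. */
int de_keys_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

int de_add(de_table *t, const char *key, const char *val)
{
    de_entry *e;
    if (t->nelts >= DE_MAX || strlen(val) >= DE_VAL_LEN)
        return -1;
    e = &t->elts[t->nelts++];
    e->key = key;
    strcpy(e->val, val);
    e->dead = 0;
    return 0;
}

/* Fold each duplicate key into its first occurrence, joined by ", ".
 * Later duplicates are marked dead; the array is never compacted. */
void de_merge_duplicates(de_table *t)
{
    int i, j;
    for (i = 0; i < t->nelts; i++) {
        de_entry *keep = &t->elts[i];
        if (keep->dead)
            continue;
        for (j = i + 1; j < t->nelts; j++) {
            de_entry *dup = &t->elts[j];
            size_t used, need;
            if (dup->dead || !de_keys_equal(keep->key, dup->key))
                continue;
            used = strlen(keep->val);
            need = strlen(dup->val) + 2;        /* ", " plus the value */
            if (used + need >= DE_VAL_LEN)
                continue;                       /* no room: leave it live */
            snprintf(keep->val + used, DE_VAL_LEN - used, ", %s", dup->val);
            dup->dead = 1;
        }
    }
}

/* Lookups skip dead entries. */
const char *de_get(const de_table *t, const char *key)
{
    int i;
    for (i = 0; i < t->nelts; i++) {
        if (!t->elts[i].dead && de_keys_equal(t->elts[i].key, key))
            return t->elts[i].val;
    }
    return NULL;
}

int de_live_count(const de_table *t)
{
    int i, n = 0;
    for (i = 0; i < t->nelts; i++)
        n += !t->elts[i].dead;
    return n;
}
```

After a merge, nelts is unchanged and only the live count drops, which is
what would let tree nodes indexed by array position stay valid.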

-- 
Joe Schaefer

Re: HTTP Input header filter

Posted by Greg Ames <gr...@apache.org>.

Joe Schaefer wrote:

> Can r->headers_in ever contain any entries prior to the 
> get_mime_headers_core() call?  

no (barring weird bugs of course).

> Just looking over the 2.0 
> source, it looks to me like r->headers_in is an empty 
> apr_table prior to the apr_table_overlap() call at the 
> end of get_mime_headers_core().
> 

I'll take your word for it.

Greg

Re: HTTP Input header filter

Posted by Joe Schaefer <jo...@sunstarsys.com>.

"Bill Stoddard" <bi...@wstoddard.com> writes:

> I am rewriting much of the code called by ap_read_request 
> to handle HTTP headers. Much of the function in rgetline_core,
> read_request_headers and get_mime_headers_core is being 
> reimplemented in ap_http_headers_input_filter. 

Can r->headers_in ever contain any entries prior to the 
get_mime_headers_core() call?  Just looking over the 2.0 
source, it looks to me like r->headers_in is an empty 
apr_table prior to the apr_table_overlap() call at the 
end of get_mime_headers_core().

-- 
Joe Schaefer

Re: HTTP Input header filter

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.

--On Thursday, February 6, 2003 9:07 AM -0500 Bill Stoddard 
<bi...@wstoddard.com> wrote:

> ap_http_filter logic in the header parser filter, create a new API
> to push bytes back to the core_input_filter when the header parser
> filter reads too many bytes and variations.

Eww.

Regardless, I'll wait to see some code, but be aware that I might 
have some objections to the patch if you go your stated route. 
Perhaps it won't be as bad as you are making it out to be.  =)  -- 
justin

Re: HTTP Input header filter

Posted by Bill Stoddard <bi...@wstoddard.com>.

Justin Erenkrantz wrote:
> --On Wednesday, February 5, 2003 4:32 PM -0500 Bill Stoddard 
> <bi...@wstoddard.com> wrote:
> 
>> 1. Installing this filter for the duration of a connection. It is
>> still a protocol filter, but it lasts for the duration of the
>> connection. In order to handle pipelined connections, an
> 
> 
> Hmm.  I'm wondering if we're mis-understanding something here.
> My understanding based on this comment is that this filter will actually 
> read the request line as well as the headers and body.  But, that's not 
> what ap_http_filter is for - it only deals with reading the body of the 
> request.  Why are we altering that?  (Why not two filters?)

Because a header parsing filter needs to know when one request ends and 
another request begins. ap_http_filter knows how to do this. An http 
header filter would have to replicate much of this logic. There are 
other mechanisms we can use to deal with this: replicate ap_http_filter 
logic in the header parser filter, create a new API to push bytes back 
to the core_input_filter when the header parser filter reads too many 
bytes and variations.
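
To make the "reads too many bytes" case concrete, here is a small sketch.
The function names are hypothetical and nothing here is an httpd API. It
finds the end of the first header block in a buffer and reports how many
bytes were read past it. The sketch assumes a request without a body; with
a body, the boundary also depends on the Content-Length or chunked framing
that ap_http_filter already tracks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the first header block, including the terminating blank line,
 * or 0 if the blank line (CRLF CRLF) has not arrived yet. */
size_t header_block_len(const char *buf, size_t len)
{
    size_t i;
    for (i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return i + 4;
    }
    return 0;
}

/* Bytes in buf that lie past the first header block. These belong to the
 * next pipelined request and would have to be pushed back or set aside. */
size_t overread_len(const char *buf, size_t len)
{
    size_t hdr = header_block_len(buf, len);
    return hdr == 0 ? 0 : len - hdr;
}
```

With a buffer holding one complete GET request followed by the start of a
second one, overread_len() returns the size of that second fragment.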

> 
> If you do mean that all of the HTTP logic will be moving into this 
> filter, a potential concern with this will be how to deal with protocol 
> upgrades in the middle of a connection.
> 
> I know that there are plans to utilize the Upgrade header via a 
> waka-like protocol or a TLS upgrade.  So, we should be able to replace 
> the HTTP protocol handling on a per-request basis.  Making 
> ap_http_filter span connections may make that unnecessarily hard. 

Humm...

> ap_http_filter was previously a connection filter, but it became way too 
> complicated as it tried to handle what state it was in.  The code became 
> impossible to deal with, so it was refactored.

The filter will remain a 'protocol' filter, but it will stay installed 
the duration of the connection. The new ap_http_filter will not be 
appreciably more complex than the current one (I hope :-)

> 
> I guess I'd like to see justification why the core component of a 
> request should not be request-centric.  -- justin

Two words, pipelined requests.

I'll try to get my thoughts translated into working code which should 
help us explore some of the tradeoffs.

Bill

Re: HTTP Input header filter

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.

--On Wednesday, February 5, 2003 4:32 PM -0500 Bill Stoddard 
<bi...@wstoddard.com> wrote:

> 1. Installing this filter for the duration of a connection. It is
> still a protocol filter, but it lasts for the duration of the
> connection. In order to handle pipelined connections, an

Hmm.  I'm wondering if we're mis-understanding something here.
My understanding based on this comment is that this filter will 
actually read the request line as well as the headers and body.  But, 
that's not what ap_http_filter is for - it only deals with reading 
the body of the request.  Why are we altering that?  (Why not two 
filters?)

If you do mean that all of the HTTP logic will be moving into this 
filter, a potential concern with this will be how to deal with 
protocol upgrades in the middle of a connection.

I know that there are plans to utilize the Upgrade header via a 
waka-like protocol or a TLS upgrade.  So, we should be able to 
replace the HTTP protocol handling on a per-request basis.  Making 
ap_http_filter span connections may make that unnecessarily hard. 
ap_http_filter was previously a connection filter, but it became way 
too complicated as it tried to handle what state it was in.  The code 
became impossible to deal with, so it was refactored.

I guess I'd like to see justification why the core component of a 
request should not be request-centric.  -- justin

Re: HTTP Input header filter

Posted by Bill Stoddard <bi...@wstoddard.com>.

Bill Stoddard wrote:
> Brian Pane wrote:
> 
>> Bill Stoddard wrote:
>>
>>> Just a heads up in case anyone else is interested or is contemplating 
>>> working on
>>> this...
>>>
>>> I am rewriting much of the code called by ap_read_request to handle HTTP
>>> headers. Much of the function in rgetline_core, read_request_headers and
>>> get_mime_headers_core is being reimplemented 

Another update: I am finding a bit of time to work on this.  My latest 
efforts are focused on reimplementing ap_http_filter in http_protocol.c. 
I am modifying ap_http_filter as follows:

1. Installing this filter for the duration of a connection. It is still 
a protocol filter, but it lasts for the duration of the connection. In 
order to handle pipelined connections, an HTTP header parsing filter 
must have the smarts to identify the boundary between pipelined 
requests. ap_http_filter has the smarts, so it made sense to draft it 
for parsing HTTP headers rather than attempting to recreate all the 
logic in a standalone HTTP header parsing filter.

2. ap_http_filter will be 100% state driven.

3. It will have the ability to setaside brigades on request boundaries 
and maintain the setaside brigade in the filter context (hung off the 
connection pool)
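
As a rough picture of points 2 and 3, here is a toy state machine. It is
not the filter itself: the hp_* names are hypothetical, a flat buffer
stands in for the setaside brigade, and requests are assumed to have no
body. It can be fed input in arbitrary chunks, which is what makes this
shape friendly to non-blocking reads, and once it sees the blank line that
ends the headers it keeps every remaining byte in its context for the next
request.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HP_SETASIDE_MAX 1024

typedef enum {
    HP_IN_LINE,     /* somewhere inside a line */
    HP_SAW_CR,      /* a line ended with CR */
    HP_SAW_CRLF,    /* at the start of a new line */
    HP_SAW_CRLFCR,  /* the new line started with CR */
    HP_DONE         /* blank line seen: header block is complete */
} hp_state;

typedef struct {
    hp_state state;
    char setaside[HP_SETASIDE_MAX];  /* bytes that belong to the next request */
    size_t setaside_len;
} hp_ctx;

void hp_reset(hp_ctx *ctx)
{
    ctx->state = HP_IN_LINE;
    ctx->setaside_len = 0;
}

/* Feed one chunk of input. Returns 1 once the header block is complete,
 * 0 if more input is needed, -1 if the leftover bytes do not fit. */
int hp_feed(hp_ctx *ctx, const char *data, size_t len)
{
    size_t i;
    for (i = 0; i < len && ctx->state != HP_DONE; i++) {
        char c = data[i];
        switch (ctx->state) {
        case HP_IN_LINE:
            if (c == '\r')
                ctx->state = HP_SAW_CR;
            break;
        case HP_SAW_CR:
            ctx->state = (c == '\n') ? HP_SAW_CRLF
                       : (c == '\r') ? HP_SAW_CR : HP_IN_LINE;
            break;
        case HP_SAW_CRLF:
            ctx->state = (c == '\r') ? HP_SAW_CRLFCR : HP_IN_LINE;
            break;
        case HP_SAW_CRLFCR:
            ctx->state = (c == '\n') ? HP_DONE
                       : (c == '\r') ? HP_SAW_CR : HP_IN_LINE;
            break;
        case HP_DONE:
            break;
        }
    }
    if (ctx->state != HP_DONE)
        return 0;
    /* data[i..len) lies past the request boundary: set it aside. */
    if (len - i > HP_SETASIDE_MAX - ctx->setaside_len)
        return -1;
    memcpy(ctx->setaside + ctx->setaside_len, data + i, len - i);
    ctx->setaside_len += len - i;
    return 1;
}
```

Because the state lives in the context rather than on the stack, the
terminating CRLF CRLF can be split across two reads and the boundary is
still found.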

My initial implementation of a standalone http header parsing filter 
trimmed about 2600 instructions from an http transaction.  Consolidating 
the header parsing function into ap_http_filter should save a few more 
instructions as it is installed once per connection rather than per request.

No promises on when I will have something reviewable...

Bill

Re: HTTP Input header filter

Posted by Bill Stoddard <bi...@wstoddard.com>.

Brian Pane wrote:

> Bill Stoddard wrote:
>
>> Just a heads up in case anyone else is interested or is contemplating 
>> working on
>> this...
>>
>> I am rewriting much of the code called by ap_read_request to handle HTTP
>> headers. Much of the function in rgetline_core, read_request_headers and
>> get_mime_headers_core is being reimplemented in 
>> ap_http_headers_input_filter.
>> My goals for the rewrite are to eliminate at least 1000 instructions 
>> from the
>> mainline code path, improve readability and maintainability of the 
>> code and to
>> make doing async/non-blocking network reads a bit easier to 
>> implement.  I hope
>> to have something for review by the end of the week.
>>
>> Bill
>
> +1!  According to the last batch of profile data I looked at, this part
> of the code is a prime candidate for optimization.  I'm happy to help
> test/review the changes when they're ready.
>
> Brian

Just to let you know I have not forgotten about this. I managed to spend 
some time on it today and I'm making some progress.  January is a really 
busy month for me.  February will be much better for writing code.

Bill

Re: HTTP Input header filter

Posted by Brian Pane <br...@cnet.com>.

Bill Stoddard wrote:

>Just a heads up in case anyone else is interested or is contemplating working on
>this...
>
>I am rewriting much of the code called by ap_read_request to handle HTTP
>headers. Much of the function in rgetline_core, read_request_headers and
>get_mime_headers_core is being reimplemented in ap_http_headers_input_filter.
>My goals for the rewrite are to eliminate at least 1000 instructions from the
>mainline code path, improve readability and maintainability of the code and to
>make doing async/non-blocking network reads a bit easier to implement.  I hope
>to have something for review by the end of the week.
>
>Bill
>

+1!  According to the last batch of profile data I looked at, this part
of the code is a prime candidate for optimization.  I'm happy to help
test/review the changes when they're ready.

Brian