Posted to dev@httpd.apache.org by William A Rowe Jr <wr...@rowe-clan.net> on 2016/05/17 16:43:07 UTC

RFC 7230..7235 Parsing Conformance?

Wondering what other contributors are thinking on this topic.

We have a number of changes in the ABNF grammar between
RFC2616 and RFC7230..7235. Do we want trunk 2.6/3.0 to be
an entirely RFC723x generation server, and drop all support for
RFC2616?

Do we want to backport these changes to 2.4.x? If so, what
mechanism do we want to toggle the behavior of the server
between 2616 and 7230..7235?

We can presume a small performance hit in any conditional
operation, especially when those decisions apply to a tight parsing
loop. Toggling between two different parser implementations would
probably be a bit more efficient than conditionals within the parser
itself.
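
The two structures contrasted above might be sketched roughly as follows. This is only an illustration in Python, not httpd code: the names make_table, is_token_branching and make_token_checker are hypothetical, and the second table is a placeholder that is not taken from any RFC. It shows the shape of each approach and says nothing about how either would actually perform in C.

```python
import string


def make_table(allowed: str) -> bytes:
    """Build a 256-entry lookup table: 1 where the byte is allowed."""
    table = bytearray(256)
    for ch in allowed:
        table[ord(ch)] = 1
    return bytes(table)


# RFC 7230 section 3.2.6 tchar set.
TABLE_7230 = make_table("!#$%&'*+-.^_`|~" + string.digits + string.ascii_letters)
# Placeholder second rule set (digits only); NOT taken from any RFC,
# it exists only so the two modes differ in this sketch.
TABLE_PLACEHOLDER = make_table(string.digits)


def is_token_branching(data: bytes, use_7230: bool) -> bool:
    """Option 1: a single loop that tests the mode for every byte."""
    if not data:
        return False
    for b in data:
        if use_7230:
            ok = TABLE_7230[b]
        else:
            ok = TABLE_PLACEHOLDER[b]
        if not ok:
            return False
    return True


def make_token_checker(table: bytes):
    """Option 2: bind the table once; the loop has no mode test."""
    def is_token(data: bytes) -> bool:
        return bool(data) and all(table[b] for b in data)
    return is_token


# Selected once, for example when configuration is read.
is_token = make_token_checker(TABLE_7230)
```

With the second form, the choice between rule sets is made a single time and the per-byte work is one table lookup. Whether that is measurably faster in the real parser would need benchmarking.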

Re: RFC 7230..7235 Parsing Conformance?

Posted by Eric Covener <co...@gmail.com>.
On Tue, May 17, 2016 at 9:43 AM, William A Rowe Jr <wr...@rowe-clan.net> wrote:
> Do we want to backport these changes to 2.4.x? If so, what
> mechanism do we want to toggle the behavior of the server
> between 2616 and 7230..7235?

I would piggyback it on the "HttpProtocol" strict stuff that also
needs backporting.

Re: RFC 7230..7235 Parsing Conformance?

Posted by Jacob Champion <ch...@gmail.com>.
On 05/17/2016 02:53 PM, William A Rowe Jr wrote:
> (Note that HT is a CTL, right, so it appears to be doubly excluded, no?)
> CHAR is US-ASCII 0-127.

I noticed that too... It seems odd, but it's water under the bridge now,
I guess.

> The characters missing above from tchar are '"', '(', ')', ',', '/',
> ':', ';', '<', '=', '>', '?', '@', '[', '\', ']', '{', '}' which
> corresponds to this delimiter list, and to the RFC2616 list. VCHAR is
> clearly US-ASCII 20-7E, possibly includes tab. (Tabs are visible spacing.)

RFC 7230 defers to RFC 5234 [1] for VCHAR:

>          VCHAR          =  %x21-7E
>                                 ; visible (printing) characters

So neither spaces nor tabs by my reading.

> So my concerns may have been unfounded, but reviewing the new spec
> against implementation still seems prudent.

+1 to that.

Getting back to your original question, then: if there are any
differences that do come up, I personally think it would be nice if the
new rules were used by default in the next major release, but
case-by-case consideration would probably be appropriate for now.

--Jacob

[1] https://tools.ietf.org/html/rfc5234#appendix-B.1
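
The reading above can be checked mechanically. The following is a minimal sketch, assuming only the RFC 5234 appendix B.1 ranges for VCHAR, SP and HTAB and the RFC 7230 section 3.2.6 delimiter list quoted in this thread; the variable names are just for illustration.

```python
import string

# RFC 5234 appendix B.1: VCHAR = %x21-7E, SP = %x20, HTAB = %x09
VCHAR = frozenset(chr(c) for c in range(0x21, 0x7F))
SP, HTAB = "\x20", "\x09"

# RFC 7230 section 3.2.6 delimiters: DQUOTE and "(),/:;<=>?@[\]{}"
DELIMITERS = frozenset('"(),/:;<=>?@[\\]{}')

# RFC 7230 section 3.2.6 tchar, written out additively.
TCHAR = frozenset("!#$%&'*+-.^_`|~" + string.digits + string.ascii_letters)

# Neither space nor tab is a VCHAR.
print(SP in VCHAR, HTAB in VCHAR)

# tchar is exactly "any VCHAR, except delimiters".
print(TCHAR == VCHAR - DELIMITERS)
```

The first line printed is "False False" and the second is "True", which matches the ABNF comment "any VCHAR, except delimiters".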

Re: RFC 7230..7235 Parsing Conformance?

Posted by William A Rowe Jr <wr...@rowe-clan.net>.
Based on Jacob's question...

On Tue, May 17, 2016 at 1:31 PM, William A Rowe Jr <wr...@rowe-clan.net>
wrote:

> On Tue, May 17, 2016 at 1:00 PM, Julian Reschke <ju...@gmx.de>
> wrote:
>
>> On 2016-05-17 19:01, Graham Leggett wrote:
>>
>>> On 17 May 2016, at 6:43 PM, William A Rowe Jr <wr...@rowe-clan.net>
>>> wrote:
>>>
>>> Wondering what other contributors are thinking on this topic.
>>>>
>>>> We have a number of changes in the ABNF grammar between
>>>> RFC2616 and RFC7230..7235. Do we want trunk 2.6/3.0 to be
>>>> an entirely RFC723x generation server, and drop all support for
>>>> RFC2616?
>>>>
>>>> Do we want to backport these changes to 2.4.x? If so, what
>>>> mechanism do we want to toggle the behavior of the server
>>>> between 2616 and 7230..7235?
>>>>
>>>> We can presume a small performance hit in any conditional
>>>> operation, especially when those decisions apply to tight parsing
>>>> loop. Toggling between two different parser implementations would
>>>> probably be a bit more efficient than conditionals within a parser
>>>> itself.
>>>>
>>>
>>> Can you give some examples to get a sense of the extent of this?
>>>
>> +1 to the question; I'd like to see examples as well...
>>
>> I believe we only changed the ABNF when we came to the conclusion that
>> the old one was incorrect, or did not reflect what implementations do in
>> practice.
>>
>
> One of the more significant is the change to token,
> https://tools.ietf.org/html/rfc2616#section-2.2
>
>      token          = 1*<any CHAR except CTLs or separators>
>
>
       separators     = "(" | ")" | "<" | ">" | "@"
                      | "," | ";" | ":" | "\" | <">
                      | "/" | "[" | "]" | "?" | "="
                      | "{" | "}" | SP | HT


(Note that HT is a CTL, right, so it appears to be doubly excluded, no?)
CHAR is US-ASCII 0-127.


> vs https://tools.ietf.org/html/rfc7230#section-3.2.6
>
>      token          = 1*tchar
>
>      tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*"
>                     / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
>                     / DIGIT / ALPHA
>                     ; any VCHAR, except delimiters
>
>
  "Delimiters are chosen
   from the set of US-ASCII visual characters not allowed in a token
   (DQUOTE and "(),/:;<=>?@[\]{}")."


The characters missing above from tchar are '"', '(', ')', ',', '/', ':',
';', '<', '=', '>', '?', '@', '[', '\', ']', '{', '}' which corresponds to
this delimiter list, and to the RFC2616 list. VCHAR is clearly US-ASCII
20-7E, possibly includes tab. (Tabs are visible spacing.)

So my concerns may have been unfounded, but reviewing the new spec against
implementation still seems prudent. As I come across specifics we can
discuss those; sorry for my confusion.

Re: RFC 7230..7235 Parsing Conformance?

Posted by Jacob Champion <ch...@gmail.com>.
On 05/17/2016 11:31 AM, William A Rowe Jr wrote:
> One of the more significant is the change to token,
> https://tools.ietf.org/html/rfc2616#section-2.2
> 
> token = 1*<any CHAR except CTLs or separators>
> 
> 
> vs https://tools.ietf.org/html/rfc7230#section-3.2.6
> 
>      token          = 1*tchar
> 
>      tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*"
>                     / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
>                     / DIGIT / ALPHA
>                     ; any VCHAR, except delimiters

Is there a difference between these, other than that the first
definition is subtractive and the second is additive? I did a quick
check through an ASCII table and they seemed to be equivalent to me.

--Jacob
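
The ASCII-table comparison described above can be reproduced with a short script. This is a sketch that assumes the RFC 2616 section 2.2 definitions (CHAR is 0-127, CTL is 0-31 plus 127, and the separators list including SP and HT) and the RFC 7230 section 3.2.6 tchar list; the set names are illustrative only.

```python
import string

# RFC 2616 section 2.2, subtractive definition.
CHAR = frozenset(chr(c) for c in range(0, 128))
CTL = frozenset(chr(c) for c in range(0, 32)) | {chr(127)}
SEPARATORS = frozenset('()<>@,;:\\"/[]?={} \t')  # includes SP and HT
TOKEN_2616 = CHAR - CTL - SEPARATORS

# RFC 7230 section 3.2.6, additive definition.
TCHAR_7230 = frozenset("!#$%&'*+-.^_`|~" + string.digits + string.ascii_letters)

# Do both grammars accept the same token characters?
same = TOKEN_2616 == TCHAR_7230

# HT is both a CTL and a separator in RFC 2616, so it is excluded twice.
ht_doubly_excluded = "\t" in CTL and "\t" in SEPARATORS

print(same, ht_doubly_excluded, len(TCHAR_7230))
```

Both sets come out as the same 77 characters, so for token characters the two definitions differ only in how they are written.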


Re: RFC 7230..7235 Parsing Conformance?

Posted by William A Rowe Jr <wr...@rowe-clan.net>.
On Tue, May 17, 2016 at 1:00 PM, Julian Reschke <ju...@gmx.de>
wrote:

> On 2016-05-17 19:01, Graham Leggett wrote:
>
>> On 17 May 2016, at 6:43 PM, William A Rowe Jr <wr...@rowe-clan.net>
>> wrote:
>>
>> Wondering what other contributors are thinking on this topic.
>>>
>>> We have a number of changes in the ABNF grammar between
>>> RFC2616 and RFC7230..7235. Do we want trunk 2.6/3.0 to be
>>> an entirely RFC723x generation server, and drop all support for
>>> RFC2616?
>>>
>>> Do we want to backport these changes to 2.4.x? If so, what
>>> mechanism do we want to toggle the behavior of the server
>>> between 2616 and 7230..7235?
>>>
>>> We can presume a small performance hit in any conditional
>>> operation, especially when those decisions apply to tight parsing
>>> loop. Toggling between two different parser implementations would
>>> probably be a bit more efficient than conditionals within a parser
>>> itself.
>>>
>>
>> Can you give some examples to get a sense of the extent of this?
>>
> +1 to the question; I'd like to see examples as well...
>
> I believe we only changed the ABNF when we came to the conclusion that the
> old one was incorrect, or did not reflect what implementations do in
> practice.
>

One of the more significant is the change to token,
https://tools.ietf.org/html/rfc2616#section-2.2

     token          = 1*<any CHAR except CTLs or separators>


vs https://tools.ietf.org/html/rfc7230#section-3.2.6

     token          = 1*tchar

     tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*"
                    / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
                    / DIGIT / ALPHA
                    ; any VCHAR, except delimiters


This has a lot of tangential effects. My plan was to begin auditing
the code against spec and begin assembling patches. My question
in the note above is how we will generally handle the shift from one
spec to the next, and what we will promise our users (use 2.next/3.0
for conformance? Use 2.4 to retain 2616 behavior?)

Re: RFC 7230..7235 Parsing Conformance?

Posted by Julian Reschke <ju...@gmx.de>.
On 2016-05-17 19:01, Graham Leggett wrote:
> On 17 May 2016, at 6:43 PM, William A Rowe Jr <wr...@rowe-clan.net> wrote:
>
>> Wondering what other contributors are thinking on this topic.
>>
>> We have a number of changes in the ABNF grammar between
>> RFC2616 and RFC7230..7235. Do we want trunk 2.6/3.0 to be
>> an entirely RFC723x generation server, and drop all support for
>> RFC2616?
>>
>> Do we want to backport these changes to 2.4.x? If so, what
>> mechanism do we want to toggle the behavior of the server
>> between 2616 and 7230..7235?
>>
>> We can presume a small performance hit in any conditional
>> operation, especially when those decisions apply to tight parsing
>> loop. Toggling between two different parser implementations would
>> probably be a bit more efficient than conditionals within a parser
>> itself.
>
> Can you give some examples to get a sense of the extent of this?
>
> Regards,
> Graham

+1 to the question; I'd like to see examples as well...

I believe we only changed the ABNF when we came to the conclusion that 
the old one was incorrect, or did not reflect what implementations do in 
practice.

Best regards, Julian


Re: RFC 7230..7235 Parsing Conformance?

Posted by Graham Leggett <mi...@sharp.fm>.
On 17 May 2016, at 6:43 PM, William A Rowe Jr <wr...@rowe-clan.net> wrote:

> Wondering what other contributors are thinking on this topic.
> 
> We have a number of changes in the ABNF grammar between
> RFC2616 and RFC7230..7235. Do we want trunk 2.6/3.0 to be 
> an entirely RFC723x generation server, and drop all support for 
> RFC2616? 
> 
> Do we want to backport these changes to 2.4.x? If so, what
> mechanism do we want to toggle the behavior of the server
> between 2616 and 7230..7235? 
> 
> We can presume a small performance hit in any conditional 
> operation, especially when those decisions apply to tight parsing 
> loop. Toggling between two different parser implementations would
> probably be a bit more efficient than conditionals within a parser
> itself.

Can you give some examples to get a sense of the extent of this?

Regards,
Graham