Posted to dev@httpd.apache.org by William A Rowe Jr <wr...@rowe-clan.net> on 2016/09/12 14:50:47 UTC

Re: Backporting HttpProtocolOptions survey

On Mon, Aug 29, 2016 at 1:04 PM, Ruediger Pluem <rp...@apache.org> wrote:

>
> On 08/29/2016 06:25 PM, William A Rowe Jr wrote:
> > Thanks all for the feedback. Status and follow-up questions inline
> >
> > On Thu, Aug 25, 2016 at 10:02 PM, William A Rowe Jr <wrowe@rowe-clan.net> wrote:
> >
> >     A couple key questions now that the full refactoring of legacy vs. strict is mostly complete (there remain potential
> >     issues with some of the 3-4 yr old changes on trunk which I'll raise in other posts.) But speaking only to the
> >     request line and request header parsing...
> >
> >     3. Do we need multiple
> >     layers of 'Strict'ness, or should there be a single toggle, or no toggle, no tolerant input at all in the next
> >     2.2/2.4 releases?
> >
> > Discussion item:
> >
> > I am not sold that StrictURI can be collapsed into this flag. Right now, not even
> > httpd itself promises to correctly encode resulting URI's, AIUI. Until we have our
> > own house in order, it seems we need to remain flexible about this. The \t\v\r\f\0
> > characters are always now prohibited, so it's considerably more safe. Strict further
> > bans all unencoded ctrl's in the URL. So StrictURI takes this one step further, and
> > bans all unencoded obs-text along with SP / '"' / '<' / '>' / '\' / '^' / '`' / '{' / '|' / '}'
> >
> > Since it's expected that a number of sites will have to relax UnsafeURI due to
> > these encoding issues, even with the resulting URI's generated by httpd servers,
> > and will have to do so for *public facing* interfaces, I strongly believe that this
> > flag needs to remain distinct, or we will have lots of servers with entirely unsafe
> > parsing, not with only limited exposure by accepting bad URIs. Thoughts?
>
> Given the situation you describe it sounds sensible to keep this distinct.



> >
> >     4. Should the next 2.4/2.2 releases default to Strict at all? Or remain permissive (Unsafe) and allow the user to
> >     toggle these to Strict(... Whitespace... URI)?
> >
> >     Real world direct observation especially appreciated from actual deployments.
> >
> > Strict (and StrictURI) remain the default. The Allow0.9 and LenientMethods
>
> StrictURI as a default only makes sense if we have our own house in order
> (see above), otherwise it should be opt in.
>

Relative to our own house, I discovered that ';' is currently in the list
of those characters we insist on encoding. According to RFC 3986, ';' is
a sub-delim with a special meaning, so %3B is a potentially distinct value
from ';'. Given that, our own behavior seems entirely broken.

When we receive either ';' or %3B it is decoded to ';' in our r->uri, as we
decode all of the pct-escaped chars. Take this example:

http://example.com/foo;enc=en%3Bus/test.html

(Ignore the fact that my example has a much more logical way to choose
the language variant of a foo path segment).

r->uri becomes /foo;enc=en;us/test.html - this is part of the file system
path we will look for. The httpd server has no special logic to handle
such args or properties as described in the last paragraph of RFC 3986
section 3.3; they aren't an httpd consideration at all.
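
To make the collapse concrete, here is a minimal sketch of percent-decoding.
It is not httpd's own unescaping code; pct_decode and hex_val are names
made up for this illustration, and the caller is assumed to supply an
output buffer of at least strlen(in) + 1 bytes.

```c
#include <ctype.h>

/* Hypothetical helper: value of one hex digit (caller checks isxdigit). */
static int hex_val(int ch)
{
    if (ch >= '0' && ch <= '9') {
        return ch - '0';
    }
    return tolower(ch) - 'a' + 10;
}

/* Hypothetical helper: decode every valid %XX triplet in 'in' into 'out'.
 * Malformed escapes are copied through unchanged. 'out' must hold at
 * least strlen(in) + 1 bytes. */
static void pct_decode(const char *in, char *out)
{
    while (*in) {
        if (in[0] == '%' && isxdigit((unsigned char)in[1])
                         && isxdigit((unsigned char)in[2])) {
            *out++ = (char)(hex_val((unsigned char)in[1]) * 16
                            + hex_val((unsigned char)in[2]));
            in += 3;
        }
        else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

Feeding this either /foo;enc=en%3Bus/test.html or
/foo%3Benc=en%3Bus/test.html yields the same decoded string, so once the
path is decoded there is no way to tell which form the client sent.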

But worse, if the character ';' is distinct from %3B at the origin server,
we are passing a proxied path of /foo%3Benc=en%3Bus/test.html - and also
returning redirects in that form. The character ';' is smashed and cannot
be recovered, although it's allowed by section 3.3, and its plain-text
meaning is both more prevalent and more often correct.

Seems we are better off handing back or handing off /foo;enc=en;us/test.html
than the current /foo%3Benc=en%3Bus/test.html - the current behavior
is a much more pervasive error than the second case of embedded %3B
within a ';' section.

This appears to be our only mis-encoding from path to a composed URI.
The rest of this logic reflects section 3.3 rules, so I'll shortly commit
the addition of ';' to this allowed (don't-escape) list, and we will still
accept either form on input:

if (!apr_isalnum(c) && !strchr("$-_.+!*'(),:@&=/~", c)) {
    flags |= T_OS_ESCAPE_PATH;
}
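
For illustration only, a rough sketch of what the change means for a
composed path. This is not the gen_test_char/escape code in httpd;
escape_path is a hypothetical stand-in that takes the don't-escape list as
a parameter, and SAFE_PROPOSED assumes the change simply inserts ';' into
the string shown above (its position in the string is my guess and does
not matter to strchr).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* The don't-escape list as quoted above, and an assumed form of the
 * same list with ';' added. */
#define SAFE_CURRENT  "$-_.+!*'(),:@&=/~"
#define SAFE_PROPOSED "$-_.+!*'(),:;@&=/~"

/* Hypothetical helper: percent-encode every byte of 'path' that is
 * neither alphanumeric nor listed in 'safe'. Output is truncated if it
 * would not fit; 'outlen' must be at least 1. */
static void escape_path(const char *path, const char *safe,
                        char *out, size_t outlen)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;

    /* Reserve room for a worst-case "%XX" plus the terminating NUL. */
    for (; *path && n + 4 <= outlen; ++path) {
        unsigned char c = (unsigned char)*path;

        if (isalnum(c) || strchr(safe, c)) {
            out[n++] = (char)c;
        }
        else {
            out[n++] = '%';
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0x0F];
        }
    }
    out[n] = '\0';
}
```

With SAFE_CURRENT, /foo;enc=en;us/test.html is composed as
/foo%3Benc=en%3Bus/test.html; with SAFE_PROPOSED it passes through
unchanged, which is the behavior argued for above.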