Posted to bugs@httpd.apache.org by bu...@apache.org on 2010/12/28 21:33:48 UTC

DO NOT REPLY [Bug 35256] %2F will be decoded in PATH_INFO (Documentation to AllowEncodedSlashes says no decoding will be done)

https://issues.apache.org/bugzilla/show_bug.cgi?id=35256

--- Comment #15 from Timothy Ace <ap...@timothyace.com> 2010-12-28 15:33:41 EST ---
My company has also run into several issues with AllowEncodedSlashes already.
These issues mostly come up in cases where PATH_INFO is being used either in a
resource name for a REST API or for an asset name for a video, document, news
article, etc. that contains a slash in its name. This makes us very invested
in this issue. Quite honestly, the current implementation is wrong and violates
the RFCs.

Check out Example 2 from the REDUCED OR INCREASED SAFE CHARACTER SETS section
of RFC 1630:

   Example 2

   The URIs

                http://info.cern.ch/albert/bertram/marie-claude

   and

                http://info.cern.ch/albert/bertram%2Fmarie-claude

   are NOT identical, as in the second case the encoded slash does not
   have hierarchical significance.


Tim specifically called out this example in RFC 1630 and it is of great
importance to us for two reasons:

1. It shows concretely that having a %2F in the URL is valid. Because httpd by
default rejects such a request with a 404 error, it is not RFC 1630 compliant
out of the box.

2. Even if we turn on AllowEncodedSlashes, httpd interprets the %2F as a path
separator, violating RFC 1630 because it makes the two URLs in Example 2 above
equivalent. For example, if "albert" is the name of the script or handler, then
the PATH_INFO for both URLs will be "/bertram/marie-claude". The two values are
indistinguishable from one another, which effectively makes the URLs identical.
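
To illustrate point 2, here is a minimal Python sketch (not httpd code). The
script name "/albert" is taken from the example above, and both helper
functions are hypothetical names of my own. The first mimics decoding the whole
path before extracting PATH_INFO; the second splits on literal slashes first
and only then decodes each segment, which keeps the two URLs distinct.

```python
from urllib.parse import unquote

SCRIPT = "/albert"  # assumed script/handler name, as in the example above


def path_info_decode_first(url_path):
    """Decode the whole path, then strip the script name."""
    decoded = unquote(url_path)
    return decoded[len(SCRIPT):]


def path_info_split_first(url_path):
    """Split on literal slashes first, then decode each segment on its own."""
    raw = url_path[len(SCRIPT):]
    return [unquote(segment) for segment in raw.split("/")[1:]]


a = "/albert/bertram/marie-claude"
b = "/albert/bertram%2Fmarie-claude"

# Decoding first collapses both URLs to the same PATH_INFO.
print(path_info_decode_first(a) == path_info_decode_first(b))  # True

# Splitting first keeps the encoded slash as data inside one segment.
print(path_info_split_first(a))  # ['bertram', 'marie-claude']
print(path_info_split_first(b))  # ['bertram/marie-claude']
```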

Of note is that RFC 1630 has not been updated or obsoleted by any other RFC
and is still the basis for URLs on the WWW -- something core to httpd.

While Section 2.4.2 of RFC 2396 (section 2.4 in RFC 3986 that obsoletes RFC
2396) mentions that a tilde (~) and a %7E can be used interchangeably in a URL,
it is not pertinent to this issue since a tilde is not a "reserved character"
(it is specifically called out as an "unreserved character"), whereas a slash
(/) is reserved.

From Section 2.2 of RFC 3986:

      reserved    = gen-delims / sub-delims

      gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

      sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

   The purpose of reserved characters is to provide a set of delimiting
   characters that are distinguishable from other data within a URI.
   URIs that differ in the replacement of a reserved character with its
   corresponding percent-encoded octet are not equivalent.  PERCENT-
   ENCODING A RESERVED CHARACTER, OR DECODING A PERCENT-ENCODED OCTET
   THAT CORRESPONDS TO A RESERVED CHARACTER, WILL CHANGE HOW THE URI IS
   INTERPRETED BY MOST APPLICATIONS.  THUS, CHARACTERS IN THE RESERVED
   SET ARE PROTECTED FROM NORMALIZATION AND ARE THEREFORE SAFE TO BE
   USED BY SCHEME-SPECIFIC AND PRODUCER-SPECIFIC ALGORITHMS FOR
   DELIMITING DATA SUBCOMPONENTS WITHIN A URI.

I realize that it does say "most applications"; however, it goes on in the
next statement to say that "characters in the reserved set are protected from
normalization".

Therefore the correct solution here is to change httpd to NEVER decode any of
the reserved characters from the ABNF. This would follow RFC 1630 and RFC 3986
and would also make the note in the documentation for the AllowEncodedSlashes
directive
(http://httpd.apache.org/docs/2.2/en/mod/core.html#allowencodedslashes) correct
once again in that slashes will not be decoded.
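
As a rough sketch of what I mean, in Python rather than httpd's C: the function
name decode_non_reserved is hypothetical, and the RESERVED set is copied from
the ABNF quoted above. As a simplifying assumption of mine, it only decodes
printable ASCII and also leaves %25 alone so that a decoded "%" cannot be
mistaken for the start of another escape.

```python
import re

# gen-delims + sub-delims from the RFC 3986 ABNF quoted above.
RESERVED = set(":/?#[]@" + "!$&'()*+,;=")

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_non_reserved(path):
    """Decode %XX escapes, except those that stand for a reserved character.

    Assumptions of this sketch: non-printable and non-ASCII octets, and %25
    itself, are left percent-encoded.
    """
    def replace(match):
        code = int(match.group(1), 16)
        ch = chr(code)
        if not (0x20 <= code <= 0x7E) or ch in RESERVED or ch == "%":
            return match.group(0)  # keep the escape exactly as received
        return ch
    return _ESCAPE.sub(replace, path)


print(decode_non_reserved("/bertram%2Fmarie-claude"))  # /bertram%2Fmarie-claude
print(decode_non_reserved("/%7Ealbert/a%2Fb"))         # /~albert/a%2Fb
```

With something along these lines, the PATH_INFO for the two URLs in Example 2
would stay distinguishable, and the application could decode the %2F itself
once it has split the path.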

Two additional notes:

1. AllowEncodedSlashes should really be "on" by default and probably even
deprecated. From what I can tell, the only thing it protects against is poor
application writers, and it does so in a less-than-graceful way by slapping up
a 404. It also seems that only a very small percentage of people even know
about AllowEncodedSlashes, and those who do end up turning it on because they
discovered it after spending a few hours scratching their heads, modifying
configurations and rewrite rules, trying to figure out why a valid URL was
being rejected.

2. Nowhere in the RFCs is a backslash (\) listed as a reserved character.
Therefore a %5C *should* always be decoded, the same way %7E is converted to a
tilde (~).
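
A quick way to check the second note, as a small Python sketch: the helper name
is_reserved_escape is hypothetical, and the reserved set is typed in from the
RFC 3986 ABNF rather than taken from any library.

```python
from urllib.parse import unquote

# gen-delims + sub-delims, per Section 2.2 of RFC 3986.
RESERVED = set(":/?#[]@" + "!$&'()*+,;=")


def is_reserved_escape(escape):
    """Return True if a %XX escape stands for an RFC 3986 reserved character."""
    return unquote(escape) in RESERVED


for escape in ("%2F", "%5C", "%7E"):
    print(escape, "reserved" if is_reserved_escape(escape) else "not reserved")
```

Only %2F comes back as reserved; %5C and %7E do not.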
