Posted to users@httpd.apache.org by Matt Liggett <ma...@socialtext.com> on 2006/10/03 19:17:57 UTC

[users@httpd] combining AllowEncodedSlashes, reverse proxy, and apache 1.x

Introduction

  At Socialtext[1], we use, in many installations, Apache 2 (hereafter
  "front end") as a server for static content and a reverse proxy for
  Apache 1 with mod_perl (hereafter "back end").  It recently came up,
  in the course of developing a REST API[2], that we need to be able
  to handle URIs with encoded '/' (%2F) characters in them[3].

  In addition to needing this all to work with Apache 2 acting as a
  front end, it also needs to work in an alternate configuration where
  Apache 1 runs alone.

AllowEncodedSlashes bug

  According to the docs[4],

    Allowing encoded slashes does not imply decoding. Occurrences of
    %2F or %5C (only on according systems) will be left as such in the
    otherwise decoded URL string.

  but in our experience, if a URL like the one in [3] is passed to
  Apache 2, it gets passed to the reverse proxy as

    /data/workspaces/ambivalent/pages/either/or

  which seems to be a bug.[5]
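
  To make the distinction concrete, here is a minimal Python sketch of
  the decoding the docs describe (every escape decoded except %2F) next
  to a full decode. The function unescape_keep_2f() is hypothetical and
  only models the documented behaviour; it is not a port of Apache's
  ap_unescape_url_keep2f().

```python
import re
from urllib.parse import unquote

def unescape_keep_2f(path: str) -> str:
    """Decode %XX escapes in a URL path but leave encoded slashes alone.

    Hypothetical model of the documented AllowEncodedSlashes behaviour,
    not Apache's C implementation.
    """
    # Split on %2F / %2f, keeping the separators in the result list.
    parts = re.split(r"(%2[Ff])", path)
    # Decode ordinary pieces; re-emit the encoded slashes untouched.
    return "".join(p if re.fullmatch(r"%2[Ff]", p) else unquote(p) for p in parts)

uri = "/data/workspaces/ambivalent/pages/either%2For"
print(unescape_keep_2f(uri))  # /data/workspaces/ambivalent/pages/either%2For
print(unquote(uri))           # /data/workspaces/ambivalent/pages/either/or
```

  The first result matches the documented behaviour; the second matches
  what reaches the back end in the setup described above.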

  In addition to this, I believe it's important not to decode '%25' if
  one has AllowEncodedSlashes turned on, otherwise the URLs
  '/foo/%252F' and '/foo/%2F' become indistinguishable.[6]
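
  A small Python sketch shows the collision. decode_keep_2f() and its
  keep_25 flag are hypothetical, and the sketch assumes an ASCII-only
  path. With keep_25=False both paths collapse to the same string;
  leaving %25 encoded keeps them distinct.

```python
import re

ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

def decode_keep_2f(path: str, keep_25: bool = False) -> str:
    """Decode %XX escapes except %2F (and, optionally, %25).

    Hypothetical illustration; assumes an ASCII-only path.
    """
    def repl(m: re.Match) -> str:
        code = m.group(1).upper()
        if code == "2F" or (keep_25 and code == "25"):
            return m.group(0)          # leave this escape encoded
        return chr(int(code, 16))      # decode the byte (ASCII assumed)
    # re.sub makes a single left-to-right pass, so "%252F" decodes to "%2F"
    # and that result is not decoded again.
    return ESCAPE.sub(repl, path)

a, b = "/foo/%252F", "/foo/%2F"
print(decode_keep_2f(a), decode_keep_2f(b))              # /foo/%2F /foo/%2F
print(decode_keep_2f(a, True), decode_keep_2f(b, True))  # /foo/%252F /foo/%2F
```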

  The assorted backports of AllowEncodedSlashes to Apache 1 have these
  bugs as well.

Changed URL decoding behaviour in 2.0.55

  Prior to 2.0.55, the rewrite rule for our reverse proxy looked like

    RewriteMap escape int:escape
    RewriteRule (.*) http://BACK_END${escape:$1}

  where BACK_END is the back end hostname and port.  This was because
  the URL was getting decoded prior to this rule, and an encoded '%3F'
  would become a '?', which would parse incorrectly on the back end.
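
  The failure mode can be reproduced with Python's standard library. The
  path and the host name "backend" (standing in for BACK_END) are made up
  for illustration, and urllib.parse.quote() is only a rough analogue of
  mod_rewrite's int:escape map, not the same function.

```python
from urllib.parse import quote, unquote, urlsplit

original = "/pages/what%3Fnow"     # hypothetical path with an encoded '?'
decoded = unquote(original)        # "/pages/what?now"

# Proxying the decoded path verbatim: the back end sees a query string.
naive = urlsplit("http://backend" + decoded)
print(naive.path, naive.query)     # /pages/what now

# Re-escaping first (the role int:escape plays above) keeps it in the path.
fixed = urlsplit("http://backend" + quote(decoded))
print(fixed.path, repr(fixed.query))  # /pages/what%3Fnow ''
```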

  As of 2.0.55, this extra decoding seems to have been cleaned up, _except_ for
  '%2F' if AllowEncodedSlashes is on.  That is, the bug described
  above is still present.

  As a result, it seems that if we want standard decode/escape
  semantics on the front end, we must insist on 2.0.55+.

Do we need all this?

  It would seem that we need patched versions of Apache 2.0.55+ and Apache 1
  as described above to solve the problem in both configurations (with
  and without Apache 2 acting as reverse proxy).  Have we
  overcomplicated the problem?  If so, is there a simpler combination
  of configuration, versions, or patches that accomplishes the same
  result?

  Have I misunderstood anything above?  Requiring specially patched
  versions of both Apaches is a bit of a hardship, so we want to make
  sure we aren't being super dumb here.

Thanks.
----

[1] http://www.socialtext.com/
[2] https://www.socialtext.net/st-rest-docs/index.cgi
[3] An example would be the canonical URI of a page named 'either/or'
    in the workspace 'ambivalent':
      /data/workspaces/ambivalent/pages/either%2For
[4] http://httpd.apache.org/docs/2.0/mod/core.html#allowencodedslashes
[5] I have a patch that fixes ap_unescape_url_keep2f() and can submit
    it.
[6] I have a patch for this behaviour too, but the docs would need to
    be modified if it were to be accepted.
-- 
Matt Liggett
Senior Software Engineer
Socialtext, Inc.

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: [users@httpd] combining AllowEncodedSlashes, reverse proxy, and apache 1.x

Posted by Joshua Slive <jo...@slive.ca>.
I'm not really an expert in this stuff, but a couple of comments anyway...

On 10/3/06, Matt Liggett <ma...@socialtext.com> wrote:

> AllowEncodedSlashes bug
>
>   According to the docs[4],
>
>     Allowing encoded slashes does not imply decoding. Occurrences of
>     %2F or %5C (only on according systems) will be left as such in the
>     otherwise decoded URL string.
>
>   but it is our experience that if a URL like in [3] is passed to
>   Apache 2, it gets passed to the reverse proxy as
>
>     /data/workspaces/ambivalent/pages/either/or
>
>   which seems to be a bug.[5]

I don't believe that is really a bug.  The docs mean that activating
AllowEncodedSlashes does not in itself do any decoding.  But if you
have other stuff in the works that does decoding, all bets are off.

And in general, I don't think the unescaping algorithm has a bug
either.  RFC 2396 section 2.4.2 says "If the given URI scheme defines
a canonicalization algorithm, then unreserved characters may be
unescaped according to that algorithm."

The slash is not a reserved character and hence can be unescaped,
according to my reading.  And there are good reasons for doing just
that.

If I were you, the first thing I would try is to make your back-end
application deal with this, either by accepting a raw slash, or by
generating URLs that use some other character in place of slash.
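
A rough Python sketch of the second option: the application maps page
names to path segments that never contain a raw or encoded slash. The
'~' escape scheme and both function names are made up for illustration;
any reversible substitution would do.

```python
def page_to_segment(name: str) -> str:
    """Encode a page name so it contains no '/' (hypothetical scheme)."""
    # Escape the escape character first, then replace slashes.
    return name.replace("~", "~7E").replace("/", "~2F")

def segment_to_page(segment: str) -> str:
    """Invert page_to_segment()."""
    # Every '~' in an encoded segment starts a '~2F' or '~7E' token,
    # so these replacements cannot match across token boundaries.
    return segment.replace("~2F", "/").replace("~7E", "~")

for name in ["either/or", "a~b", "x/~2F"]:
    seg = page_to_segment(name)
    assert "/" not in seg and segment_to_page(seg) == name
    print(name, "->", seg)
```

Because the encoded segment contains no slash at all, the front end can
decode it freely without changing the path structure the back end sees.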

But I have to admit that the escaping and unescaping in mod_proxy and
mod_rewrite has always mystified me, and I wish it was better
documented and more configurable.

Joshua.
