Posted to dev@httpd.apache.org by cm...@collab.net on 2003/01/13 18:55:42 UTC

URI-escaping levels, mod_rewrite, and Apache.

I'm having a doozy of a time with a mod_rewrite setup, and I'd love some
extra eyes and brains.  Here's my dealio.

I have a complex setup in httpd.conf.  First, a RewriteRule is hit:

   RewriteRule /some/stuff/(.*) /some/servlet?url=/new/stuff/($1) [QSA,PT,NE,L]

Then, /some/servlet runs, and after doing what it does, it reconnects
to the HTTP server (at 127.0.0.1) using the URL in the "url=" query
param.

Finally, this second HTTP connection hits a ScriptAlias:

   ScriptAlias /new/stuff /final/uri.cgi

I am confident that the steps themselves are fine, so please try to
avoid recommendations like "simplify your ruleset".  If my scenario is
revealing a real bug, let's address that, please. :-)

I'm requesting a URL:  "http://server/some/stuff/foo%25bar/"

I set a breakpoint in mod_rewrite.c:hook_uri2file(), and I see that
r->uri has already been URI-unescaped to "/some/stuff/foo%bar".  I
presume this is the server core doing this?  One of the first things
mod_rewrite does is to say:

   if (r->filename == NULL)
      copy r->uri to r->filename;

Now, my rewrite rule uses the [PT] flag, so after finishing the first
rewrite (applying the rule), mod_rewrite then does the reverse copy,
copying r->filename to r->uri.  Now, keep in mind that r->uri (and
now, r->filename) have already been URI-unescaped.

Now my servlet is activated.  Once again, the URI is unescaped.  Logs
from my servlet indicate a query string of
"url=/new/stuff/fooºr".  The servlet does its stuff, and then
performs the turnaround it's been asked to perform.  It URI-escapes
the URL, and hits the URL "http://localhost/new/stuff/foo%bar".

Finally, the ScriptAlias catches this URL, but of course it's wrong --
off by one level of URI-escaping.
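
To make the levels concrete, here is a rough Python model of the trace
above.  It is only a sketch: urllib's unquote/quote stand in for the
server core's unescaping and the servlet's escaping, the servlet is
assumed to decode bytes as Latin-1, and the literal parentheses around
$1 in the rule are ignored.

```python
from urllib.parse import quote, unquote

# Path as sent by the client: one level of escaping ("%25" is an escaped "%").
requested = "/some/stuff/foo%25bar/"

# 1. The server core unescapes the path before mod_rewrite sees r->uri.
r_uri = unquote(requested)

# 2. The rule moves the captured part into the query string; with [NE] it is
#    copied as-is, so the now-bare "%" is not re-escaped.
query_value = "/new/stuff/" + r_uri[len("/some/stuff/"):]

# 3. The servlet side unescapes the query value. "%ba" happens to be a valid
#    escape (byte 0xBA), which is "º" in Latin-1 (assumed charset).
servlet_value = unquote(query_value, encoding="latin-1")

# 4. The servlet URI-escapes that value for the loopback request.
loopback_path = quote(servlet_value, encoding="latin-1")

# 5. The server core unescapes once more before ScriptAlias matches.
final_path = unquote(loopback_path, encoding="latin-1")

print("r_uri        ", r_uri)
print("query_value  ", query_value)
print("servlet_value", servlet_value)
print("loopback_path", loopback_path)
print("final_path   ", final_path)
```

In this model the CGI would have needed to see "/new/stuff/foo%bar/",
which means the loopback request should have carried "foo%25bar".  One
level of escaping is lost in step 2 and never recovered.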

I tried patching mod_rewrite so that the passthrough step re-encodes
the URI.  That worked okay for the URL we've been talking about, but
a) it's the wrong thing to do given the definition of the PT flag, and
b) isn't a complete solution.  Why (b)?  Well, because I have this
other request that I make sometimes: "http://server/some/stuff/foo%26bar/"

When I make that request, mod_rewrite is again handed an unescaped URI
"/some/stuff/foo&bar".  Ah, so when I try to re-escape it with
ap_escape_uri, it still comes out "/some/stuff/foo&bar".  Yep.
Because the '&' is valid in URIs, it doesn't get re-escaped.  So by
the time my servlet is called, its query string has
"url=/some/stuff/foo&bar", which means the "url" param's value is
"/some/stuff/foo".  Well, that won't work.
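
The '&' case can be sketched the same way.  Here quote() with '&' in
its safe set is only a stand-in for the ap_escape_uri behavior described
above (an assumption for illustration, not a port of that function), and
parse_qs plays the role of the servlet's parameter parsing.

```python
from urllib.parse import parse_qs, quote

# What mod_rewrite is handed after the core has unescaped "%26".
unescaped = "/some/stuff/foo&bar"

# Re-escaping with an escaper that treats "&" as a legal URI character
# changes nothing, so the original "%26" cannot be recovered.
re_escaped = quote(unescaped, safe="/&")

# The value lands in the query string, where "&" is a parameter separator.
broken = parse_qs("url=" + re_escaped)

# For contrast: an escaper that does encode "&" keeps the value in one piece.
strict = quote(unescaped, safe="/")
intact = parse_qs("url=" + strict)

print(re_escaped, broken)
print(strict, intact)
```

The contrast at the end is only there to show where the information is
lost; it is not a claim about what a proper fix inside Apache would be.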

I can't remove the [NE] from that RewriteRule because sometimes my
original URLs have query string data themselves which is combined
(using the [QSA] flag) with the servlet params.  Since query string
data is never unescaped by the server core, or by mod_rewrite, the
result would cause my original URL's query data to be doubly-escaped
(this happened in the past, and is the reason why the [NE] was added in
the first place).
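
A small sketch of that double-escaping, again in Python.  The query
data "q=a%20b" is a made-up example, and quote() is only a rough
stand-in for the escaping mod_rewrite applies to its output when [NE]
is absent.

```python
from urllib.parse import quote, unquote

# Hypothetical query data from the original URL. The server core never
# unescapes the query string, so it still carries its one level of escaping.
original_query = "q=a%20b"

# Without [NE], the rewritten result is escaped on output, and "%" itself
# becomes "%25": the data now carries two levels of escaping.
escaped_again = quote(original_query, safe="=&")

# A consumer that unescapes once gets the escaped form, not the real value.
seen_by_servlet = unquote(escaped_again)

print(escaped_again)
print(seen_by_servlet)
```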

I'm completely out of ideas save for the one I don't want to face up
to: fixing Apache's "let's taint r->uri by unescaping it before our
handlers get a chance to see the thing" bug.

Help. ?.