You are viewing a plain text version of this content. The canonical link for it is here.
Posted to cvs@httpd.apache.org by yl...@apache.org on 2020/06/22 10:36:55 UTC

svn commit: r1879079 - /httpd/httpd/trunk/server/request.c

Author: ylavic
Date: Mon Jun 22 10:36:55 2020
New Revision: 1879079

URL: http://svn.apache.org/viewvc?rev=1879079&view=rev
Log:
Allow for URI-path pre_translate_name before (and/or instead of) decoding.

Apply minimal normalization (AP_NORMALIZE_DECODE_UNRESERVED) first in
ap_process_request_internal() before running pre_translate_name hooks,
such that the hooks can work with undecoded r->uri.

Only if no hook takes "ownership" of the URI (returning OK), apply
percent decoding for the rest of request handling. Otherwise r->uri remains
encoded meaning that further location/directory/file/if/.. sections (walks)
should that into account.

Since normalization now happens before decoding, we might have to
re-normalize after decoding if "AllowEncodedSlahes on" transformed any
"%2F" sequence to "/", potentially creating new "/./" or "/../" sequences.

Note that for (lookup) subrequests, the path may be relative so we have
to allow for that.


Modified:
    httpd/httpd/trunk/server/request.c

Modified: httpd/httpd/trunk/server/request.c
URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/server/request.c?rev=1879079&r1=1879078&r2=1879079&view=diff
==============================================================================
--- httpd/httpd/trunk/server/request.c (original)
+++ httpd/httpd/trunk/server/request.c Mon Jun 22 10:36:55 2020
@@ -172,12 +172,41 @@ AP_DECLARE(int) ap_process_request_inter
     core_dir_config *d;
     core_server_config *sconf =
         ap_get_core_module_config(r->server->module_config);
+    unsigned int normalize_flags = 0;
 
-    /* Ignore embedded %2F's in path for proxy requests */
-    if (!r->proxyreq && r->parsed_uri.path) {
+    if (r->main) {
+        /* Lookup subrequests can have a relative path. */
+        normalize_flags = AP_NORMALIZE_ALLOW_RELATIVE;
+    }
+
+    if (r->parsed_uri.path) {
+        /* Normalize: remove /./ and shrink /../ segments, plus
+         * decode unreserved chars (first time only to avoid
+         * double decoding after ap_unescape_url() below).
+         */
+        if (!ap_normalize_path(r->parsed_uri.path,
+                               normalize_flags |
+                               AP_NORMALIZE_DECODE_UNRESERVED)) {
+            ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r, APLOGNO()
+                          "invalid URI path (%s)", r->uri);
+            return HTTP_BAD_REQUEST;
+        }
+    }
+
+    /* Let pre_translate_name hooks work with non-decoded URIs,
+     * and eventually apply their own transformations (return OK).
+     */
+    access_status = ap_run_pre_translate_name(r);
+    if (access_status != OK && access_status != DECLINED) {
+        return access_status;
+    }
+
+    /* Ignore URL unescaping for translated URIs already */
+    if (access_status == DECLINED && r->parsed_uri.path) {
         d = ap_get_core_module_config(r->per_dir_config);
         if (d->allow_encoded_slashes) {
-            access_status = ap_unescape_url_keep2f(r->parsed_uri.path, d->decode_encoded_slashes);
+            access_status = ap_unescape_url_keep2f(r->parsed_uri.path,
+                                                   d->decode_encoded_slashes);
         }
         else {
             access_status = ap_unescape_url(r->parsed_uri.path);
@@ -188,20 +217,27 @@ AP_DECLARE(int) ap_process_request_inter
                     ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, r, APLOGNO(00026)
                                   "found %%2f (encoded '/') in URI "
                                   "(decoded='%s'), returning 404",
-                                  r->parsed_uri.path);
+                                  r->uri);
                 }
             }
             return access_status;
         }
-    }
 
-    ap_getparents(r->uri);     /* OK --- shrinking transformations... */
-    if (sconf->merge_slashes != AP_CORE_CONFIG_OFF) { 
-        ap_no2slash(r->uri);
-        if (r->parsed_uri.path) {
+        if (d->allow_encoded_slashes && d->decode_encoded_slashes) {
+            /* Decoding slashes might have created new /./ and /../
+             * segments (e.g. "/.%2F/"), so re-normalize. If asked to,
+             * merge slashes while at it.
+             */
+            if (sconf->merge_slashes != AP_CORE_CONFIG_OFF) { 
+                normalize_flags |= AP_NORMALIZE_MERGE_SLASHES;
+            }
+            ap_normalize_path(r->parsed_uri.path, normalize_flags);
+        }
+        else if (sconf->merge_slashes != AP_CORE_CONFIG_OFF) { 
+            /* We still didn't merged slashes yet, do it now. */
             ap_no2slash(r->parsed_uri.path);
         }
-     }
+    }
 
     /* All file subrequests are a huge pain... they cannot bubble through the
      * next several steps.  Only file subrequests are allowed an empty uri,



Re: pre_translate_name hook vs location/directory walk (was: svn commit: r1879079)

Posted by Yann Ylavic <yl...@gmail.com>.
On Mon, Jun 22, 2020 at 2:28 PM Yann Ylavic <yl...@gmail.com> wrote:
>
> Also, since we are at it, I'm on the fence about running the
> pre_translate hooks before quick handlers (thus before
> ap_process_request_internal() too), good or bad idea?

Sorry, I meant running the *normalization* (including decoding
unreserved characters) before the quick handlers.

Re: pre_translate_name hook vs location/directory walk (was: svn commit: r1879079)

Posted by Yann Ylavic <yl...@gmail.com>.
On Mon, Jun 22, 2020 at 2:57 PM Eric Covener <co...@gmail.com> wrote:
>
> On Mon, Jun 22, 2020 at 8:28 AM Yann Ylavic <yl...@gmail.com> wrote:
> >
> > > Allow for URI-path pre_translate_name before (and/or instead of) decoding.
> > >
> > > Only if no hook takes "ownership" of the URI (returning OK), apply
> > > percent decoding for the rest of request handling. Otherwise r->uri remains
> > > encoded meaning that further location/directory/file/if/.. sections (walks)
> > > should that into account.
> >
> > This change allows a module to avoid URI decoding in
> > pre_translate_name, such that r->uri remains encoded for the whole
> > request handling.
> >
> > For instance (same series), mod_proxy will use that hook when
> > "ProxyMappingDecoded off" is configured, and return OK if a ProxyPass
> > matches. The decoding was already disabled (before this commit) for
> > the forward-proxy case (ProxyRequests on) since r->proxyreq was set at
> > an early stage, but now it "works" for the reverse-proxy case too.
> >
> > This means that when "ProxyMappingDecoded off" is configured (the
> > default is still "on", though), for instance a <Location URI-path, or
> > a fixup RewriteRule or a non-early RequestHeader rule will have to be
> > expressed with the encoded form since reserved characters will not be
> > decoded as before (note that unreserved characters will always be
> > decoded anyway).
> >
> > I don't know how much it can break existing configurations, and wonder
> > if we should have a core directive that still controls whether r->uri
> > should be decoded, regardless of ProxyMappingDecoded or any further
> > module pre_translate_name hook with its own directives.
> >
> > For mod_proxy this is not really an issue to have r->uri modified
> > after the (pre_)translate phase, because the handler will later work
> > on r->filename regardless (the "proxy:..." set by proxy_detect or
> > proxy_trans), so maybe we should simply always decode r->uri after the
> > pre_translate hooks have run. Though I find it easier in a proxy
> > context to match a URI-path (LocationMatch, RewriteRule..) in its
> > encoded form, and not worry about original vs decoded separators. That
> > may be just my preference, but a new core directive looks sane to me..
> >
> > Thoughts?
>
> I have not followed too closely so take with a grain of salt / for the
> sake of argument:

Thanks, always welcome!

>
> From an externals perspective (maybe being detached here is a benefit)
> it might be better to have something like "ProxyUseOriginalURL" that

Yeah, much better name. Will use *URI which is somewhat closer to the
RFC's URI-path nomenclature.

> has the following caveats:
> 1. it's partially decoded
> 2. it is required for whatever the servlet mapping option is (iiuc?)

Correct, I'm a bit reluctant to issue servlet mapping on decoded URIs..

> 3. it will affect the string used in comparisons/substitutions for
> <Location*>, Rewrite, <If>, (*CAUTION*)
>    - Some details is needed here about the partial decoding, for
> example %2f or multiple leading slashes so "simple" context root
> configs can be written without too much paranoia
> 4. If there are risks of re-injecting URL's with [PT] or per-dir
> rewrites try to explain that it's a complex combination to worry
> about.

Possibly we are better off decoding r->uri in any case then, and not
change its current semantics.
As I said it won't affect the mod_proxy pre_trans case since its
translation already occurred and was saved in r->filename.
If we later want to let the user work with either the decoded or
encoded URI then we will add r->encoded_uri or alike, with some
opacity and helpers to keep it in sync with r->uri. But that would
affect module writers too, that's why I was thinking more of a "core"
directive than ProxyUseOriginalURI, or no directive at all with
trunk/2.next/3 material only.

>
> I would not worry too much about "existing configs" as this is all
> opt-in and it's probably a lost cause anyway.
>
> I would instead give people new built-in vars in the important places
> (rewrite, expr) to be able to match the partially decoded URI that the
> proxy will be working on so a savvy user can safely interoperate
> without monstrous regexes.

The hard part here (IMHO) is to keep encoded_uri and decoded_uri in
sync, depending on whichever is updated/accessed by a module or e.g. a
RewriteRule. It may be too much of an API/ABI change for the simple
proxy encoded URI case.

For now we can probably live with r->uri being always
normalized/decoded as before (and <Location*, RewriteRule, ... working
as today), even though it does not allow for encoded control chars
matching or things like that, and requires to be careful with decoded
reserved chars (non default for slash fortunately) or other
separators, while still allowing to match them without a PhD in
regexes.
We can't simply have the best of both worlds I'm afraid..

Re: pre_translate_name hook vs location/directory walk (was: svn commit: r1879079)

Posted by Eric Covener <co...@gmail.com>.
On Mon, Jun 22, 2020 at 8:28 AM Yann Ylavic <yl...@gmail.com> wrote:
>
> > Allow for URI-path pre_translate_name before (and/or instead of) decoding.
> >
> > Only if no hook takes "ownership" of the URI (returning OK), apply
> > percent decoding for the rest of request handling. Otherwise r->uri remains
> > encoded meaning that further location/directory/file/if/.. sections (walks)
> > should that into account.
>
> This change allows a module to avoid URI decoding in
> pre_translate_name, such that r->uri remains encoded for the whole
> request handling.
>
> For instance (same series), mod_proxy will use that hook when
> "ProxyMappingDecoded off" is configured, and return OK if a ProxyPass
> matches. The decoding was already disabled (before this commit) for
> the forward-proxy case (ProxyRequests on) since r->proxyreq was set at
> an early stage, but now it "works" for the reverse-proxy case too.
>
> This means that when "ProxyMappingDecoded off" is configured (the
> default is still "on", though), for instance a <Location URI-path, or
> a fixup RewriteRule or a non-early RequestHeader rule will have to be
> expressed with the encoded form since reserved characters will not be
> decoded as before (note that unreserved characters will always be
> decoded anyway).
>
> I don't know how much it can break existing configurations, and wonder
> if we should have a core directive that still controls whether r->uri
> should be decoded, regardless of ProxyMappingDecoded or any further
> module pre_translate_name hook with its own directives.
>
> For mod_proxy this is not really an issue to have r->uri modified
> after the (pre_)translate phase, because the handler will later work
> on r->filename regardless (the "proxy:..." set by proxy_detect or
> proxy_trans), so maybe we should simply always decode r->uri after the
> pre_translate hooks have run. Though I find it easier in a proxy
> context to match a URI-path (LocationMatch, RewriteRule..) in its
> encoded form, and not worry about original vs decoded separators. That
> may be just my preference, but a new core directive looks sane to me..
>
> Thoughts?

I have not followed too closely so take with a grain of salt / for the
sake of argument:

From an externals perspective (maybe being detached here is a benefit)
it might be better to have something like "ProxyUseOriginalURL" that
has the following caveats:
1. it's partially decoded
2. it is required for whatever the servlet mapping option is (iiuc?)
3. it will affect the string used in comparisons/substitutions for
<Location*>, Rewrite, <If>, (*CAUTION*)
   - Some details is needed here about the partial decoding, for
example %2f or multiple leading slashes so "simple" context root
configs can be written without too much paranoia
4. If there are risks of re-injecting URL's with [PT] or per-dir
rewrites try to explain that it's a complex combination to worry
about.

I would not worry too much about "existing configs" as this is all
opt-in and it's probably a lost cause anyway.

I would instead give people new built-in vars in the important places
(rewrite, expr) to be able to match the partially decoded URI that the
proxy will be working on so a savvy user can safely interoperate
without monstrous regexes.

pre_translate_name hook vs location/directory walk (was: svn commit: r1879079)

Posted by Yann Ylavic <yl...@gmail.com>.
> Allow for URI-path pre_translate_name before (and/or instead of) decoding.
>
> Only if no hook takes "ownership" of the URI (returning OK), apply
> percent decoding for the rest of request handling. Otherwise r->uri remains
> encoded meaning that further location/directory/file/if/.. sections (walks)
> should that into account.

This change allows a module to avoid URI decoding in
pre_translate_name, such that r->uri remains encoded for the whole
request handling.

For instance (same series), mod_proxy will use that hook when
"ProxyMappingDecoded off" is configured, and return OK if a ProxyPass
matches. The decoding was already disabled (before this commit) for
the forward-proxy case (ProxyRequests on) since r->proxyreq was set at
an early stage, but now it "works" for the reverse-proxy case too.

This means that when "ProxyMappingDecoded off" is configured (the
default is still "on", though), for instance a <Location URI-path, or
a fixup RewriteRule or a non-early RequestHeader rule will have to be
expressed with the encoded form since reserved characters will not be
decoded as before (note that unreserved characters will always be
decoded anyway).

I don't know how much it can break existing configurations, and wonder
if we should have a core directive that still controls whether r->uri
should be decoded, regardless of ProxyMappingDecoded or any further
module pre_translate_name hook with its own directives.

For mod_proxy this is not really an issue to have r->uri modified
after the (pre_)translate phase, because the handler will later work
on r->filename regardless (the "proxy:..." set by proxy_detect or
proxy_trans), so maybe we should simply always decode r->uri after the
pre_translate hooks have run. Though I find it easier in a proxy
context to match a URI-path (LocationMatch, RewriteRule..) in its
encoded form, and not worry about original vs decoded separators. That
may be just my preference, but a new core directive looks sane to me..

Thoughts?


Also, since we are at it, I'm on the fence about running the
pre_translate hooks before quick handlers (thus before
ap_process_request_internal() too), good or bad idea?


Regards;
Yann.