You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Nick Kew <ni...@webthing.com> on 2007/10/24 03:06:34 UTC

PR#41798 - mod_proxy URL mangling

Some time ago, I posted a draft fix for PR#41798:
http://www.mail-archive.com/dev@httpd.apache.org/msg37836.html

It attracted some comments here, and needed further work.

I've written and test-driven a slightly more sophisticated patch:

* ProxyPass directive accepts an optional "nocanon" keyword,
  that tells us not to canonicalise the URL.  Without "nocanon",
  behaviour is unchanged.

* proxy_trans checks nocanon.  If set, it constructs r->filename
  from r->unparsed_uri.  To deal with Rudiger's objections to
  my previous patch, it does so only after matching the ProxyPass
  to r->uri.  If there's a mismatch between the two, indicating
  path cleaning, it issues a 301 redirect, as indicated by Roy
  for when we change a URL.

* If nocanon is set, then HTTP canonicalisation skips
  ap_proxy_canonenc.  This is in line with the forward-proxy
  fix to PR#42592.

New patch attached, soliciting review.

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/

Re: PR#41798 - mod_proxy URL mangling

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Oct 25, 2007, at 10:33 AM, Nick Kew wrote:

> On Thu, 25 Oct 2007 16:00:40 +0200
> Plüm, Rüdiger, VF-Group <ru...@vodafone.com> wrote:
>
>>> However, on further reflection, I think it would be better to
>>> fall back to old behaviour in this case.  The difficulty is that
>>> this redirection catches cases where something is unnecessarily
>>> but perfectly legally URLencoded.  So a perverse client might
>>> re-encode the redirection, leaving us an infinite loop.
>>
>> What does "old behaviour" mean? Using r->uri in this case?
>
> Exactly.
>
> Any objections?
>

Conceptionally, no objections here.


Re: PR#41798 - mod_proxy URL mangling

Posted by Nick Kew <ni...@webthing.com>.
On Thu, 25 Oct 2007 16:00:40 +0200
Plüm, Rüdiger, VF-Group <ru...@vodafone.com> wrote:

> > However, on further reflection, I think it would be better to
> > fall back to old behaviour in this case.  The difficulty is that
> > this redirection catches cases where something is unnecessarily
> > but perfectly legally URLencoded.  So a perverse client might
> > re-encode the redirection, leaving us an infinite loop.
> 
> What does "old behaviour" mean? Using r->uri in this case?

Exactly.

Any objections?

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/

Re: PR#41798 - mod_proxy URL mangling

Posted by Plüm, Rüdiger, VF-Group <ru...@vodafone.com>.

> -----Ursprüngliche Nachricht-----
> Von: Nick Kew 
> Gesendet: Donnerstag, 25. Oktober 2007 14:30
> An: dev@httpd.apache.org
> Betreff: Re: PR#41798 - mod_proxy URL mangling
> 
> 
> On Thu, 25 Oct 2007 12:02:21 +0200
> Plüm, Rüdiger, wrote:

> 
> However, on further reflection, I think it would be better to
> fall back to old behaviour in this case.  The difficulty is that
> this redirection catches cases where something is unnecessarily
> but perfectly legally URLencoded.  So a perverse client might
> re-encode the redirection, leaving us an infinite loop.

What does "old behaviour" mean? Using r->uri in this case?

Regards

Rüdiger

Re: PR#41798 - mod_proxy URL mangling

Posted by Nick Kew <ni...@webthing.com>.
On Thu, 25 Oct 2007 12:02:21 +0200
Plüm, Rüdiger, VF-Group <ru...@vodafone.com> wrote:

> I guess 
> 
> use_uri = ap_construct_url(r->pool, r->uri, r);

Thanks - I thought I knew there was some such API function, but
didn't find it at the time.

However, on further reflection, I think it would be better to
fall back to old behaviour in this case.  The difficulty is that
this redirection catches cases where something is unnecessarily
but perfectly legally URLencoded.  So a perverse client might
re-encode the redirection, leaving us an infinite loop.

> I am missing a minor bump since mod_proxy.h is public.

Fairy nuff.

> Otherwise looks good from first glance.

Thanks.

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/

Re: PR#41798 - mod_proxy URL mangling

Posted by Plüm, Rüdiger, VF-Group <ru...@vodafone.com>.

> -----Ursprüngliche Nachricht-----
> Von: Nick Kew 
> Gesendet: Mittwoch, 24. Oktober 2007 03:07
> An: dev@httpd.apache.org
> Betreff: PR#41798 - mod_proxy URL mangling
> 
> 
> Some time ago, I posted a draft fix for PR#41798:
> http://www.mail-archive.com/dev@httpd.apache.org/msg37836.html
> 
> It attracted some comments here, and needed further work.
> 
> I've written and test-driven a slightly more sophisticated patch:
> 
> * ProxyPass directive accepts an optional "nocanon" keyword,
>   that tells us not to canonicalise the URL.  Without "nocanon",
>   behaviour is unchanged.
> 
> * proxy_trans checks nocanon.  If set, it constructs r->filename
>   from r->unparsed_uri.  To deal with Rudiger's objections to
>   my previous patch, it does so only after matching the ProxyPass
>   to r->uri.  If there's a mismatch between the two, indicating
>   path cleaning, it issues a 301 redirect, as indicated by Roy
>   for when we change a URL.
> 
> * If nocanon is set, then HTTP canonicalisation skips
>   ap_proxy_canonenc.  This is in line with the forward-proxy
>   fix to PR#42592.
> 
> New patch attached, soliciting review.


Index: modules/proxy/mod_proxy.c
===================================================================
--- modules/proxy/mod_proxy.c	(revision 587155)
+++ modules/proxy/mod_proxy.c	(working copy)
@@ -537,27 +540,36 @@
                 if ((real[0] == '!') && (real[1] == '\0')) {
                     return DECLINED;
                 }
-                found = ap_pregsub(r->pool, real, r->uri, AP_MAX_REG_MATCH,
-                                   regm);
-                /* Note: The strcmp() below catches cases where there
-                 * was no regex substitution. This is so cases like:
-                 *
-                 *    ProxyPassMatch \.gif balancer://foo
-                 *
-                 * will work "as expected". The upshot is that the 2
-                 * directives below act the exact same way (ie: $1 is implied):
-                 *
-                 *    ProxyPassMatch ^(/.*\.gif)$ balancer://foo
-                 *    ProxyPassMatch ^(/.*\.gif)$ balancer://foo$1
-                 *
-                 * which may be confusing.
-                 */
-                if (found && strcmp(found, real)) {
-                    found = apr_pstrcat(r->pool, "proxy:", found, NULL);
+                /* test that we haven't reduced the URI */
+                if (ent[i].nocanon
+                    && ap_regexec(ent[i].regex, r->unparsed_uri,
+                                  AP_MAX_REG_MATCH, reg1, 0)) {
+                    mismatch = 1;
                 }
                 else {
-                    found = apr_pstrcat(r->pool, "proxy:", real, r->uri,
-                                        NULL);
+                    found = ap_pregsub(r->pool, real, use_uri,
+                                       AP_MAX_REG_MATCH,
+                                       use_uri == r->uri ? regm : reg1);

Why not  ent[i].nocanon ? reg1 : regm  ??

+                    /* Note: The strcmp() below catches cases where there
+                     * was no regex substitution. This is so cases like:
+                     *
+                     *    ProxyPassMatch \.gif balancer://foo
+                     *
+                     * will work "as expected". The upshot is that the 2
+                     * directives below act the exact same way (ie: $1 is implied):
+                     *
+                     *    ProxyPassMatch ^(/.*\.gif)$ balancer://foo
+                     *    ProxyPassMatch ^(/.*\.gif)$ balancer://foo$1
+                     *
+                     * which may be confusing.
+                     */
+                    if (found && strcmp(found, real)) {
+                        found = apr_pstrcat(r->pool, "proxy:", found, NULL);
+                    }
+                    else {
+                        found = apr_pstrcat(r->pool, "proxy:", real,
+                                            use_uri, NULL);
+                    }
                 }
             }
         }
@@ -568,16 +580,37 @@
                 if ((real[0] == '!') && (real[1] == '\0')) {
                     return DECLINED;
                 }
-
-                found = apr_pstrcat(r->pool, "proxy:", real,
-                                    r->uri + len, NULL);
-
+                if (ent[i].nocanon
+                    && len != alias_match(r->unparsed_uri, ent[i].fake)) {
+                    mismatch = 1;
+                }
+                else {
+                    found = apr_pstrcat(r->pool, "proxy:", real,
+                                        use_uri + len, NULL);
+                }
             }
         }
+        if (mismatch) {
+            /* We made a reducing transformation, so we can't safely use
+             * unparsed_uri.  Send a redirect to the canonicalised URI
+             * instead.  FIXME: this breaks if protocol isn't http.
+             */
+            use_uri = apr_pstrcat(r->pool, "http://",
+                          r->hostname?r->hostname:r->server->server_hostname,
+                          ":", r->parsed_uri.port?r->parsed_uri.port_str:"80",
+                          r->uri, NULL);

I guess 

use_uri = ap_construct_url(r->pool, r->uri, r);

could help to address your FIXME.

Index: modules/proxy/mod_proxy.h
===================================================================
--- modules/proxy/mod_proxy.h	(revision 587155)
+++ modules/proxy/mod_proxy.h	(working copy)
@@ -113,6 +113,7 @@
     const char  *real;
     const char  *fake;
     ap_regex_t  *regex;
+    int nocanon;
 };
 
 struct dirconn_entry {

I am missing a minor bump since mod_proxy.h is public.

Otherwise looks good from first glance.

Regards

Rüdiger