Posted to dev@httpd.apache.org by Rodent of Unusual Size <Ke...@Golux.Com> on 2001/03/22 16:51:31 UTC

Deficiency in mod_rewrite

I just realised that mod_rewrite has a fairly significant
deficiency: since it uses '%' and '$' for its own purposes
in signalling substitution, there does not appear to be any
way to include these characters explicitly in the rewritten
string.  Since '%' is used for encoding, that is a bit of a
misfeature, IMHO.

I am looking at a fix; some way to signal 'insert rather than
interpret' these characters.

Just FYI..
-- 
#ken    P-)}

Ken Coar                    <http://Golux.Com/coar/>
Apache Software Foundation  <http://www.apache.org/>
"Apache Server for Dummies" <http://Apache-Server.Com/>
"Apache Server Unleashed"   <http://ApacheUnleashed.Com/>

ApacheCon 2001!
Four tracks with over 70+ sessions. Free admission to exhibits
and special events - keynote presentations by John 'maddog' Hall
and David Brin. Special thanks to our Platinum Sponsors IBM and
Covalent, Gold Sponsor Thawte, and Silver Sponsor Compaq.  Attend
the only Apache event designed and fully supported by the members of
the ASF. See more information and register at <http://ApacheCon.Com/>!

Re: Deficiency in mod_rewrite

Posted by Andy Gardner <an...@navigator.co.nz>.
>Rodent of Unusual Size wrote:
>>
>> I just realised that mod_rewrite has a fairly significant
>> deficiency: since it uses '%' and '$' for its own purposes
>> in signalling substitution, there does not appear to be any
>> way to include these characters explicitly in the rewritten
>> string.
>

While we're on the subject of mod_rewrite, could a directive be added to
control the order in which inherited rules are applied?

When using

RewriteOptions inherit

within a <virtualhost> the rules in the virtualhost appear to be run first,
THEN the rules inherited from the parent.

So if you have a catch-all rule at the end of the virtualhost container,
any rewrite rules in the parent are ignored.

My httpd.conf file would be a LOT smaller if I could ask mod_rewrite to
parse the inherited rules BEFORE the ones in the virtualhost.

something like

RewriteOptions inherit parentfirst

or suchlike.

I tried using an inherited rule allowing a single robots.txt file to be used
throughout an entire 1000+ virtual host setup, but it didn't work for any
hosts that had a

RewriteRule /.* blah

within them.



-- 
Andrew P. Gardner
barcelona.com stolen, stmoritz.com stays. What's uniform about the UDRP?
We could ask ICANN to send WIPO a clue, but do they have any to spare?
Get active: http://www.domain-owners.org http://www.tldlobby.com

Re: Deficiency in mod_rewrite

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
"William A. Rowe, Jr." wrote:
> 
> wouldn't \% and \$ be reasonably appropriate?

How we represent quoted characters on the input side is
a doddle; the problem is keeping '%' intact through
ap_os_escape_path().
-- 
#ken    P-)}

Ken Coar                    <http://Golux.Com/coar/>
Apache Software Foundation  <http://www.apache.org/>
"Apache Server for Dummies" <http://Apache-Server.Com/>
"Apache Server Unleashed"   <http://ApacheUnleashed.Com/>


Re: Deficiency in mod_rewrite

Posted by "William A. Rowe, Jr." <wr...@covalent.net>.
From: "Rodent of Unusual Size" <Ke...@Golux.Com>
Sent: Friday, March 23, 2001 3:18 PM

> > I just realised that mod_rewrite has a fairly significant
> > deficiency: since it uses '%' and '$' for its own purposes
> > in signalling substitution, there does not appear to be any
> > way to include these characters explicitly in the rewritten
> > string.
> 
> The obvious first pass solution is to introduce the special
> strings '%%' and '%$' (and '$%' and '$$') as forcing the
> second character to be a literal.  No problem.. except that
> the result of the rewrite is always passed through ap_escape_uri()
> (aka ap_os_escape_path()), which leaves the '$' alone but
> turns the '%' into '%25'.

wouldn't \% and \$ be reasonably appropriate?



Re: [PATCH] to fix deficiency in mod_rewrite

Posted by Joshua Slive <jo...@slive.ca>.
On Fri, 27 Apr 2001, Rodent of Unusual Size wrote:

> Okey, here is a patch that seems to address this issue.  It
> does two things:
>
> 1. Adds the slosh as an escape character; it can be used to turn
>    '$1' from 'first backreference' into simply 'dollar-sign 1' with
>    '\$1'.  Any character can be escaped, but it really only has
>    significance for those with special meanings to mod_rewrite
>    (namely '$' and '%').
>
> 2. Adds the 'noescape|NE' flag to prevent the application of
>    ap_escape_uri() to transformed strings.

I did a few basic tests with this patch and it works perfectly for
me.  I like it.

Joshua.


[PATCH] to fix deficiency in mod_rewrite

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
Okey, here is a patch that seems to address this issue.  It
does two things:

1. Adds the slosh as an escape character; it can be used to turn
   '$1' from 'first backreference' into simply 'dollar-sign 1' with
   '\$1'.  Any character can be escaped, but it really only has
   significance for those with special meanings to mod_rewrite
   (namely '$' and '%').

2. Adds the 'noescape|NE' flag to prevent the application of
   ap_escape_uri() to transformed strings.

With the two used together, result strings can be coded to
include URI escape sequences such as '%24%24'.  Without this
change, that cannot be done; mod_rewrite will *always* treat
'%' as a variable marker, '$' as a backreference marker, and
escape '%' into '%25'.  So '%24%24' will currently *always*
be turned into '%2524%2524'.

mod_rewrite is such a beast that I am proposing this rather than
just committing it; I easily may have missed something.  If/when
it gets committed I will bring it forward to 2.0.

Index: src/CHANGES
===================================================================
RCS file: /home/cvs/apache-1.3/src/CHANGES,v
retrieving revision 1.1675
diff -u -r1.1675 CHANGES
--- src/CHANGES 2001/04/12 17:54:58     1.1675
+++ src/CHANGES 2001/04/27 20:35:33
@@ -1,5 +1,13 @@
 Changes with Apache 1.3.20
 
+  *) Added NOESCAPE (NE) flag to RewriteRule and enabled use of
+     '\' to allow escaping of special characters.  Previously
+     there was no way to embed either '$' or '%' in the output
+     of a RewriteRule; now 'foo\$1' will result in a literal
+     'foo$1' appearing in the result rather than 'foo\<value of $1>'.
+     Note that [NE] disables *all* normal URI escaping, so incautious
+     use can give unexpected results.  [Ken Coar]
+
   *) Changed the initial screen handling for NetWare so that the -s 
      parameter will properly destroy the Apache console screen and switch 
      to the system console screen.  Also removed the call to clrscr() for 
@@ -10,8 +18,8 @@
      for Win32 systems, see http://www.cygwin.com)
      [Stipe Tolj <to...@wapme-systems.de>]
 
-  *) Changes to 'ab'; fixed int overrun's, added statistics, output in
-     csv/gnuplot format, rudimentary ssl support and various other tweaks
+  *) Changes to 'ab': fixed int overruns, added statistics, output in
+     csv/gnuplot format, rudimentary SSL support and various other tweaks
      to make results more true to what is measured. The upshot of this it
      turns out that 'ab' has often underreported the true performance of
      apache. Often by a order of magnitude :-) See talk/paper of Sander 
Index: src/modules/standard/mod_rewrite.c
===================================================================
RCS file: /home/cvs/apache-1.3/src/modules/standard/mod_rewrite.c,v
retrieving revision 1.170
diff -u -r1.170 mod_rewrite.c
--- src/modules/standard/mod_rewrite.c  2001/02/01 04:12:26     1.170
+++ src/modules/standard/mod_rewrite.c  2001/04/27 20:35:34
@@ -866,6 +866,10 @@
             cfg->forced_responsecode = status;
         }
     }
+    else if (   strcasecmp(key, "noescape") == 0
+        || strcasecmp(key, "NE") == 0       ) {
+        cfg->flags |= RULEFLAG_NOESCAPE;
+    }
     else if (   strcasecmp(key, "last") == 0
              || strcasecmp(key, "L") == 0   ) {
         cfg->flags |= RULEFLAG_LASTRULE;
@@ -1010,6 +1014,7 @@
     const char *ccp;
     struct stat finfo;
     unsigned int port;
+    int rulestatus;
     int n;
     int l;
 
@@ -1093,7 +1098,8 @@
     /*
      *  now apply the rules ...
      */
-    if (apply_rewrite_list(r, conf->rewriterules, NULL)) {
+    rulestatus = apply_rewrite_list(r, conf->rewriterules, NULL);
+    if (rulestatus) {
 
         if (strlen(r->filename) > 6 &&
             strncmp(r->filename, "proxy:", 6) == 0) {
@@ -1143,16 +1149,28 @@
             for ( ; *cp != '/' && *cp != '\0'; cp++)
                 ;
             if (*cp != '\0') {
-                rewritelog(r, 1, "escaping %s for redirect", r->filename);
-                cp2 = ap_escape_uri(r->pool, cp);
+                if (rulestatus != ACTION_NOESCAPE) {
+                    rewritelog(r, 1, "escaping %s for redirect", r->filename);
+                    cp2 = ap_escape_uri(r->pool, cp);
+                }
+                else {
+                    cp2 = ap_pstrdup(r->pool, cp);
+                }
                 *cp = '\0';
                 r->filename = ap_pstrcat(r->pool, r->filename, cp2, NULL);
             }
 
             /* append the QUERY_STRING part */
             if (r->args != NULL) {
+                char *args;
+                if (rulestatus == ACTION_NOESCAPE) {
+                    args = r->args;
+                }
+                else {
+                    args = ap_escape_uri(r->pool, r->args);
+                }
                 r->filename = ap_pstrcat(r->pool, r->filename, "?", 
-                                         ap_escape_uri(r->pool, r->args), NULL);
+                                         args, NULL);
             }
 
             /* determine HTTP redirect response code */
@@ -1305,6 +1323,7 @@
     const char *ccp;
     char *prefix;
     int l;
+    int rulestatus;
     int n;
     char *ofilename;
 
@@ -1358,7 +1377,8 @@
     /*
      *  now apply the rules ...
      */
-    if (apply_rewrite_list(r, dconf->rewriterules, dconf->directory)) {
+    rulestatus = apply_rewrite_list(r, dconf->rewriterules, dconf->directory);
+    if (rulestatus) {
 
         if (strlen(r->filename) > 6 &&
             strncmp(r->filename, "proxy:", 6) == 0) {
@@ -1422,17 +1442,29 @@
             for ( ; *cp != '/' && *cp != '\0'; cp++)
                 ;
             if (*cp != '\0') {
-                rewritelog(r, 1, "[per-dir %s] escaping %s for redirect",
-                           dconf->directory, r->filename);
-                cp2 = ap_escape_uri(r->pool, cp);
+                if (rulestatus != ACTION_NOESCAPE) {
+                    rewritelog(r, 1, "[per-dir %s] escaping %s for redirect",
+                               dconf->directory, r->filename);
+                    cp2 = ap_escape_uri(r->pool, cp);
+                }
+                else {
+                    cp2 = ap_pstrdup(r->pool, cp);
+                }
                 *cp = '\0';
                 r->filename = ap_pstrcat(r->pool, r->filename, cp2, NULL);
             }
 
             /* append the QUERY_STRING part */
             if (r->args != NULL) {
+                char *args;
+                if (rulestatus == ACTION_NOESCAPE) {
+                    args = r->args;
+                }
+                else {
+                    args = ap_escape_uri(r->pool, r->args);
+                }
                 r->filename = ap_pstrcat(r->pool, r->filename, "?", 
-                                         ap_escape_uri(r->pool, r->args), NULL);
+                                         args, NULL);
             }
 
             /* determine HTTP redirect response code */
@@ -1622,7 +1654,8 @@
              *  Indicate a change if this was not a match-only rule.
              */
             if (rc != 2) {
-                changed = 1;
+                changed = ((p->flags & RULEFLAG_NOESCAPE)
+                           ? ACTION_NOESCAPE : ACTION_NORMAL);
             }
 
             /*
@@ -1636,7 +1669,7 @@
                            "to next API URI-to-filename handler", r->filename);
                 r->filename = ap_pstrcat(r->pool, "passthrough:",
                                          r->filename, NULL);
-                changed = 1;
+                changed = ACTION_NORMAL;
                 break;
             }
 
@@ -1648,7 +1681,7 @@
                 rewritelog(r, 2, "forcing '%s' to be forbidden", r->filename);
                 r->filename = ap_pstrcat(r->pool, "forbidden:",
                                          r->filename, NULL);
-                changed = 1;
+                changed = ACTION_NORMAL;
                 break;
             }
 
@@ -1659,7 +1692,7 @@
             if (p->flags & RULEFLAG_GONE) {
                 rewritelog(r, 2, "forcing '%s' to be gone", r->filename);
                 r->filename = ap_pstrcat(r->pool, "gone:", r->filename, NULL);
-                changed = 1;
+                changed = ACTION_NORMAL;
                 break;
             }
 
@@ -2245,7 +2278,7 @@
     space = nbuf - 1; /* room for '\0' */
 
     for (;;) {
-       span = strcspn(inp, "$%");
+       span = strcspn(inp, "\\$%");
        if (span > space) {
            span = space;
        }
@@ -2256,8 +2289,14 @@
        if (space == 0 || *inp == '\0') {
            break;
        }
-       /* now we have a '$' or a '%' */
-       if (inp[1] == '{') {
+       /* now we have a '\', '$', or '%' */
+        if (inp[0] == '\\') {
+            if (inp[1] != '\0') {
+                inp++;
+                goto skip;
+            }
+        }
+       else if (inp[1] == '{') {
            char *endp;
            endp = find_closing_bracket(inp+2, '{', '}');
            if (endp == NULL) {
@@ -2288,14 +2327,16 @@
                char xkey[MAX_STRING_LEN];
                char xdflt[MAX_STRING_LEN];
                key = find_char_in_brackets(inp+2, ':', '{', '}');
-               if (key == NULL)
+               if (key == NULL) {
                    goto skip;
+                }
                map  = ap_pstrndup(r->pool, inp+2, key-inp-2);
                dflt = find_char_in_brackets(key+1, '|', '{', '}');
                if (dflt == NULL) {
                    key  = ap_pstrndup(r->pool, key+1, endp-key-1);
                    dflt = "";
-               } else {
+               }
+                else {
                    key  = ap_pstrndup(r->pool, key+1, dflt-key-1);
                    dflt = ap_pstrndup(r->pool, dflt+1, endp-dflt-1);
                }
Index: src/modules/standard/mod_rewrite.h
===================================================================
RCS file: /home/cvs/apache-1.3/src/modules/standard/mod_rewrite.h,v
retrieving revision 1.79
diff -u -r1.79 mod_rewrite.h
--- src/modules/standard/mod_rewrite.h  2001/03/21 03:09:45     1.79
+++ src/modules/standard/mod_rewrite.h  2001/04/27 20:35:34
@@ -209,6 +209,10 @@
 #define RULEFLAG_GONE               1<<10
 #define RULEFLAG_QSAPPEND           1<<11
 #define RULEFLAG_NOCASE             1<<12
+#define RULEFLAG_NOESCAPE           1<<13
+
+#define ACTION_NORMAL               1<<0
+#define ACTION_NOESCAPE             1<<1
 
 #define MAPTYPE_TXT                 1<<0
 #define MAPTYPE_DBM                 1<<1



-- 
#ken    P-)}

Ken Coar                    <http://Golux.Com/coar/>
Apache Software Foundation  <http://www.apache.org/>
"Apache Server for Dummies" <http://Apache-Server.Com/>
"Apache Server Unleashed"   <http://ApacheUnleashed.Com/>

Re: Deficiency in mod_rewrite

Posted by Joshua Slive <sl...@finance.commerce.ubc.ca>.
On Sat, 24 Mar 2001, Rodent of Unusual Size wrote:

> Joshua Slive wrote:
> >
> > How about an "option" like [NE] for "No-Escape" for the RewriteRule
> > directive.  Sounds a little more obvious than just turning off
> > the escaping for certain strings.
>
> I thought of that -- but then what happens when '%%' (or '\%')
> is present and escaping *has* to be disabled, but the user
> did not specify NE?

Perhaps I misunderstand the problem, but I would just assume that if the
user specifies %% and doesn't use NE then they want the %% escaped.  If,
on the other hand, they specify NE, then they don't need to use the
special %% or \% notation.  They can just use % and it will be passed
through untouched.

Joshua.


Re: Deficiency in mod_rewrite

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
Joshua Slive wrote:
> 
> How about an "option" like [NE] for "No-Escape" for the RewriteRule
> directive.  Sounds a little more obvious than just turning off
> the escaping for certain strings.

I thought of that -- but then what happens when '%%' (or '\%')
is present and escaping *has* to be disabled, but the user
did not specify NE?
-- 
#ken    P-)}

Ken Coar                    <http://Golux.Com/coar/>
Apache Software Foundation  <http://www.apache.org/>
"Apache Server for Dummies" <http://Apache-Server.Com/>
"Apache Server Unleashed"   <http://ApacheUnleashed.Com/>


Re: Deficiency in mod_rewrite

Posted by Joshua Slive <jo...@slive.ca>.
On Fri, 23 Mar 2001, Rodent of Unusual Size wrote:
> I see three options here:
>
> 1. Leave it alone and document it as a deficiency ('you cannot have
>    escapes in rewritten URLs').
> 2. Jump through the necessary hoops to find an appropriate
>    replacement character that will make it through the call
>    to ap_escape_uri(), and then replace that with '%' after
>    escaping.
> 3. Disable escaping (and log the fact in the rewritelog) if the
>    rewrite string contains '%%' or '$%'.

How about an "option" like [NE] for "No-Escape" for the RewriteRule
directive.  Sounds a little more obvious than just turning off
the escaping for certain strings.

Joshua.


Re: Deficiency in mod_rewrite

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
Rodent of Unusual Size wrote:
> 
> I just realised that mod_rewrite has a fairly significant
> deficiency: since it uses '%' and '$' for its own purposes
> in signalling substitution, there does not appear to be any
> way to include these characters explicitly in the rewritten
> string.

The obvious first pass solution is to introduce the special
strings '%%' and '%$' (and '$%' and '$$') as forcing the
second character to be a literal.  No problem.. except that
the result of the rewrite is always passed through ap_escape_uri()
(aka ap_os_escape_path()), which leaves the '$' alone but
turns the '%' into '%25'.

The ways that I see out of this are either performance hogs
(such as replacing the literal '%' with a meta-string, escaping
the string, and then converting the meta-string back to '%')
or else potentially suspect (like turning off escaping if
'%%' was used).  The former sucks wind, and the latter would
let illegal characters through if they were in the rewritten
string.

Of course, mod_rewrite already has a means of reversing the
escaping.. if you use a rewrite map, which is certainly more
complicated than this needs to be.

I see three options here:

1. Leave it alone and document it as a deficiency ('you cannot have
   escapes in rewritten URLs').
2. Jump through the necessary hoops to find an appropriate
   replacement character that will make it through the call
   to ap_escape_uri(), and then replace that with '%' after
   escaping.
3. Disable escaping (and log the fact in the rewritelog) if the
   rewrite string contains '%%' or '$%'.

I think #1 is right out; I would like to fix this.  I personally
would like to do #2, but it is not as simple as #3, which is
definitely the quick&dirty solution.

Any other opinions or options I am missing?
-- 
#ken    P-)}

Ken Coar                    <http://Golux.Com/coar/>
Apache Software Foundation  <http://www.apache.org/>
"Apache Server for Dummies" <http://Apache-Server.Com/>
"Apache Server Unleashed"   <http://ApacheUnleashed.Com/>
