Posted to dev@httpd.apache.org by "Ralf S. Engelschall" <rs...@engelschall.com> on 1997/07/29 17:51:05 UTC

[PATCH] more backref interpolation for rewriting engine

While the recent patches to mod_rewrite were pure bugfixes, here is an
essential enhancement patch which I really want to see in 1.3. I wonder myself
why this functionality was missing for such a long time, although it was not
difficult to implement and a lot of users have asked for it in the past.

Description:
------------

To make it clear, let me explain how a typical ruleset gets evaluated by the
rewriting engine:

   RewriteCond <cond-string-1> <cond-pattern-1>
   RewriteCond <cond-string-2> <cond-pattern-2>
      :
   RewriteCond <cond-string-N> <cond-pattern-N>
   RewriteRule <url-pattern>   <url-string>

The evaluation is this:
   1. Match current URL against <url-pattern>.
      if not match -> ignore rule and all conditions
   2. Create <cond-string-1> and match against <cond-pattern-1>
      if not match -> ignore rule and all conditions
   3. Create <cond-string-2> and match against <cond-pattern-2>
      if not match -> ignore rule and all conditions
        :
 n+1. Create <cond-string-N> and match against <cond-pattern-N>
      if not match -> ignore rule and all conditions
 n+2. Create <url-string> and replace current URL with this one.
 n+3. Proceed with following rules.
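
The evaluation order above can be sketched in C roughly as follows. This is
only an illustration, not the mod_rewrite code: struct cond, matches() and
rule_fires() are hypothetical names, the condition strings are assumed to be
already expanded, and flags such as negation or OR-chaining are ignored. It
uses the POSIX <regex.h> interface.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical stand-in for one RewriteCond: a test string that has
   already been expanded, and the pattern it must match. */
struct cond {
    const char *string;
    const char *pattern;
};

/* Returns 1 if `text` matches the POSIX extended regex `pattern`. */
static int matches(const char *text, const char *pattern)
{
    regex_t re;
    int ok;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;   /* treat an invalid pattern as "no match" */
    ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}

/* Returns 1 if the rule would fire for `url`: the URL pattern is
   checked first, and only then are the conditions tried in order. */
static int rule_fires(const char *url, const char *url_pattern,
                      const struct cond *conds, size_t nconds)
{
    size_t i;

    if (!matches(url, url_pattern))
        return 0;               /* step 1 failed: skip rule and conditions */
    for (i = 0; i < nconds; i++) {
        if (!matches(conds[i].string, conds[i].pattern))
            return 0;           /* a condition failed: skip the rule */
    }
    return 1;                   /* caller would now build <url-string> */
}
```

The point of the sketch is only the ordering: the URL pattern is tested before
any condition, even though the conditions are written above the rule.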

The essential part here is that one usually uses regex grouping via
parentheses in <url-pattern> to match some parts of the URL and then
backreferences $1...$n in <url-string> to use them, i.e. to construct
<url-string> one can use the result of the already matched pattern.

** The restriction: When constructing <cond-string-XX> we cannot use these
backreferences, neither the ones from <url-pattern> nor the ones from any
preceding <cond-pattern-XX>. And we also cannot use backreferences in
<url-string> to access the results of any successful <cond-pattern-XX>. These
are two big restrictions, although the information is still there. There was
just no way to access it in the past.

Now the new functionality: 
   1. One can use $1..$n in any <cond-string-XX> to backreference the result
      of <url-pattern>.
   2. One can use %1..%n in any <cond-string-XX> _AND_ in
      <url-string> to backreference the result from the last matching
      <cond-pattern-XX>. (initially this is empty for <cond-string-1>).
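
To illustrate the two backreference namespaces, here is a minimal C sketch
that expands $N from one set of captures and %N from another in a single pass.
It is not how the patch does it (the patch runs pregsub() twice over the
buffer), and the name expand_backrefs() and the capture arrays are assumptions
made up for this example.

```c
#include <stddef.h>

/* Expand $0..$9 from rule_caps and %0..%9 from cond_caps into out.
   Each caps array has 10 entries; a NULL entry expands to nothing.
   "%{" is copied through untouched so variable syntax survives.
   Returns 0 on success, -1 if out is too small. */
static int expand_backrefs(const char *tmpl,
                           const char *const *rule_caps,
                           const char *const *cond_caps,
                           char *out, size_t outsz)
{
    size_t o = 0;

    if (outsz == 0)
        return -1;
    while (*tmpl != '\0') {
        const char *const *caps = NULL;

        if (tmpl[0] == '$' && tmpl[1] >= '0' && tmpl[1] <= '9')
            caps = rule_caps;       /* $N: groups of <url-pattern> */
        else if (tmpl[0] == '%' && tmpl[1] >= '0' && tmpl[1] <= '9')
            caps = cond_caps;       /* %N: groups of last matched cond */

        if (caps != NULL) {
            const char *s = caps[tmpl[1] - '0'];
            for (; s != NULL && *s != '\0'; s++) {
                if (o + 1 >= outsz)
                    return -1;
                out[o++] = *s;
            }
            tmpl += 2;              /* skip the marker and the digit */
        }
        else {
            if (o + 1 >= outsz)
                return -1;
            out[o++] = *tmpl++;     /* ordinary character */
        }
    }
    out[o] = '\0';
    return 0;
}
```

With rule captures "/a", "min", "b" in slots 1..3 and condition captures
"index", "html" in slots 1..2, the template "$1/$3?style=$2&base=%1.%2" would
come out as "/a/b?style=min&base=index.html".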

What is the benefit? A lot of rewriting rules can be shortened, and a lot of
others become possible for the first time!

Example: 
-------

Assume we want to generate (e.g. via WML) alternative variants of HTML pages.
We do this by inserting a style tag before the extension, e.g.  index.html is
the HTML page with the standard style while index.min.html is the one
generated for Lynx users. This approach is done for a complete hierarchy of
webpages. 

Now we want to be able to provide these variants to the user in a nice way.
The nicest way is just to be able to insert the style tag at any point in the
URLs of this webarea to switch the variant. For instance /our/area/ should
give the index.html while /our/min/area/ or /min/our/area/ should give the
index.min.html. Same for style variants like "min", "std" and "max", etc. BUT:
When there is no xx.<STYLE>.html then xx.html should be used.  This way we are
backward-compatible and need not provide a complete clone of the webarea
for each style. 

The trick is that navigation remains homogeneous inside a style variant when
we use only relative links, because the style is always passed on via the URL.

Without the new functionality this is completely impossible!  With this
functionality it is straightforward:

    #   step 1: always remove the indicator part, but remember it
    RewriteRule   ^(.*)/(min|std|max)/(.*)$  $1/$3  [E=STYLE:$2]
    
    #   step 2: on initial requests check whether the URL is
    #           valid when we insert the style indicator and
    #           rewrite to this variant when the URL exists
    RewriteCond   %{ENV:STYLE}              !=""
    RewriteCond   $1.%{ENV:STYLE}.$2        -U
    RewriteRule   ^(.+)\.([sp]?html|s?cgi)$ $1.%{ENV:STYLE}.$2 [NS]

Side effect:
------------

To be able to implement it cleanly I've changed the code of the rewriting
engine to use a common new function expand_backref_inbuffer(). This way the
code shrinks and is much clearer than before. Ok, this cannot be
seen directly from the diff snippet below, but have a look at
apply_rewrite_rule() before and after the patch.
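
For readers who want the idea behind expand_backref_inbuffer() without reading
the whole diff: since pregsub() only understands $N, the %N case is handled by
temporarily hiding real $N sequences behind a \001 placeholder, rewriting %N
to $N, substituting, and then restoring the hidden ones. The following
stand-alone C sketch shows just the masking and unmasking steps. The names
mask_and_swap() and unmask() are hypothetical and do not appear in the patch,
and the substitution step in between is left out.

```c
#include <stddef.h>

static int is_digit(char ch) { return ch >= '0' && ch <= '9'; }

/* Hide existing "$N" as "\001N" and rewrite "<c>N" to "$N", so that a
   substitution routine that only understands "$N" expands <c>N instead. */
static void mask_and_swap(char *buf, char c)
{
    size_t i;

    for (i = 0; buf[i] != '\0'; i++) {
        if (buf[i] == '$' && is_digit(buf[i + 1]))
            buf[i] = '\001';
        else if (buf[i] == c && is_digit(buf[i + 1]))
            buf[i] = '$';
    }
}

/* Undo the masking: turn every "\001N" back into "$N". */
static void unmask(char *buf)
{
    size_t i;

    for (i = 0; buf[i] != '\0'; i++) {
        if (buf[i] == '\001' && is_digit(buf[i + 1]))
            buf[i] = '$';
    }
}
```

Note that "%{...}" is left alone because the character after the "%" is not a
digit, so variable references survive this step untouched.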

                                       Ralf S. Engelschall
                                       rse@engelschall.com
                                       www.engelschall.com

--- mod_rewrite.h	Tue Jul 29 14:08:53 1997
+++ mod_rewrite.h.NEW	Tue Jul 29 14:08:58 1997
@@ -205,7 +205,8 @@
 
 #define MAX_ENV_FLAGS 5
 
-#define EOS_PARANOIA(ca) ca[sizeof(ca)-1] = '\0'
+#define EOS_PARANOIA(ca)            ca[sizeof(ca)-1] = '\0'
+#define EOS_PARANOIA_SIZE(cp, size) cp[size-1] = '\0'
 
 
 /*
@@ -292,6 +293,15 @@
     array_header *lists;
 } cache;
 
+    /* the regex structure for the
+       substitution of backreferences */
+
+typedef struct backrefinfo {
+    char *source;
+    int nsub;
+    regmatch_t regmatch[10];
+} backrefinfo;
+
 
 /*
 **
@@ -335,11 +345,12 @@
     /* rewriting engine */
 static int apply_rewrite_list(request_rec *r, array_header *rewriterules, char *perdir);
 static int apply_rewrite_rule(request_rec *r, rewriterule_entry *p, char *perdir); 
-static int apply_rewrite_cond(request_rec *r, rewritecond_entry *p, char *perdir); 
+static int apply_rewrite_cond(request_rec *r, rewritecond_entry *p, char *perdir, backrefinfo *briRR, backrefinfo *briRC); 
 
     /* URI transformation function */
 static void  splitout_queryargs(request_rec *r, int qsappend);
 static void  reduce_uri(request_rec *r);
+static void  expand_backref_inbuffer(pool *p, char *buf, int nbuf, backrefinfo *bri, char c);
 static char *expand_tildepaths(request_rec *r, char *uri);
 static void  expand_map_lookups(request_rec *r, char *uri, int uri_len);
 
--- mod_rewrite.c	Tue Jul 29 17:15:40 1997
+++ mod_rewrite.c.NEW	Tue Jul 29 17:47:37 1997
@@ -1442,16 +1442,17 @@
     char newuri[MAX_STRING_LEN];
     char env[MAX_STRING_LEN];
     char port[32];
-    char env2[MAX_STRING_LEN];
     regex_t *regexp;
     regmatch_t regmatch[10];
-    int rc;
+    backrefinfo *briRR = NULL;
+    backrefinfo *briRC = NULL;
     int prefixstrip;
-    int i;
     int failed;
     array_header *rewriteconds;
     rewritecond_entry *conds;
     rewritecond_entry *c;
+    int i;
+    int rc;
 
     uri     = r->filename;
     regexp  = p->regexp;
@@ -1481,6 +1482,24 @@
     if (( rc && !(p->flags & RULEFLAG_NOTMATCH)) ||
         (!rc &&  (p->flags & RULEFLAG_NOTMATCH))   ) {     
 
+        /* create the RewriteRule regsubinfo */
+        briRR = (backrefinfo *)palloc(r->pool, sizeof(backrefinfo));
+        if (!rc && (p->flags & RULEFLAG_NOTMATCH)) {
+            briRR->source = "";
+            briRR->nsub   = 0;
+        }
+        else {
+            briRR->source = pstrdup(r->pool, uri);
+            briRR->nsub   = regexp->re_nsub;
+            memcpy((void *)(briRR->regmatch), (void *)(regmatch), sizeof(regmatch_t)*10);
+        }
+
+        /* create the RewriteCond backrefinfo, but
+           initialized as empty backrefinfo, i.e. not subst */
+        briRC = (backrefinfo *)pcalloc(r->pool, sizeof(backrefinfo));
+        briRC->source = "";
+        briRC->nsub   = 0;
+
         /* ok, the pattern matched, but we now additionally have to check 
            for any preconditions which have to be also true. We do this
            at this very late stage to avoid unnessesary checks which
@@ -1490,7 +1509,7 @@
         failed = 0;
         for (i = 0; i < rewriteconds->nelts; i++) {
             c = &conds[i];
-            rc = apply_rewrite_cond(r, c, perdir);
+            rc = apply_rewrite_cond(r, c, perdir, briRR, briRC);
             if (c->flags & CONDFLAG_ORNEXT) {
                 /* there is a "or" flag */
                 if (rc == 0) {
@@ -1523,10 +1542,10 @@
         if (strcmp(output, "-") == 0) {
             /* but before we set the env variables... */
             for (i = 0; p->env[i] != NULL; i++) {
-                strncpy(env2, p->env[i], sizeof(env2)-1);
-                EOS_PARANOIA(env2);
-                strncpy(env, pregsub(r->pool, env2, uri, regexp->re_nsub+1, regmatch), sizeof(env)-1);    /* substitute in output */
+                strncpy(env, p->env[i], sizeof(env)-1);
                 EOS_PARANOIA(env);
+                expand_backref_inbuffer(r->pool, env, sizeof(env), briRR, '$');
+                expand_backref_inbuffer(r->pool, env, sizeof(env), briRC, '%');
                 add_env_variable(r, env);
             }
             return 2;
@@ -1534,27 +1553,13 @@
 
         /* if this is a forced proxy request ... */
         if (p->flags & RULEFLAG_PROXY) {
-            if (p->flags & RULEFLAG_NOTMATCH) {
-                output = pstrcat(r->pool, "proxy:", output, NULL);
-                strncpy(newuri, output, sizeof(newuri)-1);
-                EOS_PARANOIA(newuri);
-                expand_variables_inbuffer(r, newuri, sizeof(newuri));/* expand %{...} */
-                expand_map_lookups(r, newuri, sizeof(newuri));       /* expand ${...} */
-            }
-            else {
-                output = pstrcat(r->pool, "proxy:", output, NULL);
-                strncpy(newuri, pregsub(r->pool, output, uri, regexp->re_nsub+1, regmatch), sizeof(newuri)-1);    /* substitute in output */
-                EOS_PARANOIA(newuri);
-                for (i = 0; p->env[i] != NULL; i++) {
-                    strncpy(env2, p->env[i], sizeof(env2)-1);
-                    EOS_PARANOIA(env2);
-                    strncpy(env, pregsub(r->pool, env2, uri, regexp->re_nsub+1, regmatch), sizeof(env)-1);    /* substitute in output */
-                    EOS_PARANOIA(env);
-                    add_env_variable(r, env);
-                }
-                expand_variables_inbuffer(r, newuri, sizeof(newuri));   /* expand %{...} */
-                expand_map_lookups(r, newuri, sizeof(newuri));          /* expand ${...} */
-            }
+            output = pstrcat(r->pool, "proxy:", output, NULL);
+            strncpy(newuri, output, sizeof(newuri)-1);
+            EOS_PARANOIA(newuri);
+            expand_backref_inbuffer(r->pool, newuri, sizeof(newuri), briRR, '$'); /* expand $N */
+            expand_backref_inbuffer(r->pool, newuri, sizeof(newuri), briRC, '%'); /* expand %N */
+            expand_variables_inbuffer(r, newuri, sizeof(newuri));                 /* expand %{...} */
+            expand_map_lookups(r, newuri, sizeof(newuri));                        /* expand ${...} */
             if (perdir == NULL)
                 rewritelog(r, 2, "rewrite %s -> %s", r->filename, newuri);
             else
@@ -1570,24 +1575,18 @@
                 || (i > 8 && strncmp(output, "https://", 8) == 0)
                 || (i > 9 && strncmp(output, "gopher://", 9) == 0)
                 || (i > 6 && strncmp(output, "ftp://", 6) == 0)   ) ) {
-            if (p->flags & RULEFLAG_NOTMATCH) {
-                strncpy(newuri, output, sizeof(newuri)-1);
-                EOS_PARANOIA(newuri);
-                expand_variables_inbuffer(r, newuri, sizeof(newuri));/* expand %{...} */
-                expand_map_lookups(r, newuri, sizeof(newuri));       /* expand ${...} */
-            }
-            else {
-                strncpy(newuri, pregsub(r->pool, output, uri, regexp->re_nsub+1, regmatch), sizeof(newuri)-1);    /* substitute in output */
-                EOS_PARANOIA(newuri);
-                for (i = 0; p->env[i] != NULL; i++) {
-                    strncpy(env2, p->env[i], sizeof(env2)-1);
-                    EOS_PARANOIA(env2);
-                    strncpy(env, pregsub(r->pool, env2, uri, regexp->re_nsub+1, regmatch), sizeof(env)-1);    /* substitute in output */
-                    EOS_PARANOIA(env);
-                    add_env_variable(r, env);
-                }
-                expand_variables_inbuffer(r, newuri, sizeof(newuri));/* expand %{...} */
-                expand_map_lookups(r, newuri, sizeof(newuri));       /* expand ${...} */
+            strncpy(newuri, output, sizeof(newuri)-1);
+            EOS_PARANOIA(newuri);
+            expand_backref_inbuffer(r->pool, newuri, sizeof(newuri), briRR, '$'); /* expand $N */
+            expand_backref_inbuffer(r->pool, newuri, sizeof(newuri), briRC, '%'); /* expand %N */
+            expand_variables_inbuffer(r, newuri, sizeof(newuri));                 /* expand %{...} */
+            expand_map_lookups(r, newuri, sizeof(newuri));                        /* expand ${...} */
+            for (i = 0; p->env[i] != NULL; i++) {
+                strncpy(env, p->env[i], sizeof(env)-1);
+                EOS_PARANOIA(env);
+                expand_backref_inbuffer(r->pool, env, sizeof(env), briRR, '$');
+                expand_backref_inbuffer(r->pool, env, sizeof(env), briRC, '%');
+                add_env_variable(r, env);
             }
             rewritelog(r, 2, "[per-dir %s] redirect %s -> %s", perdir, r->filename, newuri);
             r->filename = pstrdup(r->pool, newuri);
@@ -1603,25 +1602,20 @@
             output = pstrcat(r->pool, perdir, output, NULL);
         }
 
-        if (p->flags & RULEFLAG_NOTMATCH) {
-            /* just overtake the URI */
-            strncpy(newuri, output, sizeof(newuri)-1);
-            EOS_PARANOIA(newuri);
-        }
-        else {
-            /* substitute in output */
-            strncpy(newuri, pregsub(r->pool, output, uri, regexp->re_nsub+1, regmatch), sizeof(newuri)-1);    /* substitute in output */
-            EOS_PARANOIA(newuri);
-            for (i = 0; p->env[i] != NULL; i++) {
-                strncpy(env2, p->env[i], sizeof(env2)-1);
-                EOS_PARANOIA(env2);
-                strncpy(env, pregsub(r->pool, env2, uri, regexp->re_nsub+1, regmatch), sizeof(env)-1);    /* substitute in output */
-                EOS_PARANOIA(env);
-                add_env_variable(r, env);
-            }
+        /* standard case: create the substitution string */
+        strncpy(newuri, output, sizeof(newuri)-1);
+        EOS_PARANOIA(newuri);
+        expand_backref_inbuffer(r->pool, newuri, sizeof(newuri), briRR, '$'); /* expand $N */
+        expand_backref_inbuffer(r->pool, newuri, sizeof(newuri), briRC, '%'); /* expand %N */
+        expand_variables_inbuffer(r, newuri, sizeof(newuri));                 /* expand %{...} */
+        expand_map_lookups(r, newuri, sizeof(newuri));                        /* expand ${...} */
+        for (i = 0; p->env[i] != NULL; i++) {
+            strncpy(env, p->env[i], sizeof(env)-1);
+            EOS_PARANOIA(env);
+            expand_backref_inbuffer(r->pool, env, sizeof(env), briRR, '$');
+            expand_backref_inbuffer(r->pool, env, sizeof(env), briRC, '%');
+            add_env_variable(r, env);
         }
-        expand_variables_inbuffer(r, newuri, sizeof(newuri));  /* expand %{...} */
-        expand_map_lookups(r, newuri, sizeof(newuri));         /* expand ${...} */
 
         if (perdir == NULL)
             rewritelog(r, 2, "rewrite %s -> %s", uri, newuri);
@@ -1694,15 +1688,30 @@
     return 0;
 }
 
-static int apply_rewrite_cond(request_rec *r, rewritecond_entry *p, char *perdir)
+static int apply_rewrite_cond(request_rec *r, rewritecond_entry *p, char *perdir, backrefinfo *briRR, backrefinfo *briRC)
 {
-    char *input;
-    int rc;
+    char input[MAX_STRING_LEN];
     struct stat sb;
     request_rec *rsub;
+    regmatch_t regmatch[10];
+    int rc;
 
-    /* first, we have to expand the input string to match */
-    input = expand_variables(r, p->input);
+    /*
+     *   Construct the string we match against
+     */
+
+    /* expand the regex backreferences from the RewriteRule ($0-$9), 
+       then from the last RewriteCond (%0-%9) and then expand the 
+       variables (%{....}) */
+    strncpy(input, p->input, sizeof(input)-1);
+    EOS_PARANOIA(input);
+    expand_backref_inbuffer(r->pool, input, sizeof(input), briRR, '$');
+    expand_backref_inbuffer(r->pool, input, sizeof(input), briRC, '%');
+    expand_variables_inbuffer(r, input, sizeof(input));
+
+    /*
+     *   Apply the patterns
+     */
 
     rc = 0;
     if (strcmp(p->pattern, "-f") == 0) {
@@ -1784,11 +1793,22 @@
         rc = (compare_lexicography(input, p->pattern+1) == -1 ? 1 : 0);
     }
     else if (strlen(p->pattern) > 1 && *(p->pattern) == '=') {
-        rc = (strcmp(input, p->pattern+1) == 0 ? 1 : 0);
+        if (strcmp(p->pattern+1, "\"\"") == 0)
+            rc = (*input == '\0');
+        else
+            rc = (strcmp(input, p->pattern+1) == 0 ? 1 : 0);
     }
     else {
         /* it is really a regexp pattern, so apply it */
-        rc = (regexec(p->regexp, input, 0, NULL, 0) == 0);
+        rc = (regexec(p->regexp, input, p->regexp->re_nsub+1, regmatch, 0) == 0);
+
+        /* if it isn't a negated pattern and really matched
+           we update the passed-through regex subst info structure */
+        if (rc && !(p->flags & CONDFLAG_NOTMATCH)) { 
+            briRC->source = pstrdup(r->pool, input);
+            briRC->nsub   = p->regexp->re_nsub;
+            memcpy((void *)(briRC->regmatch), (void *)(regmatch), sizeof(regmatch_t)*10);
+        }
     }
 
     /* if this is a non-matching regexp, just negate the result */ 
@@ -1934,6 +1954,43 @@
         }
     }
     return;            
+}
+
+
+/*
+**
+**  Expand the %0-%9 or $0-$9 regex backreferences
+**
+*/
+
+static void expand_backref_inbuffer(pool *p, char *buf, int nbuf, backrefinfo *bri, char c)
+{
+    int i;
+
+    if (bri->nsub < 1)
+        return;
+
+    if (c != '$') {
+        /* save existing $N backrefs and replace <c>N with $N backrefs */
+        for (i = 0; buf[i] != '\0' && i < nbuf; i++) {
+            if (buf[i] == '$' && (buf[i+1] >= '0' && buf[i+1] <= '9'))
+                buf[i++] = '\001';
+            else if (buf[i] == c && (buf[i+1] >= '0' && buf[i+1] <= '9'))
+                buf[i++] = '$';
+        }
+    }
+
+    /* now apply the pregsub() function */
+    strncpy(buf, pregsub(p, buf, bri->source, 
+                         bri->nsub+1, bri->regmatch), nbuf-1);
+    EOS_PARANOIA_SIZE(buf, nbuf);
+
+    if (c != '$') {
+        /* restore the original $N backrefs */
+        for (i = 0; buf[i] != '\0' && i < nbuf; i++)
+            if (buf[i] == '\001' && (buf[i+1] >= '0' && buf[i+1] <= '9'))
+                buf[i++] = '$';
+    }
 }
 

Re: [PATCH] more backref interpolation for rewriting engine

Posted by Dean Gaudet <dg...@arctic.org>.
+1, but only if you change the name of mod_rewrite to mod_sendmail_cf. 

just kidding. 

It is confusing for $N to back reference the url-pattern which appears
*AFTER* all the conditions.  But other than that, sure why not. 
mod_rewrite programs are about as confusing as perl programs ;) 

Dean

On Tue, 29 Jul 1997, Ralf S. Engelschall wrote:

> Now the new functionality: 
>    1. One can use $1..$n in any <cond-string-XX> to backreference the result
>       of <url-pattern>.
>    2. One can use %1..%n in any <cond-string-XX> _AND_ in
>       <url-string> to backreference the result from the last matching
>       <cond-pattern-XX>. (initially this is empty for <cond-string-1>).