You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Lars Eilebrecht <la...@eilebrecht.net> on 2009/02/26 19:58:56 UTC

regex-related segfault in mod_include

Hi,

the following SSI statements triggers a segfault when QUERY_STRING
is empty (tested with 2.2.11):

  <!--#if expr="$QUERY_STRING = /foobar=([0-9]+)$/" -->
    <!--#set var="foobar" value="$1" -->
   <!--#else -->
    <!--#set var="foobar" value="$1" -->	
  <!--#endif -->

I tracked this down to get_include_var() in mod_include.c:

--snip--
static const char *get_include_var(const char *var, include_ctx_t *ctx)
{
    const char *val;
    request_rec *r = ctx->intern->r;

    if (apr_isdigit(*var) && !var[1]) {
        apr_size_t idx = *var - '0';
        backref_t *re = ctx->intern->re;

        /* Handle $0 .. $9 from the last regex evaluated.
         * The choice of returning NULL strings on not-found,
         * v.s. empty strings on an empty match is deliberate.
         */
        if (!re) {
            ap_log_rerror(APLOG_MARK, APLOG_WARNING, 0, r,
                "regex capture $%" APR_SIZE_T_FMT " refers to no regex in %s",
                idx, r->filename);
            return NULL;
        }
        else {
            if (re->nsub < idx || idx >= AP_MAX_REG_MATCH) {
                ap_log_rerror(APLOG_MARK, APLOG_WARNING, 0, r,
                              "regex capture $%" APR_SIZE_T_FMT
                              " is out of range (last regex was: '%s') in
                              %s",
                              idx, re->rexp, r->filename);
                return NULL;
            }

            if (re->match[idx].rm_so < 0 || re->match[idx].rm_eo < 0) {
                return NULL;
            }

            val = apr_pstrmemdup(ctx->dpool, re->source + re->match[idx].rm_so,
                                 re->match[idx].rm_eo - re->match[idx].rm_so);
        }
    }
    else {
        val = apr_table_get(r->subprocess_env, var);

        if (val == LAZY_VALUE) {
            val = add_include_vars_lazy(r, var);
        }
    }

    return val;
}
--snip--

The segfault happens with apr_pstrmemdup(), because 
"re->source + re->match[idx].rm_so" ends up being out of bounds.

So despite the regex not matching, "ctx->intern->re" is actually
not NULL, but I can't seem to figure out why this is the case.

Anyone any idea?

ciao...
-- 
Lars Eilebrecht
lars@eilebrecht.net


Re: regex-related segfault in mod_include

Posted by Lars Eilebrecht <la...@eilebrecht.net>.
"Plüm, Rüdiger, VF-Group" wrote:

> > However, re->match[idx].rm_so and re->match[idx].rm_eo are 
> > random numbers,
> > i.e., a garbage value (I guess they should be 0 if there was 
> > no match?).
> 
> IMHO they should be -1. 

Right, that actually makes more sense ...

 
> We use different PCRE versions in both (and maybe mod_include changed too).
> I suspect that if ap_regexec in re_check does not detect a match 
> re->match[idx].rm_so is not setup correctly (maybe this changed between the
> different PCRE versions) and as we do not check in get_include_var if we had
> a match at all we fall over. So we should either memorize in the re struct
> if we matched or not by an additional flag, so something like (untested)

OK, nice ... I was trying to figure out if such a flag/value exists in
ap_regmatch_t, but that didn't got me very far as re->match is basically pointing
to garbage data. So initializing that actually prevents the segfault as it hits
the if statement for "re->match[idx].rm_so < 0" (I did a quick test with your
second patch).

However, for performance reasons I think fixing this with an additional flag
would be the best. I'll do some more testing and will come up with a final
patch for this.

Thanks Ruediger, that was very helpful. :)

cheers...
-- 
Lars Eilebrecht
lars@eilebrecht.net


AW: regex-related segfault in mod_include

Posted by "Plüm, Rüdiger, VF-Group" <ru...@vodafone.com>.
 

> -----Ursprüngliche Nachricht-----
> Von: Lars Eilebrecht [mailto:lars@eilebrecht.net] 
> Gesendet: Freitag, 27. Februar 2009 12:54
> An: dev@httpd.apache.org
> Betreff: Re: regex-related segfault in mod_include
> 
> Ruediger Pluem wrote:
> 
> > What are the values of
> > 
> > idx
> > re->match[idx].rm_so
> > re->match[idx].rm_eo
> > re->source
> > 
> > and what is the string re->source is pointing to when the 
> crash happens?
> 
> idx is 1 and re->source points to an empty string which is fine.
> However, re->match[idx].rm_so and re->match[idx].rm_eo are 
> random numbers,
> i.e., a garbage value (I guess they should be 0 if there was 
> no match?).

IMHO they should be -1. 

> Thus the argument "re->source + re->match[idx].rm_so" ends up 
> pointing to
> an out of band location (and a memcpy() for that location results in
> the segfault).
> 
> I just don't really get why this happens in some cases (like 1 out of
> 10 requests).
> 
> BTW, I can reproduce this on Solaris and Linux (worker and prefork)
> with 2.2.11. With 2.0 this works fine.

We use different PCRE versions in both (and maybe mod_include changed too).
I suspect that if ap_regexec in re_check does not detect a match 
re->match[idx].rm_so is not setup correctly (maybe this changed between the
different PCRE versions) and as we do not check in get_include_var if we had
a match at all we fall over. So we should either memorize in the re struct
if we matched or not by an additional flag, so something like (untested)

Index: mod_include.c
===================================================================
--- mod_include.c       (revision 748051)
+++ mod_include.c       (working copy)
@@ -158,6 +158,7 @@
     const char *rexp;
     apr_size_t  nsub;
     ap_regmatch_t match[AP_MAX_REG_MATCH];
+    int         matched;
 } backref_t;

 typedef struct {
@@ -672,7 +673,8 @@
                 return NULL;
             }

-            if (re->match[idx].rm_so < 0 || re->match[idx].rm_eo < 0) {
+            if (re->match[idx].rm_so < 0 || re->match[idx].rm_eo < 0
+                || !re->matched) {
                 return NULL;
             }

@@ -923,7 +925,6 @@
 {
     ap_regex_t *compiled;
     backref_t *re = ctx->intern->re;
-    int rc;

     compiled = ap_pregcomp(ctx->dpool, rexp, AP_REG_EXTENDED);
     if (!compiled) {
@@ -939,10 +940,10 @@
     re->source = apr_pstrdup(ctx->pool, string);
     re->rexp = apr_pstrdup(ctx->pool, rexp);
     re->nsub = compiled->re_nsub;
-    rc = !ap_regexec(compiled, string, AP_MAX_REG_MATCH, re->match, 0);
+    re->matched = !ap_regexec(compiled, string, AP_MAX_REG_MATCH, re->match, 0);

     ap_pregfree(ctx->dpool, compiled);
-    return rc;
+    return re->matched;
 }

 static int get_ptoken(include_ctx_t *ctx, const char **parse, token_t *token, token_t *previous)


or we should initialize the match array correctly:

Index: mod_include.c
===================================================================
--- mod_include.c       (revision 748051)
+++ mod_include.c       (working copy)
@@ -924,6 +924,7 @@
     ap_regex_t *compiled;
     backref_t *re = ctx->intern->re;
     int rc;
+    int i;

     compiled = ap_pregcomp(ctx->dpool, rexp, AP_REG_EXTENDED);
     if (!compiled) {
@@ -939,6 +940,10 @@
     re->source = apr_pstrdup(ctx->pool, string);
     re->rexp = apr_pstrdup(ctx->pool, rexp);
     re->nsub = compiled->re_nsub;
+    for (i = 0; i < AP_MAX_REG_MATCH; i++) {
+        re->match[i].rm_so = -1;
+        re->match[i].rm_eo = -1;
+    }
     rc = !ap_regexec(compiled, string, AP_MAX_REG_MATCH, re->match, 0);

     ap_pregfree(ctx->dpool, compiled);


Regards

Rüdiger


Re: regex-related segfault in mod_include

Posted by Lars Eilebrecht <la...@eilebrecht.net>.
Ruediger Pluem wrote:

> What are the values of
> 
> idx
> re->match[idx].rm_so
> re->match[idx].rm_eo
> re->source
> 
> and what is the string re->source is pointing to when the crash happens?

idx is 1 and re->source points to an empty string which is fine.
However, re->match[idx].rm_so and re->match[idx].rm_eo are random numbers,
i.e., a garbage value (I guess they should be 0 if there was no match?).
Thus the argument "re->source + re->match[idx].rm_so" ends up pointing to
an out of band location (and a memcpy() for that location results in
the segfault).

I just don't really get why this happens in some cases (like 1 out of
10 requests).

BTW, I can reproduce this on Solaris and Linux (worker and prefork)
with 2.2.11. With 2.0 this works fine.

ciao...
-- 
Lars Eilebrecht
lars@eilebrecht.net


Re: regex-related segfault in mod_include

Posted by Ruediger Pluem <rp...@apache.org>.

On 02/26/2009 07:58 PM, Lars Eilebrecht wrote:
> Hi,
> 
> the following SSI statements triggers a segfault when QUERY_STRING
> is empty (tested with 2.2.11):
> 
>   <!--#if expr="$QUERY_STRING = /foobar=([0-9]+)$/" -->
>     <!--#set var="foobar" value="$1" -->
>    <!--#else -->
>     <!--#set var="foobar" value="$1" -->	
>   <!--#endif -->
> 
> I tracked this down to get_include_var() in mod_include.c:
> 
> --snip--
> static const char *get_include_var(const char *var, include_ctx_t *ctx)
> {
>     const char *val;
>     request_rec *r = ctx->intern->r;
> 
>     if (apr_isdigit(*var) && !var[1]) {
>         apr_size_t idx = *var - '0';
>         backref_t *re = ctx->intern->re;
> 
>         /* Handle $0 .. $9 from the last regex evaluated.
>          * The choice of returning NULL strings on not-found,
>          * v.s. empty strings on an empty match is deliberate.
>          */
>         if (!re) {
>             ap_log_rerror(APLOG_MARK, APLOG_WARNING, 0, r,
>                 "regex capture $%" APR_SIZE_T_FMT " refers to no regex in %s",
>                 idx, r->filename);
>             return NULL;
>         }
>         else {
>             if (re->nsub < idx || idx >= AP_MAX_REG_MATCH) {
>                 ap_log_rerror(APLOG_MARK, APLOG_WARNING, 0, r,
>                               "regex capture $%" APR_SIZE_T_FMT
>                               " is out of range (last regex was: '%s') in
>                               %s",
>                               idx, re->rexp, r->filename);
>                 return NULL;
>             }
> 
>             if (re->match[idx].rm_so < 0 || re->match[idx].rm_eo < 0) {
>                 return NULL;
>             }
> 
>             val = apr_pstrmemdup(ctx->dpool, re->source + re->match[idx].rm_so,
>                                  re->match[idx].rm_eo - re->match[idx].rm_so);
>         }
>     }
>     else {
>         val = apr_table_get(r->subprocess_env, var);
> 
>         if (val == LAZY_VALUE) {
>             val = add_include_vars_lazy(r, var);
>         }
>     }
> 
>     return val;
> }
> --snip--
> 
> The segfault happens with apr_pstrmemdup(), because 
> "re->source + re->match[idx].rm_so" ends up being out of bounds.
> 
> So despite the regex not matching, "ctx->intern->re" is actually
> not NULL, but I can't seem to figure out why this is the case.

What are the values of

idx
re->match[idx].rm_so
re->match[idx].rm_eo
re->source

and what is the string re->source is pointing to when the crash happens?

Regards

Rüdiger