You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Lars Eilebrecht <la...@eilebrecht.net> on 2009/02/26 19:58:56 UTC
regex-related segfault in mod_include
Hi,
the following SSI statements triggers a segfault when QUERY_STRING
is empty (tested with 2.2.11):
<!--#if expr="$QUERY_STRING = /foobar=([0-9]+)$/" -->
<!--#set var="foobar" value="$1" -->
<!--#else -->
<!--#set var="foobar" value="$1" -->
<!--#endif -->
I tracked this down to get_include_var() in mod_include.c:
--snip--
static const char *get_include_var(const char *var, include_ctx_t *ctx)
{
const char *val;
request_rec *r = ctx->intern->r;
if (apr_isdigit(*var) && !var[1]) {
apr_size_t idx = *var - '0';
backref_t *re = ctx->intern->re;
/* Handle $0 .. $9 from the last regex evaluated.
* The choice of returning NULL strings on not-found,
* v.s. empty strings on an empty match is deliberate.
*/
if (!re) {
ap_log_rerror(APLOG_MARK, APLOG_WARNING, 0, r,
"regex capture $%" APR_SIZE_T_FMT " refers to no regex in %s",
idx, r->filename);
return NULL;
}
else {
if (re->nsub < idx || idx >= AP_MAX_REG_MATCH) {
ap_log_rerror(APLOG_MARK, APLOG_WARNING, 0, r,
"regex capture $%" APR_SIZE_T_FMT
" is out of range (last regex was: '%s') in
%s",
idx, re->rexp, r->filename);
return NULL;
}
if (re->match[idx].rm_so < 0 || re->match[idx].rm_eo < 0) {
return NULL;
}
val = apr_pstrmemdup(ctx->dpool, re->source + re->match[idx].rm_so,
re->match[idx].rm_eo - re->match[idx].rm_so);
}
}
else {
val = apr_table_get(r->subprocess_env, var);
if (val == LAZY_VALUE) {
val = add_include_vars_lazy(r, var);
}
}
return val;
}
--snip--
The segfault happens with apr_pstrmemdup(), because
"re->source + re->match[idx].rm_so" ends up being out of bounds.
So despite the regex not matching, "ctx->intern->re" is actually
not NULL, but I can't seem to figure out why this is the case.
Anyone any idea?
ciao...
--
Lars Eilebrecht
lars@eilebrecht.net
Re: regex-related segfault in mod_include
Posted by Lars Eilebrecht <la...@eilebrecht.net>.
"Plüm, Rüdiger, VF-Group" wrote:
> > However, re->match[idx].rm_so and re->match[idx].rm_eo are
> > random numbers,
> > i.e., a garbage value (I guess they should be 0 if there was
> > no match?).
>
> IMHO they should be -1.
Right, that actually makes more sense ...
> We use different PCRE versions in both (and maybe mod_include changed too).
> I suspect that if ap_regexec in re_check does not detect a match
> re->match[idx].rm_so is not setup correctly (maybe this changed between the
> different PCRE versions) and as we do not check in get_include_var if we had
> a match at all we fall over. So we should either memorize in the re struct
> if we matched or not by an additional flag, so something like (untested)
OK, nice ... I was trying to figure out if such a flag/value exists in
ap_regmatch_t, but that didn't got me very far as re->match is basically pointing
to garbage data. So initializing that actually prevents the segfault as it hits
the if statement for "re->match[idx].rm_so < 0" (I did a quick test with your
second patch).
However, for performance reasons I think fixing this with an additional flag
would be the best. I'll do some more testing and will come up with a final
patch for this.
Thanks Ruediger, that was very helpful. :)
cheers...
--
Lars Eilebrecht
lars@eilebrecht.net
AW: regex-related segfault in mod_include
Posted by "Plüm, Rüdiger, VF-Group" <ru...@vodafone.com>.
> -----Ursprüngliche Nachricht-----
> Von: Lars Eilebrecht [mailto:lars@eilebrecht.net]
> Gesendet: Freitag, 27. Februar 2009 12:54
> An: dev@httpd.apache.org
> Betreff: Re: regex-related segfault in mod_include
>
> Ruediger Pluem wrote:
>
> > What are the values of
> >
> > idx
> > re->match[idx].rm_so
> > re->match[idx].rm_eo
> > re->source
> >
> > and what is the string re->source is pointing to when the
> crash happens?
>
> idx is 1 and re->source points to an empty string which is fine.
> However, re->match[idx].rm_so and re->match[idx].rm_eo are
> random numbers,
> i.e., a garbage value (I guess they should be 0 if there was
> no match?).
IMHO they should be -1.
> Thus the argument "re->source + re->match[idx].rm_so" ends up
> pointing to
> an out of band location (and a memcpy() for that location results in
> the segfault).
>
> I just don't really get why this happens in some cases (like 1 out of
> 10 requests).
>
> BTW, I can reproduce this on Solaris and Linux (worker and prefork)
> with 2.2.11. With 2.0 this works fine.
We use different PCRE versions in both (and maybe mod_include changed too).
I suspect that if ap_regexec in re_check does not detect a match
re->match[idx].rm_so is not setup correctly (maybe this changed between the
different PCRE versions) and as we do not check in get_include_var if we had
a match at all we fall over. So we should either memorize in the re struct
if we matched or not by an additional flag, so something like (untested)
Index: mod_include.c
===================================================================
--- mod_include.c (revision 748051)
+++ mod_include.c (working copy)
@@ -158,6 +158,7 @@
const char *rexp;
apr_size_t nsub;
ap_regmatch_t match[AP_MAX_REG_MATCH];
+ int matched;
} backref_t;
typedef struct {
@@ -672,7 +673,8 @@
return NULL;
}
- if (re->match[idx].rm_so < 0 || re->match[idx].rm_eo < 0) {
+ if (re->match[idx].rm_so < 0 || re->match[idx].rm_eo < 0
+ || !re->matched) {
return NULL;
}
@@ -923,7 +925,6 @@
{
ap_regex_t *compiled;
backref_t *re = ctx->intern->re;
- int rc;
compiled = ap_pregcomp(ctx->dpool, rexp, AP_REG_EXTENDED);
if (!compiled) {
@@ -939,10 +940,10 @@
re->source = apr_pstrdup(ctx->pool, string);
re->rexp = apr_pstrdup(ctx->pool, rexp);
re->nsub = compiled->re_nsub;
- rc = !ap_regexec(compiled, string, AP_MAX_REG_MATCH, re->match, 0);
+ re->matched = !ap_regexec(compiled, string, AP_MAX_REG_MATCH, re->match, 0);
ap_pregfree(ctx->dpool, compiled);
- return rc;
+ return re->matched;
}
static int get_ptoken(include_ctx_t *ctx, const char **parse, token_t *token, token_t *previous)
or we should initialize the match array correctly:
Index: mod_include.c
===================================================================
--- mod_include.c (revision 748051)
+++ mod_include.c (working copy)
@@ -924,6 +924,7 @@
ap_regex_t *compiled;
backref_t *re = ctx->intern->re;
int rc;
+ int i;
compiled = ap_pregcomp(ctx->dpool, rexp, AP_REG_EXTENDED);
if (!compiled) {
@@ -939,6 +940,10 @@
re->source = apr_pstrdup(ctx->pool, string);
re->rexp = apr_pstrdup(ctx->pool, rexp);
re->nsub = compiled->re_nsub;
+ for (i = 0; i < AP_MAX_REG_MATCH; i++) {
+ re->match[i].rm_so = -1;
+ re->match[i].rm_eo = -1;
+ }
rc = !ap_regexec(compiled, string, AP_MAX_REG_MATCH, re->match, 0);
ap_pregfree(ctx->dpool, compiled);
Regards
Rüdiger
Re: regex-related segfault in mod_include
Posted by Lars Eilebrecht <la...@eilebrecht.net>.
Ruediger Pluem wrote:
> What are the values of
>
> idx
> re->match[idx].rm_so
> re->match[idx].rm_eo
> re->source
>
> and what is the string re->source is pointing to when the crash happens?
idx is 1 and re->source points to an empty string which is fine.
However, re->match[idx].rm_so and re->match[idx].rm_eo are random numbers,
i.e., a garbage value (I guess they should be 0 if there was no match?).
Thus the argument "re->source + re->match[idx].rm_so" ends up pointing to
an out of band location (and a memcpy() for that location results in
the segfault).
I just don't really get why this happens in some cases (like 1 out of
10 requests).
BTW, I can reproduce this on Solaris and Linux (worker and prefork)
with 2.2.11. With 2.0 this works fine.
ciao...
--
Lars Eilebrecht
lars@eilebrecht.net
Re: regex-related segfault in mod_include
Posted by Ruediger Pluem <rp...@apache.org>.
On 02/26/2009 07:58 PM, Lars Eilebrecht wrote:
> Hi,
>
> the following SSI statements triggers a segfault when QUERY_STRING
> is empty (tested with 2.2.11):
>
> <!--#if expr="$QUERY_STRING = /foobar=([0-9]+)$/" -->
> <!--#set var="foobar" value="$1" -->
> <!--#else -->
> <!--#set var="foobar" value="$1" -->
> <!--#endif -->
>
> I tracked this down to get_include_var() in mod_include.c:
>
> --snip--
> static const char *get_include_var(const char *var, include_ctx_t *ctx)
> {
> const char *val;
> request_rec *r = ctx->intern->r;
>
> if (apr_isdigit(*var) && !var[1]) {
> apr_size_t idx = *var - '0';
> backref_t *re = ctx->intern->re;
>
> /* Handle $0 .. $9 from the last regex evaluated.
> * The choice of returning NULL strings on not-found,
> * v.s. empty strings on an empty match is deliberate.
> */
> if (!re) {
> ap_log_rerror(APLOG_MARK, APLOG_WARNING, 0, r,
> "regex capture $%" APR_SIZE_T_FMT " refers to no regex in %s",
> idx, r->filename);
> return NULL;
> }
> else {
> if (re->nsub < idx || idx >= AP_MAX_REG_MATCH) {
> ap_log_rerror(APLOG_MARK, APLOG_WARNING, 0, r,
> "regex capture $%" APR_SIZE_T_FMT
> " is out of range (last regex was: '%s') in
> %s",
> idx, re->rexp, r->filename);
> return NULL;
> }
>
> if (re->match[idx].rm_so < 0 || re->match[idx].rm_eo < 0) {
> return NULL;
> }
>
> val = apr_pstrmemdup(ctx->dpool, re->source + re->match[idx].rm_so,
> re->match[idx].rm_eo - re->match[idx].rm_so);
> }
> }
> else {
> val = apr_table_get(r->subprocess_env, var);
>
> if (val == LAZY_VALUE) {
> val = add_include_vars_lazy(r, var);
> }
> }
>
> return val;
> }
> --snip--
>
> The segfault happens with apr_pstrmemdup(), because
> "re->source + re->match[idx].rm_so" ends up being out of bounds.
>
> So despite the regex not matching, "ctx->intern->re" is actually
> not NULL, but I can't seem to figure out why this is the case.
What are the values of
idx
re->match[idx].rm_so
re->match[idx].rm_eo
re->source
and what is the string re->source is pointing to when the crash happens?
Regards
Rüdiger