Posted to users@httpd.apache.org by Olivier Poitrey <ol...@pas-tres.net> on 2004/01/21 15:17:53 UTC

[users@httpd] Apache 1.3 mod_rewrite and regex backreferences

Hello dudes,

I want to use back-references in the regex part of a RewriteRule (think
real regex back-references, not RewriteCond back-references like %1), but
it doesn't work. I thought that it was an Apache regex engine limitation,
but by reading the source code of the regex engine, I saw that it should
be supported. So my question is: is it normal that the following
RewriteRule doesn't match the "/foo/foo" URI:

RewriteRule ^/(foo)/\1 /something/... 
                    ^^
Is it an Apache regex engine bug?

-- 
______________________________________________________________________
 O  l  i  v  i  e  r    P  o  i  t  r  e  y



---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: [users@httpd] Apache 1.3 mod_rewrite and regex backreferences

Posted by Brian Dessent <br...@dessent.net>.
Olivier Poitrey wrote:

> > What  you're referring  to is  only valid  in the  "Perl compatible"
> > regular  expression flavor.   This is  not the  same as  egrep regex
> > flavor.  Apache 1.3 uses egrep,  2.0 uses pcre, if I'm not mistaken.
> > So you should be able to do this in 2.0 but not 1.3.
> 
> I can read the following in the POSIX regex documentation:
> 
> "[...] Finally, there  is one new type of atom,  a back reference: `\'
> followed by  a non-zero decimal digit  d matches the  same sequence of
> characters matched  by the dth  parenthesized subexpression (numbering
> subexpressions by the positions  of their opening parentheses, left to
> right),  so that  (e.g.)  `\([bc]\)\1'  matches `bb'  or `cc'  but not
> `bc'."
> 
> And, if you test my regex with egrep, it works perfectly:
> 
> $ echo "/foo/foo"|egrep '^/(foo)/\1'
> /foo/foo
> 
> Finally, I found in the Apache regex implementation source code (which
> isn't the same code as the egrep regex engine as you said) special
> handling for this kind of backreference. I can't figure out why it
> doesn't work anyway.

Well, all I can say is that the POSIX specs for extended regexps seem to
be ambiguous.  The FreeBSD manpage (which is what I was looking at)
specifies that egrep doesn't support backreferences (but old-style basic
regexps do.)  If you consult
<http://httpd.apache.org/docs/mod/mod_rewrite.html#RewriteRule> it says:

Text:
  .           Any single character
  [chars]     Character class: One  of chars
  [^chars]    Character class: None of chars
  text1|text2 Alternative: text1 or text2

Quantifiers:
  ?           0 or 1 of the preceding text
  *           0 or N of the preceding text (N > 0)
  +           1 or N of the preceding text (N > 1)

Grouping:
  (text)      Grouping of text
              (either to set the borders of an alternative or
              for making backreferences where the Nth group can 
              be used on the RHS of a RewriteRule with $N)

Anchors:
  ^           Start of line anchor
  $           End   of line anchor

Escaping:
  \char       escape that particular char
              (for instance to specify the chars ".[]()" etc.)

...the last part of which implies that \1 means the literal 1.  So, I
don't really know what to make of it.  Maybe one of the Apache
developers can give you a more specific response.  It does seem odd that
backreferences aren't working, though.
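
One way to tell the two readings apart is to see which URI each of them
would accept. A minimal sketch in Python, used here only as a stand-in:
its `re` module is not the regex library bundled with Apache 1.3, and
`literal_reading` is a hypothetical helper that mimics the "\char escapes
that particular char" rule from the table above by turning `\1` into a
plain `1`.

```python
import re

PATTERN = r'^/(foo)/\1'
URIS = ('/foo/foo', '/foo/1')

def literal_reading(pattern: str) -> str:
    """Hypothetical helper: treat backslash + digit as that literal digit."""
    return re.sub(r'\\([1-9])', r'\1', pattern)

literal = literal_reading(PATTERN)  # '^/(foo)/1'

# Back-reference reading: \1 must repeat whatever group 1 matched.
backref_matches = [u for u in URIS if re.search(PATTERN, u)]

# Literal reading: \1 is just the character "1".
literal_matches = [u for u in URIS if re.search(literal, u)]

print(backref_matches, literal_matches)
```

If the literal reading is what Apache 1.3 applies, a request for /foo/1
would be expected to trigger the rule while /foo/foo would not. That is
an assumption to test against a real server, not confirmed behavior.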

Brian



Re: [users@httpd] Apache 1.3 mod_rewrite and regex backreferences

Posted by Olivier Poitrey <ol...@pas-tres.net>.
Brian Dessent <br...@dessent.net> writes:

> It's not  a bug.  Using backreferences  in the "search"  part of the
> regular expression is not  valid POSIX (extended) regular expression
> syntax.  Check  re_format(7).  A backslash  followed by a  number or
> letter is to match that number or letter, as if the backslash wasn't
> there.  You can  use backreferences only in the  "replace" part, not
> the "search" part. 
>
> What  you're referring  to is  only valid  in the  "Perl compatible"
> regular  expression flavor.   This is  not the  same as  egrep regex
> flavor.  Apache 1.3 uses egrep,  2.0 uses pcre, if I'm not mistaken.
> So you should be able to do this in 2.0 but not 1.3.

I can read the following in the POSIX regex documentation:

"[...] Finally, there  is one new type of atom,  a back reference: `\'
followed by  a non-zero decimal digit  d matches the  same sequence of
characters matched  by the dth  parenthesized subexpression (numbering
subexpressions by the positions  of their opening parentheses, left to
right),  so that  (e.g.)  `\([bc]\)\1'  matches `bb'  or `cc'  but not
`bc'." 

And, if you test my regex with egrep, it works perfectly:

$ echo "/foo/foo"|egrep '^/(foo)/\1'
/foo/foo
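
The same check can be repeated in any engine that is known to support
in-pattern back-references. A minimal sketch in Python (chosen only for
illustration; its `re` module is Perl-style and unrelated to Apache's
bundled regex code), covering both the `[bc]` example from the quoted
documentation and the /foo/foo case. Note that the quoted example groups
with `\(` and `\)`, which is basic-regex syntax, whereas Python, like
egrep, groups with bare parentheses.

```python
import re

# The documentation's example, rewritten with bare parentheses.
doubled = re.compile(r'([bc])\1')
results = {s: bool(doubled.fullmatch(s)) for s in ('bb', 'cc', 'bc')}

# The RewriteRule pattern from this thread, applied to the URI.
uri_match = re.search(r'^/(foo)/\1', '/foo/foo')

print(results)
print(uri_match.group(0) if uri_match else None)
```

A match here says nothing about Apache 1.3 itself; it only confirms what
the pattern means in an engine that implements back-references.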

Finally, I found in the Apache regex implementation source code (which
isn't the same code as the egrep regex engine as you said) special
handling for this kind of backreference. I can't figure out why it
doesn't work anyway.

Best regards,

-- 
______________________________________________________________________
 O  l  i  v  i  e  r    P  o  i  t  r  e  y



Re: [users@httpd] Apache 1.3 mod_rewrite and regex backreferences

Posted by Brian Dessent <br...@dessent.net>.
Olivier Poitrey wrote:

> I want to use back-references  in a RewriteRule regex part (think real
> regex back-reference, not RewriteCond  back-references like %1) but it
> doesn't  work.   I  thought  that   it  was  an  apache  regex  engine
> limitation, but by reading the source  code of the regex engine, I saw
> that it should be supported. So  my question is: is it normal that the
> following RewriteRule doesn't match the "/foo/foo" URI:
> 
> RewriteRule ^/(foo)/\1 /something/...

It's not a bug.  Using backreferences in the "search" part of the
regular expression is not valid POSIX (extended) regular expression
syntax.  Check re_format(7).  A backslash followed by a number or letter
is to match that number or letter, as if the backslash wasn't there. 
You can use backreferences only in the "replace" part, not the "search"
part.

What you're referring to is only valid in the "Perl compatible" regular
expression flavor.  This is not the same as egrep regex flavor.  Apache
1.3 uses egrep, 2.0 uses pcre, if I'm not mistaken.  So you should be
able to do this in 2.0 but not 1.3.
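
The difference between a reference in the "search" part and one in the
"replace" part can be illustrated outside Apache. A sketch in Python,
offered as an analogy only: `re.sub` stands in for RewriteRule, Python
writes replacement references as `\1` where mod_rewrite uses `$1`, and
the sample URIs are made up for the example.

```python
import re

# Replace-side reference: group 1 is captured by the pattern and
# reused in the substitution string.
rewritten = re.sub(r'^/([^/]+)/page$', r'/something/\1', '/foo/page')

# Search-side reference: \1 inside the pattern itself requires the
# second path segment to repeat the first one.
repeated = re.compile(r'^/([^/]+)/\1$')
same = bool(repeated.search('/foo/foo'))
different = bool(repeated.search('/foo/bar'))

print(rewritten, same, different)
```

Only the second form is what the original question asks for; the first
works in any flavor that supports capture groups at all.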

Brian
