Posted to regexp-user@jakarta.apache.org by Edwin Martin <ed...@bitstorm.nl> on 2001/06/08 01:23:51 UTC

Regexp broken

According to the Jakarta website, regexp "stood up quite well
to the test of time".

Is this true?

I made a JSP page with strange results.

Assume I have a string like "{regexp-1.2}" and I want to match
everything between the brackets (thus "regexp-1.2").

Let's try some regular expressions:

Input string "{regexp-1.2}" and RE "([a-z0-9]+)" match: "regexp"
Input string "{regexp-1.2}" and RE "([a-z0-9-]+)" match: "{regexp-1.2}"
Input string "{regexp-1.2}" and RE "([a-z0-9.]+)" match: "regexp"
Input string "{regexp-1.2}" and RE "([a-z0-9.-]+)" match: "{regexp"
Input string "{regexp-1.2}" and RE "([a-z0-9\-]+)" match: "regexp-1"
Input string "{regexp-1.2}" and RE "([a-z0-9\.]+)" match: "regexp"
Input string "{regexp-1.2}" and RE "([a-z0-9\.\-]+)" match: "regexp-1"
Input string "{regexp-1.2}" and RE "([a-z0-9.\-]+)" match: "regexp-1"
Input string "{regexp-1.2}" and RE "([a-z0-9\.-]+)" match: "{regexp"

None of them gives "regexp-1.2"!

What should the regular expression be??

Of course we can use "[^{}]+", but that's not what I want here:
the brackets can be any character outside [a-z0-9.-].
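For comparison, here is a minimal sketch of the delimiter-independent approach described above. It uses the standard java.util.regex package (in the JDK since 1.4, so newer than this thread), not Jakarta Regexp, and the class name DelimiterFreeExtract and the sample inputs are made up for illustration. In a Perl-style engine a '.' inside brackets is literal and a '-' placed last is literal, so an unanchored search for [a-z0-9.-]+ should pick out the text whatever the surrounding characters are.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: uses java.util.regex, not Jakarta Regexp.
class DelimiterFreeExtract {
    // '.' is literal inside brackets, and a '-' placed last is literal too.
    private static final Pattern TOKEN = Pattern.compile("[a-z0-9.-]+");

    // Returns the first run of [a-z0-9.-] characters, or null if there is none.
    static String extract(String input) {
        Matcher m = TOKEN.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] samples = {"{regexp-1.2}", "(regexp-1.2)", "<regexp-1.2>"};
        for (String s : samples) {
            System.out.println(s + " -> " + extract(s));
        }
    }
}
```

Because find() returns the first run of class characters, this assumes nothing before the opening delimiter is itself in [a-z0-9.-].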

Or is regexp broken and does nobody have the courage to admit it?

Edwin Martin.


BTW, this is the JSP code, so you can try it yourself:

<%@ page import="org.apache.regexp.*" %>

<%!
// The JspWriter is passed in instead of being kept in a field: one JSP
// instance serves concurrent requests, so a shared field is not safe.
public void reTest( JspWriter out, String in, String re )
        throws java.io.IOException, org.apache.regexp.RESyntaxException {
         out.print( "Input string \""+in+"\" and RE \""+re+"\" match: ");
         RE testRe = new RE(re);
         if ( testRe.match( in ) )
                 out.print( "\""+testRe.getParen(1)+"\"" ); // first parenthesized group
         else
                 out.print( "no match" );
         out.print("<br>");
}
%>

<%
String s = "{regexp-1.2}";
reTest( out, s, "([a-z0-9]+)" );
reTest( out, s, "([a-z0-9-]+)" );
reTest( out, s, "([a-z0-9.]+)" );
reTest( out, s, "([a-z0-9.-]+)" );
reTest( out, s, "([a-z0-9\\-]+)" );
reTest( out, s, "([a-z0-9\\.]+)" );
reTest( out, s, "([a-z0-9\\.\\-]+)" );
reTest( out, s, "([a-z0-9.\\-]+)" );
reTest( out, s, "([a-z0-9\\.-]+)" );
%>
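To separate engine behaviour from pattern mistakes, the same nine patterns can be run through another engine. The following is a hypothetical port of the reTest() harness to java.util.regex (JDK 1.4 and later, not Jakarta Regexp). Matcher.find() performs an unanchored search, which is what the output above suggests RE.match(String) does as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical port of the JSP reTest() harness, for comparison only.
class StandardEngineTable {
    // The nine patterns from the JSP page, written as Java string literals.
    static final String[] PATTERNS = {
        "([a-z0-9]+)",
        "([a-z0-9-]+)",
        "([a-z0-9.]+)",
        "([a-z0-9.-]+)",
        "([a-z0-9\\-]+)",
        "([a-z0-9\\.]+)",
        "([a-z0-9\\.\\-]+)",
        "([a-z0-9.\\-]+)",
        "([a-z0-9\\.-]+)",
    };

    // Unanchored search; returns group 1 of the first match, or null.
    static String firstGroup(String in, String re) {
        Matcher m = Pattern.compile(re).matcher(in);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "{regexp-1.2}";
        for (String re : PATTERNS) {
            String g = firstGroup(s, re);
            System.out.println("Input string \"" + s + "\" and RE \"" + re + "\" match: "
                + (g == null ? "no match" : "\"" + g + "\""));
        }
    }
}
```

Under java.util.regex, the four patterns whose class contains both '.' and '-' (the fourth and the last three) return "regexp-1.2", and no pattern can capture a curly brace, since none of the classes contains one.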




Re: Regexp broken

Posted by "Edward Q. Bridges" <eb...@argotec.de>.
thanks a lot for all of your efforts.

i'm sorry if i focussed on the wrong thing in your email, but as far as i am
concerned, the patterns are the issue.  there is a standardized syntax for
regular expressions such that regular expressions should be more or less
portable across implementations. 

furthermore, semantically identical regexp's should produce identical results
when applied to the same phrase.  regexp does not do that.  even more
disturbing is that sometimes a result works, and in a slightly different
context it doesn't.  in a word, it is inconsistent.

the bone i have to pick with it is that the jakarta project markets this as a
package that "has stood up quite well to the test of time" and that it is
intended as an answer to the question: "Why isn't there a decent regular
expression package available for Java under a BSD-Style (ie: Apache)
license?"  open source software doesn't need to advertise itself.  it only
needs to work.

the practical issue with this is that now i'm involved with a project that
has used regexp based on this hyperbole.  and now we have boneheaded
workarounds to otherwise simple problems.  

--e--




On Fri, 08 Jun 2001 12:22:33 -0700, Mike Dougherty wrote:

>
>"Edward Q. Bridges" wrote:
>> 
>> the patterns you provide in your sample *are* broken!  :)
>> 
>
>You are right. I was hoping you wouldn't focus on that, and just try to
>get your stuff working :-). That would have given me time to actually
>look at the code and find out why this is the case. On the surface it
>does appear to be a bug. But then again maybe we are missing something
>(that is not clear in the docs). I have a hard time believing that
>something obvious hasn't been caught and addressed before now. I won't
>know for sure until I look at the source. Give me a day (or so) to look
>into it? I'll get back to you when I'm a little more knowledgeable about
>the specifics.
>
>/mike
>
>
>
>-- 
>******************************************
> Mike Dougherty -- Java Software Engineer
>******************************************

--------------------------------------------
<argo_tec gmbh>
     ed.q.bridges
     tel. 089-368179.xx
     fax 089-368179.79
     osterwaldstraße 10 
     (haus F eingang 21)
     80805 münchen
</argo_tec gmbh>
--------------------------------------------  



Re: Regexp broken

Posted by Mike Dougherty <Mi...@san.rr.com>.
"Edward Q. Bridges" wrote:
> 
> the patterns you provide in your sample *are* broken!  :)
> 

You are right. I was hoping you wouldn't focus on that, and just try to
get your stuff working :-). That would have given me time to actually
look at the code and find out why this is the case. On the surface it
does appear to be a bug. But then again maybe we are missing something
(that is not clear in the docs). I have a hard time believing that
something obvious hasn't been caught and addressed before now. I won't
know for sure until I look at the source. Give me a day (or so) to look
into it? I'll get back to you when I'm a little more knowledgeable about
the specifics.

/mike



-- 
******************************************
 Mike Dougherty -- Java Software Engineer
******************************************

Re: Regexp broken

Posted by "Edward Q. Bridges" <eb...@argotec.de>.
the patterns you provide in your sample *are* broken!  :)

here are the results of my run of the sample program (i added numbers to
simplify discussing them):

1. Checking expression '([a-z0-9\-]+)'
   Found [regexp-1] at: 0
   Found [regexp-1] at: 1
2. Checking expression '([a-z0-9\.]+)'
   Found [regexp] at: 0
   Found [regexp] at: 1
3. Checking expression '([a-z0-9\.\-]+)'
   Found [regexp-1] at: 0
   Found [regexp-1] at: 1
4. Checking expression '([a-z0-9.\-]+)'
   Found [regexp-1] at: 0
   Found [regexp-1] at: 1
5. Checking expression '([a-z0-9\.-]+)'
   Found [{regexp] at: 0
   Found [{regexp] at: 1
6. Checking expression '\{([a-z0-9-]+)\}'
   Found [{regexp-1.2}] at: 0
   Found [regexp-1.2] at: 1

patterns 3, 4, & 5 are semantically equivalent -- putting the backslashes
there is unnecessary since special characters lose their "specialness" within
brackets.  and *yet* we get two different matches for the three patterns. 
further, each of the three patterns *should* match the entire string within
the brackets (according to the holy writ of regular expression syntax). 
Furthermore, the last of the three (pattern 5) should absolutely not match a
curly brace -- how is it even possible that the pattern matches a curly brace???
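The equivalence claim can be checked mechanically. This sketch, using java.util.regex (not Jakarta Regexp) and a hypothetical class name, builds the set of ASCII characters each bracket expression accepts and compares the sets. "The backslashes are unnecessary" holds for Perl-style engines such as Perl and java.util.regex, where a backslash inside brackets escapes the next character; in POSIX bracket expressions a backslash is an ordinary character, so there the backslashes would add '\' to the set.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical checker using java.util.regex, not Jakarta Regexp.
class ClassEquivalence {
    static final String[] CLASSES = {
        "[a-z0-9\\.\\-]", // pattern 3
        "[a-z0-9.\\-]",   // pattern 4
        "[a-z0-9\\.-]",   // pattern 5
        "[a-z0-9.-]",     // same class with no backslashes at all
    };

    // Membership table: in[c] is true if the class accepts ASCII character c.
    static boolean[] members(String cls) {
        Pattern p = Pattern.compile(cls);
        boolean[] in = new boolean[128];
        for (char c = 0; c < 128; c++) {
            in[c] = p.matcher(String.valueOf(c)).matches();
        }
        return in;
    }

    // True if every class accepts exactly the same ASCII characters.
    static boolean allEquivalent() {
        boolean[] first = members(CLASSES[0]);
        for (String cls : CLASSES) {
            if (!Arrays.equals(first, members(cls))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("equivalent: " + allEquivalent());
    }
}
```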

why does pattern number six match everything within the curly braces
(including the dot and the '2' after it) when pattern 1 uses the same
character class and only matches up to (but not including) the '.'?
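The pattern 6 question comes down to the character class. In this java.util.regex sketch (not Jakarta Regexp; the class name is hypothetical), [a-z0-9-]+ can never run across the '.', so in a conforming engine pattern 6 fails, agreeing with the Perl result further down, while adding '.' to the class makes it succeed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison using java.util.regex, not Jakarta Regexp.
class BracedPattern {
    // Returns group 1 of the first match, or null when the pattern does not match.
    static String group1(String re, String in) {
        Matcher m = Pattern.compile(re).matcher(in);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "{regexp-1.2}";
        System.out.println(group1("\\{([a-z0-9-]+)\\}", s));  // class lacks '.', so no match
        System.out.println(group1("\\{([a-z0-9.-]+)\\}", s)); // '.' added to the class
        System.out.println(group1("([a-z0-9-]+)", s));        // pattern 1: stops before '.'
    }
}
```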

to demonstrate how i believe these patterns should behave, here are the
results from a perl script that uses these patterns on the same string.  i've
numbered them so they correspond to the patterns above.  (the script is
below):

1. matched using ([a-z0-9\-]+): regexp-1
2. matched using ([a-z0-9\.]+): regexp
3. matched using ([a-z0-9\.\-]+): regexp-1.2
4. matched using ([a-z0-9.\-]+): regexp-1.2
5. matched using ([a-z0-9\.-]+): regexp-1.2
6. no match using \{([a-z0-9-]+)\}

clearly (to me at least), regexp lacks consistency and rigor in following
the language of regular expressions.

regards
--e--




On Fri, 08 Jun 2001 00:33:46 -0700, Mike Dougherty wrote:

>
>I have attached the test program I wrote. There are no comments and it's
>pretty ugly. But I think you'll get the idea.
>
>


#!/usr/bin/perl

use strict;

my $string = '{regexp-1.2}';

my @regexps = qw(
                 ([a-z0-9\-]+)
                 ([a-z0-9\.]+)
                 ([a-z0-9.-]+)
                 ([a-z0-9\.\-]+)
                 ([a-z0-9.\-]+)
                 ([a-z0-9\.-]+)
                 \{([a-z0-9-]+)\}
);

# The outer parentheses make $1 the whole match, whatever groups $re has.
for my $re (@regexps) {
    if( $string =~ m#($re)# ) {
        print "matched using $re: $1\n";
    } else {
        print "no match using $re\n";
    }
}



__DATA__
Checking expression '([a-z0-9\-]+)'
   Found [regexp-1] at: 0
   Found [regexp-1] at: 1
Checking expression '([a-z0-9\.]+)'
   Found [regexp] at: 0
   Found [regexp] at: 1
Checking expression '([a-z0-9\.\-]+)'
   Found [regexp-1] at: 0
   Found [regexp-1] at: 1
Checking expression '([a-z0-9.\-]+)'
   Found [regexp-1] at: 0
   Found [regexp-1] at: 1
Checking expression '([a-z0-9\.-]+)'
   Found [{regexp] at: 0
   Found [{regexp] at: 1
Checking expression '\{([a-z0-9-]+)\}'
   Found [{regexp-1.2}] at: 0
   Found [regexp-1.2] at: 1



Re: Regexp broken

Posted by Mike Dougherty <Mi...@san.rr.com>.
Edwin Martin wrote:
> 
> According to the Jakarta website, regexp "stood up quite well
> to the test of time".
> 
> Is this true?
> 
> I made a JSP page with strange results.
> 
> Assume I have a string like "{regexp-1.2}" and I want to match
> everything between the brackets (thus "regexp-1.2").
> 
> Let's try some regular expressions:
> 
> Input string "{regexp-1.2}" and RE "([a-z0-9]+)" match: "regexp"
> Input string "{regexp-1.2}" and RE "([a-z0-9-]+)" match: "{regexp-1.2}"
> Input string "{regexp-1.2}" and RE "([a-z0-9.]+)" match: "regexp"
> Input string "{regexp-1.2}" and RE "([a-z0-9.-]+)" match: "{regexp"
> Input string "{regexp-1.2}" and RE "([a-z0-9\-]+)" match: "regexp-1"
> Input string "{regexp-1.2}" and RE "([a-z0-9\.]+)" match: "regexp"
> Input string "{regexp-1.2}" and RE "([a-z0-9\.\-]+)" match: "regexp-1"
> Input string "{regexp-1.2}" and RE "([a-z0-9.\-]+)" match: "regexp-1"
> Input string "{regexp-1.2}" and RE "([a-z0-9\.-]+)" match: "{regexp"
> 
> None of them gives "regexp-1.2"!
> 
> What should the regular expression be??
> 
> Of course we can use "[^{}]+", but that's not what I want here:
> the brackets can be any character outside [a-z0-9.-].
> 
> Or is regexp broken and does nobody have the courage to admit it?
> 
> Edwin Martin.
> 

Edwin,

The regexp seems to work fine. We just need to tune your expressions. I
tested your expressions and got the same results as you. So we know we both
have working code (or the same broken code :-).

Can you count on the brackets being in the string ("{regexp-1.2}")?
If so, that will make it easier. If not, it's still doable; it'll just
take a little more work. Here are two expressions I came up with,
assuming that the brackets will always be in the string:

    "\{([a-z0-9-]+)\}"
    "\{(\S+)\}"

The first one uses the same character class as yours. The second one
matches non-whitespace characters: not recommended if you might get
something like "{regexp 1.2}" as the string, but highly recommended if
your strings contain special characters.
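The whitespace caveat for "\{(\S+)\}" can be seen directly in this java.util.regex sketch (not Jakarta Regexp). The class name and the third input string are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of "\{(\S+)\}" using java.util.regex, not Jakarta Regexp.
class NonWhitespaceBraces {
    private static final Pattern BRACED = Pattern.compile("\\{(\\S+)\\}");

    // Returns the text between the braces, or null when there is no match.
    static String inside(String in) {
        Matcher m = BRACED.matcher(in);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(inside("{regexp-1.2}"));   // \S+ backtracks off the '}' so \} can match
        System.out.println(inside("{regexp 1.2}"));   // the space breaks the \S+ run: no match
        System.out.println(inside("{regexp_1.2+b}")); // any non-space punctuation is kept
    }
}
```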

However, if the brackets are not guaranteed to be there but you always
want to trim the first and last character, you can use the following:

    "^.([a-z0-9-]+).$"
    "^.(\S+).$"

These are the same as above with two noticeable differences. The ^ and $
tell it to match at the beginning (^) and end ($) of the string only!
The dot (.) matches any single character (including whitespace).
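A sketch of the trim-the-ends idea in java.util.regex (not Jakarta Regexp; the class name and inputs are hypothetical). The first of the two expressions uses [a-z0-9-], which has no '.', so a conforming engine cannot match "{regexp-1.2}" with it; the sketch therefore also tries the class with '.' added.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of "^.(...).$" using java.util.regex, not Jakarta Regexp.
class TrimEnds {
    // Returns group 1, i.e. everything between the first and last character, or null.
    static String middle(String re, String in) {
        Matcher m = Pattern.compile(re).matcher(in);
        // ^ and $ anchor the match, so for inputs without a trailing newline
        // find() behaves like a whole-string match here.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(middle("^.(\\S+).$", "<regexp-1.2>"));
        System.out.println(middle("^.([a-z0-9.-]+).$", "[regexp-1.2]"));
        System.out.println(middle("^.([a-z0-9-]+).$", "{regexp-1.2}")); // '.' not in class: no match
    }
}
```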

Also, notice that I put the characters I want excluded from the result
outside the parens. That makes the part I do want a parenthesized
subexpression, which you retrieve by its index. So to get the appropriate
string out of the RE object, I called the getParen() method with 1 as
the index.

    String result = regexp.getParen( 1 );
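In java.util.regex the counterpart of getParen(int) is Matcher.group(int): group 0 is the text matched by the whole pattern and group 1 is the first parenthesized subexpression. That appears to line up with the "at: 0" and "at: 1" lines in the test output elsewhere in this thread, though this sketch (hypothetical class name) does not use Jakarta Regexp itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of group indexes using java.util.regex, not Jakarta Regexp.
class GroupIndex {
    // Returns {group(0), group(1)} for the first match, or null when there is none.
    static String[] groups(String re, String in) {
        Matcher m = Pattern.compile(re).matcher(in);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(0), m.group(1)};
    }

    public static void main(String[] args) {
        String[] g = groups("\\{([a-z0-9.-]+)\\}", "{regexp-1.2}");
        System.out.println("group(0) = " + g[0]); // the text matched by the whole pattern
        System.out.println("group(1) = " + g[1]); // only the text inside the parentheses
    }
}
```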

I have attached the test program I wrote. There are no comments and it's
pretty ugly. But I think you'll get the idea.


/mike




RE: Regexp broken

Posted by Seth van buren <se...@eserv.com.au>.
Well, well, seems as though the only way anything will get done on this
project is to start a flame war on Slashdot.

Who wants to throw the first punch?

Regards
Seth

-----Original Message-----
From: Edwin Martin [mailto:edwin@bitstorm.nl]
Sent: Tuesday, 12 June 2001 5:47 AM
To: regexp-user@jakarta.apache.org
Subject: Re: Regexp broken


/Mike wrote:

>Seemed to be working when I went there. Are you going to the right place
>(http://nagoya.apache.org/bugzilla)?

I submitted the bug.

I'm afraid nothing will happen. There are eight bugs submitted
and none of them is assigned.

Well, it's free software, so I'm not allowed to expect anything :-(

http://nagoya.betaversion.org/bugzilla/buglist.cgi?product=Regexp

Bye,
Edwin Martin.

BTW
please remember with e-mail it's easy to get unintentionally offended.



Re: Regexp broken

Posted by Jon Stevens <jo...@latchkey.com>.
on 6/11/01 12:46 PM, "Edwin Martin" <ed...@bitstorm.nl> wrote:

> please remember with e-mail it's easy to get unintentionally offended.

Especially if the original person didn't intend to offend someone.

-jon


Re: Regexp broken

Posted by Edwin Martin <ed...@bitstorm.nl>.
/Mike wrote:

>Seemed to be working when I went there. Are you going to the right place
>(http://nagoya.apache.org/bugzilla)?

I submitted the bug.

I'm afraid nothing will happen. There are eight bugs submitted
and none of them is assigned.

Well, it's free software, so I'm not allowed to expect anything :-(

http://nagoya.betaversion.org/bugzilla/buglist.cgi?product=Regexp

Bye,
Edwin Martin.

BTW
please remember with e-mail it's easy to get unintentionally offended.


Re: Regexp broken

Posted by Mike Dougherty <Mi...@san.rr.com>.
Seth van buren wrote:
> 
> Edwin,
> I think it's broken.  I am using the \w class in my regular expression.  It
> won't match something with an underscore (_) in it.  I have tried it in the
> test suite and it does not work.  The javadoc says it should.
> 

Post your examples and we'll take a look. 


> I would have submitted a bug already, but Bugzilla has been unavailable for
> some time now.
> 
> Anyone else had similar experiences?
> 

Seemed to be working when I went there. Are you going to the right place
(http://nagoya.apache.org/bugzilla)?


/mike


Re: Regexp broken

Posted by Mike Dougherty <Mi...@san.rr.com>.
Jon Stevens wrote:
> 
> on 6/10/01 9:58 PM, "Mike Dougherty" <Mi...@san.rr.com> wrote:
> 
> > I can safely recommend it without getting
> > chewed out for recommending older code.
> 
> You were not chewed out...just corrected...

I didn't say you "chewed me out"; I didn't say
anyone chewed me out. What I did say, and intended
to imply, was that if my solution to the problem
was to bail on Jakarta regexp and use GNU (which I
knew worked), I would have been "chewed out". Maybe
not by you, but I'm sure by someone. I am kind of
averse to recommending a "competing" product on
this list (unless I can't avoid it), and I am sure
there may be a few others on this list who feel
the same way.

> 
> Sigh, if you can't take being corrected...

I can take being corrected. However, the tone of
your message didn't read as a mere correction. It
sounded more like you were calling me a liar. I
made a mistake, and I admit I had not done
adequate research. I did not intentionally
attempt to defame the Jakarta Regexp name or
spread misinformation.

I was doing my best to help solve the problem(s)
and answer questions. Since I have only been a
member of this list for a short time, I am bound
to make mistakes. What I found very irritating
(hence my irritated responses) was that the only
input we were able to get from one of the more
experienced people on the list was corrections of
my mistakes. It might have been more helpful had
you corrected my mistakes *and* proposed a
solution to the problems.

/mike



Re: Regexp broken

Posted by Jon Stevens <jo...@latchkey.com>.
on 6/10/01 9:58 PM, "Mike Dougherty" <Mi...@san.rr.com> wrote:

> I can safely recommend it without getting
> chewed out for recommending older code.

You were not chewed out...just corrected...

Sigh, if you can't take being corrected...

-jon


Re: Regexp broken

Posted by Mike Dougherty <Mi...@san.rr.com>.
Jon Stevens wrote:
> 
> on 6/8/01 12:49 AM, "Mike Dougherty" <Mi...@san.rr.com> wrote:
> 
> > GNU regexp is Jakarta Regexp. GNU still has a link to the old version
> > from their Java site. But the package has been moved under the Apache
> > umbrella.
> 
> I'm sorry, but that is not true at all.
> 
> Jakarta Regexp is NOT GNU regexp.
> 

I know the Regexp page credits Jonathan Locke
with donating Regexp: "donated to the Apache
Software Foundation by Jonathan Locke." And I
wasn't trying to take anything away from the
original author(s). I just thought I remembered
the GNU package being done by the same author.

The original question was:

> > also, does anyone have an opinion on gnu regex
> > or oro regex as a suitable alternative?


I've been using the GNU regexp for years, and I
know it works. Since none of us seem to be able to
get this one working, I'd have to say try ORO; if
that's too complicated, then get the GNU regexp.
Since we are now aware that they are not the same
code, I can safely recommend it without getting
chewed out for recommending older code.

The web address:
http://www.cacas.org/java/gnu/regexp

/mike



Re: Regexp broken

Posted by Jon Stevens <jo...@latchkey.com>.
on 6/8/01 12:49 AM, "Mike Dougherty" <Mi...@san.rr.com> wrote:

> GNU regexp is Jakarta Regexp. GNU still has a link to the old version
> from their Java site. But the package has been moved under the Apache
> umbrella.

I'm sorry, but that is not true at all.

Jakarta Regexp is NOT GNU regexp.

> I don't know about ORO. I do know that there is an ORO under Jakarta
> also. But whether they are one and the same I do not know.

They are.

-jon

-- 
"Open source is not available to commercial companies."
            -Steve Ballmer, CEO Microsoft
<http://www.suntimes.com/output/tech/cst-fin-micro01.html>


Re: Regexp broken

Posted by "Edward Q. Bridges" <eb...@argotec.de>.
here's my earlier post:
http://marc.theaimsgroup.com/?l=jakarta-regexp-user&m=99130971224497&w=2



On Fri, 08 Jun 2001 00:49:46 -0700, Mike Dougherty wrote:

>If that doesn't help you
>solve yours, post your message (since it looks like it was previous to my
>joining) and we'll take a look.
>
>/mike




Re: Regexp broken

Posted by Mike Dougherty <Mi...@san.rr.com>.
"Edward Q. Bridges" wrote:
> 
> likewise, i've come across issues with regexp (i just joined this list about
> a week ago because i came across a problem -- see my posts from a few days
> ago).  is anyone actively maintaining the package?

I've only been on the list a few days myself, but I have to assume so.
It hasn't been a very active list (which surprised me), though.

> 
> also, does anyone have an opinion on gnu regex or oro regex as a suitable
> alternative?

GNU regexp is Jakarta Regexp. GNU still has a link to the old version
from their Java site. But the package has been moved under the Apache
umbrella.

I don't know about ORO. I do know that there is an ORO under Jakarta
also. But whether they are one and the same I do not know.

Take a look at my answer to Edwin's trouble. If that doesn't help you
solve yours, post your message (since it looks like it was previous to my
joining) and we'll take a look.

/mike



RE: Regexp broken

Posted by "Edward Q. Bridges" <eb...@argotec.de>.
likewise, i've come across issues with regexp (i just joined this list about
a week ago because i came across a problem -- see my posts from a few days
ago).  is anyone actively maintaining the package?

also, does anyone have an opinion on gnu regex or oro regex as a suitable
alternative?

regards
--e--


On Sat, 18 Aug 2001 10:05:56 +1000, Seth van buren wrote:

>Edwin,
>I think it's broken.  I am using the \w class in my regular expression.  It
>won't match something with an underscore (_) in it.  I have tried it in the
>test suite and it does not work.  The javadoc says it should.
>
>I would have submitted a bug already, but Bugzilla has been unavailable for
>some time now.
>
>Anyone else had similar experiences?
>
>Regards
>Seth
>
>




RE: Regexp broken

Posted by Seth van buren <se...@eserv.com.au>.
Edwin,
I think it's broken.  I am using the \w class in my regular expression.  It
won't match something with an underscore (_) in it.  I have tried it in the
test suite and it does not work.  The javadoc says it should.
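For reference, in Perl and in java.util.regex \w is the class [a-zA-Z_0-9], so an underscore is a word character. A small java.util.regex check (not Jakarta Regexp; the class name and strings are made up) of the behaviour Seth expected:

```java
import java.util.regex.Pattern;

// Hypothetical check of \w using java.util.regex, not Jakarta Regexp.
class WordClass {
    // True if every character of s is a word character, i.e. in [a-zA-Z_0-9].
    static boolean allWordChars(String s) {
        return Pattern.matches("\\w+", s);
    }

    public static void main(String[] args) {
        System.out.println(allWordChars("foo_bar")); // '_' is a word character
        System.out.println(allWordChars("foo-bar")); // '-' is not
    }
}
```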

I would have submitted a bug already, but Bugzilla has been unavailable for
some time now.

Anyone else had similar experiences?

Regards
Seth

