Posted to regexp-user@jakarta.apache.org by Millington Roger <ro...@barclaycard.co.uk> on 2001/06/04 18:09:02 UTC
Bug or feature?
Hi All,
I have been comparing 'gnu regex', 'oro regex' and 'regexp' and have found a
'difference' between the way 'regexp' treats the regular expression
"^ *[1-9][0-9]{0,2} *$"
and the way the others treat it. The example is -
import org.apache.regexp.*;
import org.apache.oro.text.regex.*;

public class One
{
    public static void main(String args[])
    {
        try
        {
            // I would expect this to allow 1 to 3 digits starting with a digit != '0'
            String re = "^ *[1-9][0-9]{0,2} *$";
            {
                // Jakarta Regexp
                RE m = new RE(re);
                if (m.match(" 102 "))
                {
                    System.out.println("1 OK");
                }
                else
                {
                    System.out.println("1 !OK");
                }
                if (m.match(" 12 "))
                {
                    System.out.println("2 OK");
                }
                else
                {
                    System.out.println("2 !OK");
                }
                if (m.match(" 1 "))
                {
                    System.out.println("3 OK");
                }
                else
                {
                    System.out.println("3 !OK");
                }
            }
            {
                // Jakarta ORO (Perl5 syntax)
                PatternCompiler pc = new Perl5Compiler();
                PatternMatcher pm = new Perl5Matcher();
                Pattern pattern = pc.compile(re);
                if (pm.matches(" 102 ", pattern))
                {
                    System.out.println("1 OK");
                }
                else
                {
                    System.out.println("1 !OK");
                }
                if (pm.matches(" 12 ", pattern))
                {
                    System.out.println("2 OK");
                }
                else
                {
                    System.out.println("2 !OK");
                }
                if (pm.matches(" 1 ", pattern))
                {
                    System.out.println("3 OK");
                }
                else
                {
                    System.out.println("3 !OK");
                }
            }
        } catch (Exception e)
        {
            System.err.println("Problem, exception = " + e);
            e.printStackTrace(System.err);
        }
    }
}
Both GNU and ORO produce OK for all three tests, but 'regexp' produces !OK for
the 3rd test.
Is this a bug or just a feature?
Roger Millington
Re: Bug or feature?
Posted by "Edward Q. Bridges" <eb...@argotec.de>.
there is some ambiguity in the regexp that you provide. the ^ * will match
any character, _including_ a number if the regexp engine is "greedy." i
think you want the regexp to be something more like this:
String re = "^[\\D]*[1-9][0-9]{0,2} *$";
which is basically saying: "match zero or more characters at the beginning of
the line that are not digits," etc.
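
to make the suggested pattern concrete, here is a minimal sketch. it assumes java.util.regex rather than any of the three libraries compared in this thread, and the class and method names are made up for the example. the backslash is doubled because the pattern sits in a Java string literal.

```java
import java.util.regex.Pattern;

public class SuggestedPattern {
    // Suggested pattern. "\\D" in Java source is the regex escape \D (any non-digit).
    // Assumption: java.util.regex is used here, not gnu regex, ORO or Jakarta Regexp.
    static final Pattern SUGGESTED = Pattern.compile("^[\\D]*[1-9][0-9]{0,2} *$");

    // Hypothetical helper: true if the whole input matches the suggested pattern.
    static boolean accepts(String input) {
        return SUGGESTED.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = { " 102 ", " 12 ", " 1 ", "abc12", " 0 " };
        for (String input : inputs) {
            System.out.println("[" + input + "] -> " + accepts(input));
        }
    }
}
```

under these assumptions the pattern also accepts input such as "abc12", because \D admits any non-digit and not only spaces, so it is looser than the original.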
your original regexp is saying: "match zero or more characters at the
beginning of the line followed by a number." when you have one character in
the string to be matched (which happens to be a number), that is matched by
the "^ *" portion of your regexp which is still expecting there to be one
digit in the range 1-9 following.
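
one way to test this reading of the original pattern is to try it on a third engine. the sketch below assumes java.util.regex, which is not among the libraries compared in this thread, and its class and method names are invented for the example. in that engine, at least, the leading " *" consumes only space characters and all three test strings match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalPatternCheck {
    // The pattern from the original post, unchanged.
    static final Pattern ORIGINAL = Pattern.compile("^ *[1-9][0-9]{0,2} *$");

    // Same pattern with a capture group around the leading " *",
    // so we can see exactly what that part consumed.
    static final Pattern LEADING = Pattern.compile("^( *)[1-9][0-9]{0,2} *$");

    // Hypothetical helper: true if the whole input matches the original pattern.
    static boolean accepts(String input) {
        return ORIGINAL.matcher(input).matches();
    }

    // Hypothetical helper: text consumed by the leading " *", or null if no match.
    static String leadingRun(String input) {
        Matcher m = LEADING.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] inputs = { " 102 ", " 12 ", " 1 " };
        for (String input : inputs) {
            System.out.println("[" + input + "] matches=" + accepts(input)
                    + " leading=[" + leadingRun(input) + "]");
        }
    }
}
```

for " 1 " the leading group holds a single space and the digit is left for [1-9], so this engine agrees with the GNU and ORO results reported above.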
the "greediness" of a regexp engine is implementation specific. there is no
requirement for an engine to be more or less greedy. so what you encountered
is not so much a bug nor a feature in any of the engines; rather, you found
an idiosyncrasy which differentiates them.
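
to illustrate what greediness changes, here is a small sketch. it assumes java.util.regex (not one of the engines discussed here) and uses made-up names. in that engine a greedy and a reluctant quantifier split the input differently between groups, while an anchored pattern still matches or fails the same way with either.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedinessDemo {
    // Greedy first group: takes as many digits as it can.
    static final Pattern GREEDY = Pattern.compile("^([0-9]*)([0-9]*)$");
    // Reluctant first group: takes as few digits as it can.
    static final Pattern RELUCTANT = Pattern.compile("^([0-9]*?)([0-9]*)$");

    // Hypothetical helper: text captured by the first group, or null if no match.
    static String firstGroup(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "123";
        System.out.println("greedy    group 1 = [" + firstGroup(GREEDY, input) + "]");
        System.out.println("reluctant group 1 = [" + firstGroup(RELUCTANT, input) + "]");
    }
}
```

both patterns accept "123"; only the division of the digits between the two groups differs.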
HTH
--e--
On Mon, 4 Jun 2001 17:09:02 +0100, Millington Roger wrote:
> // I would expect this to allow 1 to 3 digits starting with a digit != '0'
> String re = "^ *[1-9][0-9]{0,2} *$";
--------------------------------------------
<argo_tec gmbh>
ed.q.bridges
tel. 089-368179.xx / fax 089-368179.79
osterwaldstrasse 10 / 80805 muenchen
</argo_tec gmbh>
--------------------------------------------