Posted to regexp-user@jakarta.apache.org by Millington Roger <ro...@barclaycard.co.uk> on 2001/06/04 18:09:02 UTC

Bug or feature?

Hi All,

I have been comparing 'gnu regex', 'oro regex' and 'regexp' and have found a
'difference' between the way 'regexp' treats the regular expression
"^ *[1-9][0-9]{0,2} *$"
and the way the others treat it. The example is -

import org.apache.regexp.*;
import org.apache.oro.text.regex.*;

public class One
{
    public static void main(String args[])
    {
        try
        {
            // I would expect this to allow 1 to 3 digits starting with a digit != '0'
            String re = "^ *[1-9][0-9]{0,2} *$";
            
            {                
                RE m = new RE(re);
                if (m.match("  102 "))
                {
                    System.out.println("1 OK");
                }
                else
                {
                    System.out.println("1 !OK");
                }
                if (m.match("    12 "))
                {
                    System.out.println("2 OK");
                }
                else
                {
                    System.out.println("2 !OK");
                }
                if (m.match("    1 "))
                {
                    System.out.println("3 OK");
                }
                else
                {
                    System.out.println("3 !OK");
                }
            }
            
            {
                PatternCompiler pc = new Perl5Compiler();
                PatternMatcher pm = new Perl5Matcher();
                Pattern pattern = pc.compile(re);
                if (pm.matches("  102 ", pattern))
                {
                    System.out.println("1 OK");
                }
                else
                {
                    System.out.println("1 !OK");
                }
                if (pm.matches("    12 ", pattern))
                {
                    System.out.println("2 OK");
                }
                else
                {
                    System.out.println("2 !OK");
                }
                if (pm.matches("    1 ", pattern))
                {
                    System.out.println("3 OK");
                }
                else
                {
                    System.out.println("3 !OK");
                }
            }
        } catch (Exception e)
        {
            System.err.println("Problem, exception = " + e);
            e.printStackTrace(System.err);
        }
    }    
}

Both GNU and ORO produce OK for all three tests but 'regexp' produces !OK for
the 3rd test.
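
As an independent cross-check, here is a minimal sketch that runs the same
pattern and the same three inputs through java.util.regex. That engine is an
assumption brought in for comparison only; it is not one of the three
libraries above, and the class name CrossCheck is made up for illustration.

```java
import java.util.regex.Pattern;

public class CrossCheck
{
    // Same pattern as in the test program above.
    static final Pattern PATTERN = Pattern.compile("^ *[1-9][0-9]{0,2} *$");

    // True if the whole input is accepted by the pattern.
    public static boolean accepts(String input)
    {
        return PATTERN.matcher(input).matches();
    }

    public static void main(String[] args)
    {
        String[] inputs = { "  102 ", "    12 ", "    1 " };
        for (int i = 0; i < inputs.length; i++)
        {
            // Same output format as the test program: "<n> OK" or "<n> !OK".
            System.out.println((i + 1) + (accepts(inputs[i]) ? " OK" : " !OK"));
        }
    }
}
```

With java.util.regex this prints OK for all three inputs, which agrees with
the GNU and ORO results reported here.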

Is this a bug or just a feature?

Roger Millington


Re: Bug or feature?

Posted by "Edward Q. Bridges" <eb...@argotec.de>.
there is some ambiguity in the regexp that you provide.  the ^ * will match
any character, _including_ a number if the regexp engine is "greedy."  i
think you want the regexp to be something more like this:
  String re = "^[\\D]*[1-9][0-9]{0,2} *$";
which is basically saying: "match zero or more characters at the beginning of
the line that are not digits," etc.
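
to make the difference between the two patterns concrete, here is a small
sketch that tries both on a few inputs. it uses java.util.regex, which is an
assumption for illustration and not one of the engines discussed in this
thread; the class and method names are made up. the suggested pattern is
written with a doubled backslash, as a java string literal requires.

```java
import java.util.regex.Pattern;

public class SuggestedPattern
{
    // Pattern from the original post: only spaces may precede the digits.
    static final Pattern ORIGINAL = Pattern.compile("^ *[1-9][0-9]{0,2} *$");

    // Suggested variant: any non-digit characters may precede the digits.
    static final Pattern SUGGESTED = Pattern.compile("^[\\D]*[1-9][0-9]{0,2} *$");

    public static boolean original(String input)
    {
        return ORIGINAL.matcher(input).matches();
    }

    public static boolean suggested(String input)
    {
        return SUGGESTED.matcher(input).matches();
    }

    public static void main(String[] args)
    {
        String[] inputs = { "    1 ", "abc12", "x 7 " };
        for (String s : inputs)
        {
            System.out.println("\"" + s + "\" original=" + original(s)
                + " suggested=" + suggested(s));
        }
    }
}
```

in this engine both patterns accept "    1 ", but only the suggested one
accepts "abc12" and "x 7 ", so the two are not interchangeable if the input
may only have spaces around the number.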

your original regexp is saying: "match zero or more characters at the
beginning of the line followed by a number."  when you have one character in
the string to be matched (which happens to be a number), that is matched by
the "^ *" portion of your regexp which is still expecting there to be one
digit in the range 1-9 following.

the "greediness" of a regexp engine is implementation specific.  there is no
requirement for an engine to be more or less greedy.  so what you encountered
is not so much a bug or a feature in any of the engines; rather, you found
an idiosyncrasy which differentiates them.
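
for one concrete data point on greediness, the sketch below compares a greedy
" *" with a reluctant " *?" in java.util.regex. this is an assumption for
illustration only: it says nothing definite about how gnu regex, oro or
regexp behave, and the capturing group and the names are added just so the
number of consumed spaces can be printed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Greediness
{
    // Returns the number of leading spaces captured by group 1,
    // or -1 if the pattern does not match the whole input.
    public static int leadingSpaces(String regex, String input)
    {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1).length() : -1;
    }

    public static void main(String[] args)
    {
        String greedy    = "^( *)[1-9][0-9]{0,2} *$";   // " *" tries the most spaces first
        String reluctant = "^( *?)[1-9][0-9]{0,2} *$";  // " *?" tries the fewest spaces first
        String input = "    1 ";
        System.out.println("greedy:    " + leadingSpaces(greedy, input));
        System.out.println("reluctant: " + leadingSpaces(reluctant, input));
    }
}
```

in this engine both variants report 4 leading spaces for "    1 ": the
quantifier style changes the order in which candidates are tried, not
whether this anchored pattern matches.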

HTH
--e--


On Mon, 4 Jun 2001 17:09:02 +0100, Millington Roger wrote:
>            // I would expect this to allow 1 to 3 digits starting with a digit != '0'
>            String re = "^ *[1-9][0-9]{0,2} *$";

--------------------------------------------
<argo_tec gmbh>
     ed.q.bridges
     tel. 089-368179.xx / fax 089-368179.79
     osterwaldstrasse 10 / 80805 muenchen
</argo_tec gmbh>
--------------------------------------------