Posted to oro-user@jakarta.apache.org by Martin Thomas <ma...@scarceskills.com> on 2003/02/10 16:05:08 UTC

Stack Overflow Problem

Hello,

I'm using ORO 2.0.7 and I get a stack overflow exception with the following:

import org.apache.oro.text.regex.*;

public final class testcase {
    public static void main(String[] args) {
        String expression = "(\\(|\\)|^| |,|\\.|;)Baseline(.)*(\\(|\\)| |,|\\.|;|$)";
        // The real input is much longer; it was cut out for email purposes.
        String matchString = "this is a very large string " +
                             "..." +
                             "that has been cut-out for email purposes";

        // Create Perl5Compiler and Perl5Matcher instances.
        PatternCompiler compiler = new Perl5Compiler();
        PatternMatcher matcher = new Perl5Matcher();
        Pattern pattern = null;

        try {
            pattern = compiler.compile(expression, Perl5Compiler.CASE_INSENSITIVE_MASK);
        } catch (MalformedPatternException e) {
            System.err.println("Bad pattern.");
            System.err.println(e.getMessage());
            System.exit(1);
        }

        // Prints true if the pattern occurs anywhere in matchString.
        System.out.println(matcher.contains(matchString, pattern));
    }
}

When I run this, I get the following error message (JDK 1.3):

Exception in thread "main" java.lang.StackOverflowError
        at java.util.Stack.push(Stack.java:47)
        at org.apache.oro.text.regex.Perl5Matcher.__pushState(Unknown Source)
        at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
        at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
        at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
        at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
        at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
        at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
        at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
        at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
        at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
        at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
        at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
        at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
... (repeated many times)
        at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)


Any help / advice would be appreciated.

Thanks

Martin.

---------------------------------------------------------------------
To unsubscribe, e-mail: oro-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: oro-user-help@jakarta.apache.org


Re: Stack Overflow Problem

Posted by Martin Thomas <ma...@scarceskills.com>.
Daniel,

Thanks for your help with this.

Martin.

On 10 Feb 2003 at 13:15, Daniel F. Savarese wrote:

> 
> In message <3E...@localhost>, "Martin Thomas" writes:
> >OK, I've attached a test case that demonstrates the problem.
> 
> Thanks.  Your original expression works just fine if you change (.)* to
> (.*), even with the alternations and saved groups.  However, alternations
> tend to be inefficient and should be replaced with character classes
> whenever possible.  Contrary to my original assumption, the problem appears
> to have had nothing to do with backtracking.  I don't think this is
> a case of "blame the regular expression" as I had initially concluded.
> There are situations where a Perl regex (at least a 5.003 regex) can
> match the empty string an infinite number of times.  That shouldn't
> happen with (.)*, but neither should there be a problem with too many
> saved groups being pushed onto the stack since there is only one.
> I'm going to file this away as a test case and look into it later.
> 
> daniel

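Daniel's reply above suggests two changes to the original expression: replace (.)* with (.*), and collapse the single-character alternations into character classes. The sketch below is one way to spell that rewrite; the class and method names are hypothetical. It uses the JDK's java.util.regex rather than ORO (Pattern.CASE_INSENSITIVE standing in for Perl5Compiler.CASE_INSENSITIVE_MASK) purely to check, on a few short strings, that both spellings find the same overall match. It says nothing about how much stack either engine uses.

```java
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in check using java.util.regex, NOT ORO: compares which text matches,
// not how the engines behave on very long inputs.
public final class RewriteCheck {
    // Expression from the original report.
    static final String ORIGINAL =
        "(\\(|\\)|^| |,|\\.|;)Baseline(.)*(\\(|\\)| |,|\\.|;|$)";

    // Assumed rewrite: single characters moved into a character class (the
    // anchors ^ and $ cannot go inside a class, so they stay as alternatives),
    // and (.)* changed to (.*).
    static final String REWRITTEN =
        "(^|[(). ,;])Baseline(.*)([(). ,;]|$)";

    /** Returns the overall match text, or null when there is no match. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        return m.find() ? m.group() : null;
    }

    /** True when both spellings produce the same overall match (or both fail). */
    static boolean agree(String input) {
        return Objects.equals(firstMatch(ORIGINAL, input), firstMatch(REWRITTEN, input));
    }

    public static void main(String[] args) {
        String[] samples = { "see Baseline here", "(baseline)", "Baselines, etc.", "noBaseline", "Baseline" };
        for (String s : samples) {
            System.out.println(s + " -> " + firstMatch(REWRITTEN, s) + " (agree: " + agree(s) + ")");
        }
    }
}
```

One visible difference remains: in the original, group 2 holds only the last character consumed by (.)*, while in the rewrite it holds the whole tail matched by (.*). Code that reads group 2 would need to account for that.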

Re: Stack Overflow Problem

Posted by Martin Thomas <ma...@scarceskills.com>.
OK, I've attached a test case that demonstrates the problem.

Interestingly, when I remove the parentheses as you suggest, all works OK.

Thanks,

Martin.

On 10 Feb 2003 at 12:51, Daniel F. Savarese wrote:

> 
> In message <3E...@localhost>, "Martin Thomas" writes:
> >Actually, exactly the same occurs using:
> >
> >String expression = "Baseline(.)*";
> 
> Er, how long is the string you're matching against?  It really helps
> if you include the exact input you're using so that others can
> reproduce your problem.  I shouldn't have to spend my time constructing
> arbitrarily large inputs with different permutations of strings (e.g.,
> containing Baseline, not containing it, containing thousands of characters
> before a newline, not, etc.).  Which, by the way, I just did.
> 
> When you don't need capturing parentheses, don't use them.  In
> other words:
>    Baseline(?:.)*
> And when you don't need grouping, don't use it.  In other words:
>    Baseline.*
> 
> As I said before, Perl regular expressions are miniature programs ...
> 
> daniel

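Daniel's request for the exact input is easier to meet when the input is generated rather than pasted. The helper below is a hypothetical sketch, not part of ORO: it builds a string from a few parameters covering the permutations he lists (keyword present or absent, long runs of text with or without newlines), so a report can state the parameters and anyone can regenerate the same input.

```java
// Hypothetical helper, not part of ORO: builds test inputs deterministically so
// a failing case can be described by its parameters instead of attached verbatim.
public final class InputBuilder {
    /**
     * @param prefixLen filler characters before the keyword
     * @param keyword   word to embed (surrounded by spaces), or null to omit it
     * @param suffixLen filler characters after the keyword
     * @param lineLen   add '\n' after every lineLen filler characters; 0 means a single line
     */
    public static String build(int prefixLen, String keyword, int suffixLen, int lineLen) {
        StringBuilder sb = new StringBuilder();
        appendFiller(sb, prefixLen, lineLen);
        if (keyword != null) {
            sb.append(' ').append(keyword).append(' ');
        }
        appendFiller(sb, suffixLen, lineLen);
        return sb.toString();
    }

    // Filler cycles through a..z in order, so it never spells "baseline" by accident.
    private static void appendFiller(StringBuilder sb, int count, int lineLen) {
        for (int i = 0; i < count; i++) {
            sb.append((char) ('a' + i % 26));
            if (lineLen > 0 && (i + 1) % lineLen == 0) {
                sb.append('\n');
            }
        }
    }

    public static void main(String[] args) {
        // Keyword present on one long line: 5000 + 10 + 20000 characters.
        System.out.println(build(5000, "Baseline", 20000, 0).length()); // prints 25010
    }
}
```

A report could then say, for example, that the failure occurs with build(5000, "Baseline", 20000, 0), together with the pattern and compile flags.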

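Daniel's three spellings, Baseline(.)*, Baseline(?:.)* and Baseline.*, match exactly the same text and differ only in bookkeeping. The sketch below shows that difference using the JDK's java.util.regex as a stand-in for ORO (the class name is hypothetical): only the capturing form has a group, and because the group is recorded on every iteration, it ends up holding just the last character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// java.util.regex stand-in (not ORO): compares the three spellings from the thread.
public final class GroupingForms {
    /** Returns {overall match, group count, text of the last group or "-"}, or null if no match. */
    static String[] describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        String last = m.groupCount() > 0 ? m.group(m.groupCount()) : "-";
        return new String[] { m.group(), String.valueOf(m.groupCount()), last };
    }

    public static void main(String[] args) {
        String input = "Baseline abc";
        String[] forms = { "Baseline(.)*", "Baseline(?:.)*", "Baseline.*" };
        for (String regex : forms) {
            String[] d = describe(regex, input);
            System.out.println(regex + ": match=\"" + d[0] + "\" groups=" + d[1] + " last=" + d[2]);
        }
    }
}
```

All three report the match "Baseline abc"; only Baseline(.)* reports one group, whose value is "c". If the captured text is never read, the two non-capturing forms do the same job with less work per iteration.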
Re: Stack Overflow Problem

Posted by Martin Thomas <ma...@scarceskills.com>.
Hi,

Actually, exactly the same occurs using:

String expression = "Baseline(.)*";

IMHO, I can't see why this should cause an infinite loop, blowing the stack.

Any more ideas?

Martin.



On 10 Feb 2003 at 11:43, Daniel F. Savarese wrote:

> 
> In message <3E...@localhost>, "Martin Thomas" writes:
> >I'm using ORO 2.0.7 and I get a stack overflow exception with the following:
> ....
> >String expression = "(\\(|\\)|^| |,|\\.|;)Baseline(.)*(\\(|\\)| |,|\\.|;|$
> ....
> >Any help / advice would be appreciated.
> 
> Use character classes instead of grouped alternations.  Perl patterns are
> miniature programs.  Because they are NFAs, it is just as possible to write
> a Perl regex that backtracks too heavily or creates an infinite loop,
> busting the stack, as it is to write such a program in a programming language.
> 
> daniel


Regards,

Martin.

---------------------------------------------------------------------------
Martin Thomas                 | URL   : http://www.scarceskills.com
Technical Director            | Phone : +44 (0)771 258 7633
Scarce Skills Ltd.            | Mailto: martin.thomas@scarceskills.com
---------------------------------------------------------------------------

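Daniel's first reply compares a Perl regex to a small program that can bust the stack. The toy below, which is not ORO's Perl5Matcher and makes no claim about its internals, models one way that can happen: a backtracking matcher that implements X* by recursing once per iteration needs a stack frame for every character the star consumes, so stack depth grows with input length rather than with pattern size. The repeated __match frames in the original stack trace have that shape, although this thread does not confirm the cause. ToyStarMatcher and depthFor are hypothetical names; the atom is fixed to . (any character except newline) and the continuation is end of input.

```java
import java.util.function.IntPredicate;

/** Toy greedy matcher for "atom* then end of input"; NOT ORO's implementation. */
public final class ToyStarMatcher {
    private final IntPredicate atom;
    private int maxDepth;

    private ToyStarMatcher(IntPredicate atom) {
        this.atom = atom;
    }

    // One recursive call per iteration of the star, so the Java call stack
    // grows linearly with the number of characters the star consumes.
    private boolean star(String s, int i, int depth) {
        maxDepth = Math.max(maxDepth, depth);
        // Greedy: first try to consume one more character...
        if (i < s.length() && atom.test(s.charAt(i)) && star(s, i + 1, depth + 1)) {
            return true;
        }
        // ...then backtrack and try the continuation, here "end of input".
        return i == s.length();
    }

    /** Deepest recursion reached while matching ".*" followed by end of input. */
    public static int depthFor(String s) {
        ToyStarMatcher m = new ToyStarMatcher(c -> c != '\n');
        m.star(s, 0, 1);
        return m.maxDepth;
    }

    public static void main(String[] args) {
        System.out.println(depthFor("abc"));    // 4: one frame per character, plus one at the end
        System.out.println(depthFor("ab\ncd")); // 3: the star cannot cross the newline
    }
}
```

A long enough single-line input makes depthFor throw StackOverflowError. An engine can sidestep this for a single-character atom such as . by counting matches in a loop and backing off iteratively instead of recursing.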