Posted to oro-dev@jakarta.apache.org by bu...@apache.org on 2005/11/07 14:11:08 UTC
DO NOT REPLY [Bug 37382] New: - stack over flow while using a Regex
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=37382>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
http://issues.apache.org/bugzilla/show_bug.cgi?id=37382
Summary: stack over flow while using a Regex
Product: ORO
Version: 2.0.7
Platform: Other
OS/Version: other
Status: NEW
Severity: normal
Priority: P2
Component: Main
AssignedTo: oro-dev@jakarta.apache.org
ReportedBy: hi_pkr@yahoo.com
CC: hi_pkr@yahoo.com
Hi,
I am using the ORO Regex API, version 2.0.7, and my objective is to extract some
tagged data from HTML source. For example, I am interested in getting the source
code for all the forms found in an HTML page. So I wrote my regex like this:
Regex formReg = new Regex("(?i)(<form(.|\\s)*?>(.|\\s)*?</form>)");
because the following one didn't work,
Regex formReg = new Regex("(?i)(<form.*?>.*?</form>)");
since . matches any character except a newline.
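The point about . and newlines can be illustrated with a small sketch. It uses
java.util.regex from the JDK rather than ORO, so that it runs without any extra
jar; that substitution is an assumption of this example and not what the report
itself uses. The class name DotAllSketch and the sample HTML are hypothetical.
ORO's Perl5Compiler has a SINGLELINE_MASK option which, as far as I know, plays
the same role as Pattern.DOTALL here, but that is not verified in this sketch.

```java
import java.util.regex.Pattern;

public class DotAllSketch {
    // Default mode: '.' stops at line terminators, so a tag that spans
    // several lines cannot be matched by ".*?".
    static final Pattern DEFAULT_DOT =
        Pattern.compile("<form.*?>.*?</form>", Pattern.CASE_INSENSITIVE);

    // DOTALL mode: '.' also matches line terminators, so the (.|\s)
    // alternation from the report is not needed.
    static final Pattern DOTALL_DOT =
        Pattern.compile("<form.*?>.*?</form>",
                        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // True if the pattern matches anywhere in the given text.
    static boolean matches(Pattern p, String html) {
        return p.matcher(html).find();
    }

    public static void main(String[] args) {
        // Hypothetical sample: the opening tag itself is split across lines.
        String html = "<FORM name=\"param\"\nmethod=\"post\">\n"
                    + "<input name=\"sterm\">\n</form>";
        System.out.println(matches(DEFAULT_DOT, html)); // false
        System.out.println(matches(DOTALL_DOT, html));  // true
    }
}
```

Letting the dot itself match newlines keeps the pattern free of a group under
the lazy quantifier, which may also reduce the matcher's recursion depth on
long inputs.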
So my first regex worked well and I was able to get the complete form data, from
<form ... to </form>.
BUT
when the form was big, say around 400 lines and 30K bytes, the match failed with
a StackOverflowError. I am pasting the program output and the stack trace
below:
Matched <form name="param" action="http://www/parametric/ProductParametric"
method="post">
<input name="sterm" type="hidden">
</form>
matcher.getMatch().endOffset(1) 4480
Matched <form name="cross" action="http://www/crossref/search.jsp"
method="post">
<input name="partNumber" type="hidden">
</form>
matcher.getMatch().endOffset(1) 127
java.lang.StackOverflowError
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
I am also pasting the method I wrote for the extraction; it can simply be called
from a main method and run:
----------------------------------------------------------------------------
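import org.apache.oro.text.regex.PatternMatcher;
import org.apache.oro.text.regex.Perl5Compiler;
import org.apache.oro.text.regex.Perl5Matcher;
import org.apache.oro.text.regex.Perl5Pattern;

public static void testRegOro() {
    try {
        // IoUtils is not an ORO class; presumably a helper that returns
        // the file contents as a String.
        String html = IoUtils.readFile("file.txt");
        // String html = "all work and no play makes jack a dull boy";
        Perl5Compiler compiler = new Perl5Compiler();
        Perl5Pattern pattern = (Perl5Pattern) compiler.compile(
            "(<form(.|\\s)*?>(.|\\s)*?</form>)",
            Perl5Compiler.CASE_INSENSITIVE_MASK | Perl5Compiler.READ_ONLY_MASK);
        PatternMatcher matcher = new Perl5Matcher();
        int i = 0;
        // Report at most three forms, searching the remaining text each time.
        while (matcher.contains(html, pattern) && i++ < 3) {
            System.out.println("Matched " + matcher.getMatch().group(1));
            System.out.println("matcher.getMatch().endOffset(1) "
                + matcher.getMatch().endOffset(1));
            html = html.substring(matcher.getMatch().endOffset(1));
            // System.out.println("html " + html);
        }
    } catch (Throwable e) {
        // Throwable rather than Exception, so StackOverflowError is caught too.
        e.printStackTrace();
    }
}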
------------------------------------------------------------------------------
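For comparison, the same extraction loop can be sketched without the (.|\s)
alternation and without re-slicing the input after every match. This is only an
illustration under stated assumptions: it uses java.util.regex from the JDK
instead of ORO so that it is self-contained, and the class name
FormExtractorSketch, the method extractForms, and the sample HTML are all
hypothetical. It does not show how ORO 2.0.7 behaves on the attached file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormExtractorSketch {
    // DOTALL lets '.' match newlines, so no group is needed under
    // the lazy quantifiers.
    private static final Pattern FORM = Pattern.compile(
        "<form.*?>.*?</form>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Collects the source text of every form found in the page, in order.
    static List<String> extractForms(String html) {
        List<String> forms = new ArrayList<>();
        Matcher m = FORM.matcher(html);
        // find() resumes after the previous match, so no substring() copies.
        while (m.find()) {
            forms.add(m.group());
        }
        return forms;
    }

    public static void main(String[] args) {
        // Hypothetical page with two small forms.
        String html = "<html>\n"
            + "<form name=\"param\" method=\"post\">\n"
            + "<input name=\"sterm\" type=\"hidden\">\n"
            + "</form>\n"
            + "<p>text</p>\n"
            + "<form name=\"cross\" method=\"post\">\n"
            + "<input name=\"partNumber\" type=\"hidden\">\n"
            + "</form>\n"
            + "</html>";
        for (String form : extractForms(html)) {
            System.out.println("Matched " + form);
        }
    }
}
```

In ORO itself, PatternMatcherInput together with
matcher.contains(PatternMatcherInput, Pattern) appears to serve a similar
purpose to the find() loop, remembering the current offset between searches;
that is mentioned here as a pointer, not as tested code.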
As my code shows, I am reading a file named file.txt; I am attaching that file
to this bug as well.
I would really appreciate it if you could look into this and shed some light on
whether it can be improved.
Thanks in advance!
Regards,
Pushpesh Kr. Rajwanshi