Posted to regexp-dev@jakarta.apache.org by bu...@apache.org on 2003/04/06 21:49:44 UTC

DO NOT REPLY [Bug 18741] New: - StackOverflowException occurs on certain match strings

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=18741>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

           Summary: StackOverflowException occurs on certain match strings
           Product: Regexp
           Version: unspecified
          Platform: PC
        OS/Version: Windows NT/2K
            Status: NEW
          Severity: Blocker
          Priority: Other
         Component: Other
        AssignedTo: regexp-dev@jakarta.apache.org
        ReportedBy: b.lichtl@cocosoftware.com
                CC: b.lichtl@cocosoftware.com


Hi!

I am trying to write an application that recognizes web pages, and I am using 
Regexp for this. In the example below I matched the start page of Google 
(www.google.com) against the regexp: (.|\n)*Google(.|\n)*.

(The regexp "(.|\n)+title(.|\n)+Google(.|\n)+title(.|\n)+" produces the same 
error on the same string, although it also works on short test strings.)

Both regexps work on short, hand-written examples, but on the web page the 
match throws a java.lang.StackOverflowError. The web page and the stack trace 
are shown below.
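
For comparison, here is a minimal sketch of the same two checks done with
plain string searches, which use no recursion and so cannot exhaust the stack
on a long page. It assumes that (.|\n) is meant as "any single character" and
that the match may start anywhere in the page. The class PageCheck and its
methods are hypothetical helpers for illustration, not part of Jakarta Regexp,
and the code uses Java 5+ syntax (varargs, for-each) rather than JDK 1.4.

```java
// Hypothetical helper class; not part of Jakarta Regexp.
final class PageCheck {

    // Same result as matching "(.|\n)*Google(.|\n)*":
    // true when the literal occurs anywhere in the page.
    static boolean containsLiteral(String page, String literal) {
        return page.indexOf(literal) >= 0;
    }

    // Same result as "(.|\n)+title(.|\n)+Google(.|\n)+title(.|\n)+" when called
    // with ("title", "Google", "title"): the literals must occur in order, with
    // at least one character before the first, between each pair, and after
    // the last.
    static boolean containsInOrderWithGaps(String page, String... literals) {
        int from = 1; // at least one character must precede the first literal
        for (String literal : literals) {
            int at = page.indexOf(literal, from);
            if (at < 0) {
                return false;
            }
            // Skip the literal plus one mandatory gap character.
            from = at + literal.length() + 1;
        }
        // The last gap character must still lie inside the page.
        return from <= page.length();
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Google</title></head>\n<body></body></html>";
        System.out.println(containsLiteral(page, "Google"));
        System.out.println(containsInOrderWithGaps(page, "title", "Google", "title"));
    }
}
```

Taking the earliest possible occurrence of each literal is enough here, because
an earlier match never leaves less room for the literals that follow.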

JVM: Sun JDK 1.4.1
OS: Windows 2000 SP2

Matched StackOverflowError DUMP: (.|\n)*Google(.|\n)* Against: 
==========================

<html><head><meta http-equiv="content-type" content="text/html; charset=UTF-
8"><title>Google</title><style><!--
body,td,a,p,.h{font-family:arial,sans-serif;}
.h{font-size: 20px;}
.q{text-decoration:none; color:#0000cc;}
//-->
</style>
<script>
<!--
function sf(){document.f.q.focus();}
function c(p,l,e){var f=document.f;if (f.action && document.getElementById) 
{var hf=document.getElementById("hf");if (hf) {var t = "<input type=hidden 
name=tab value="+l+">";hf.innerHTML=t;}f.action 
= 'http://'+p;e.cancelBubble=true;f.submit();return false;}return true;}
// -->
</script>
</head><body bgcolor=#ffffff text=#000000 link=#0000cc vlink=#551a8b 
alink=#ff0000 onLoad=sf()><center><table border=0 cellspacing=0 
cellpadding=0><tr><td><img src="/images/logo.gif" width=276 height=110 
alt="Google"></td></tr></table><br>
<table border=0 cellspacing=0 cellpadding=0><tr><td width=15>&nbsp;</td><td 
id=0 bgcolor=#3366cc align=center width=95 nowrap><font color=#ffffff size=-
1><b>Web</b></font></td><td width=15>&nbsp;</td><td id=1 bgcolor=#efefef 
align=center width=95 nowrap onClick="return c
('www.google.com/imghp','wi',event);" style=cursor:pointer;cursor:hand;><a 
id=1a class=q href="/imghp?hl=en&tab=wi&ie=UTF-8&oe=UTF-8" onClick="return c
('www.google.com/imghp','wi',event);"><font size=-1>Images</font></a></td><td 
width=15>&nbsp;</td><td id=2 bgcolor=#efefef align=center width=95 nowrap 
onClick="return c('www.google.com/grphp','wg',event);" 
style=cursor:pointer;cursor:hand;><a id=2a class=q href="/grphp?
hl=en&tab=wg&ie=UTF-8&oe=UTF-8" onClick="return c
('www.google.com/grphp','wg',event);"><font size=-1>Groups</font></a></td><td 
width=15>&nbsp;</td><td id=3 bgcolor=#efefef align=center width=95 nowrap 
onClick="return c('www.google.com/dirhp','wd',event);" 
style=cursor:pointer;cursor:hand;><a id=3a class=q href="/dirhp?
hl=en&tab=wd&ie=UTF-8&oe=UTF-8" onClick="return c
('www.google.com/dirhp','wd',event);"><font size=-
1>Directory</font></a></td><td width=15>&nbsp;</td><td id=4 bgcolor=#efefef 
align=center width=95 nowrap onClick="return c
('www.google.com/nwshp','wn',event);" style=cursor:pointer;cursor:hand;><a 
id=4a class=q href="/nwshp?hl=en&tab=wn&ie=UTF-8&oe=UTF-8" onClick="return c
('www.google.com/nwshp','wn',event);"><font size=-1>News</font></a></td><td 
width=15>&nbsp;</td></tr><tr><td colspan=12 bgcolor=#3366cc><img width=1 
height=1 alt=""></td></tr></table><br><form action="/search" name=f><table 
cellspacing=0 cellpadding=0><tr><td width=75>&nbsp;</td><td align=center><input 
type=hidden name=hl value=en><span id=hf></span><input type=hidden name=ie 
value="UTF-8"><input type=hidden name=oe value="UTF-8"><input maxLength=256 
size=55 name=q value=""><br><input type=submit value="Google Search" 
name=btnG><input type=submit value="I'm Feeling Lucky" name=btnI></td><td 
valign=top nowrap><font size=-2>&nbsp;&#8226;&nbsp;<a href=/advanced_search?
hl=en>Advanced&nbsp;Search</a><br>&nbsp;&#8226;&nbsp;<a href=/preferences?
hl=en>Preferences</a><br>&nbsp;&#8226;&nbsp;<a href=/language_tools?
hl=en>Language Tools</a></font></td></tr></table></form><br>
<p><br><font size=-1><a href="/ads/">Advertise&nbsp;with&nbsp;Us</a> - <a 
href="/services/">Business&nbsp;Solutions</a> - <a 
href="/options/">Services&nbsp;&amp;&nbsp;Tools</a> - <a 
href=/about.html>Jobs,&nbsp;Press,&nbsp;&amp;&nbsp;Help</a><span id=hp 
style="behavior:url(#default#homepage)"></span>
<script>
//<!--
if (!hp.isHomePage('http://www.google.com/')) {document.write("<p><a 
href=\"/mgyhp.html\" onClick=\"style.behavior='url
(#default#homepage)';setHomePage('http://www.google.com/');\">Make Google Your 
Homepage!</a>");}
//-->
</script></font><p><font size=-2>&copy;2003 Google - Searching 3,083,324,652 
web pages</font></p></center></body></html>

Matched DUMP END ======================================================== 

java.lang.StackOverflowError
	at org.apache.regexp.StringCharacterIterator.isEnd(StringCharacterIterator.java:96)
	at org.apache.regexp.RE.matchNodes(RE.java:1121)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:910)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:933)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:910)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:933)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:910)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:933)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:910)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:933)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:910)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:933)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:910)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:933)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:910)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:933)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:910)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:933)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
.... and so on.
