Posted to regexp-user@jakarta.apache.org by DECAFFMEYER MATHIEU <MA...@fortis.lu> on 2007/01/25 17:25:23 UTC

Error : at org.apache.regexp.RE.matchNodes(Unknown Source)

Hi,
My regular expression is the following:

headlineRegex  ->  (<h1>)?(.*)</h1>
group  ->  2

I am using the regular expression above to extract a headline (h1) from
an HTML document:

    while (mHeadlineRE.match(content, offset)) {
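The question shows only the loop header. As a sketch of what a complete headline loop could look like, here is a version written against the JDK's `java.util.regex` instead of Jakarta Regexp (a swapped-in library, so it runs without extra jars), with `Matcher.find(int)` playing the role of `RE.match(String, int)`. The class and method names are hypothetical, and the pattern choices (a required `<h1>`, a reluctant `(.*?)`, case-insensitive and DOTALL flags) are assumptions, not what the original code uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; not the regain crawler's actual HtmlContentExtractor.
class HeadlineExtractor {
    // Assumptions: <h1> is required, the body is reluctant so each headline
    // stops at its own </h1>, tags may be upper-case, headlines may span lines.
    private static final Pattern HEADLINE =
            Pattern.compile("<h1>(.*?)</h1>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<String> extractHeadlines(String content) {
        List<String> headlines = new ArrayList<>();
        Matcher m = HEADLINE.matcher(content);
        int offset = 0;
        // Same shape as the loop in the question: search from offset, then
        // move offset past the end of the match before searching again.
        // m.end() never exceeds content.length(), so find(offset) stays in range.
        while (m.find(offset)) {
            headlines.add(m.group(1).trim());
            offset = m.end();
        }
        return headlines;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>First</h1><p>text</p><H1>Second\nline</H1></body></html>";
        System.out.println(extractHeadlines(html));
    }
}
```

Advancing `offset` to `m.end()` guarantees progress because `<h1>` is never an empty match.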

For some HTML pages this regular expression works,
but for others it gives the following error:

[...]
	at org.apache.regexp.RE.matchNodes(Unknown Source)
	at org.apache.regexp.RE.matchNodes(Unknown Source)
	at org.apache.regexp.RE.matchNodes(Unknown Source)
	at org.apache.regexp.RE.matchNodes(Unknown Source)
	at org.apache.regexp.RE.matchNodes(Unknown Source)
	at org.apache.regexp.RE.matchNodes(Unknown Source)
	at org.apache.regexp.RE.matchNodes(Unknown Source)
	at org.apache.regexp.RE.matchNodes(Unknown Source)
	at org.apache.regexp.RE.matchNodes(Unknown Source)
	at org.apache.regexp.RE.matchNodes(Unknown Source)
	at org.apache.regexp.RE.matchAt(Unknown Source)
	at org.apache.regexp.RE.match(Unknown Source)
	at org.apache.regexp.RE.match(Unknown Source)
	at net.sf.regain.crawler.preparator.html.HtmlContentExtractor.extractHeadlines(HtmlContentExtractor.java:140)

on this line:
while (mHeadlineRE.match(content, offset)) {


When I use this regex instead:

headlineRegex  ->  <h1>(.*)</h1>
group  ->  1

I never get this error.
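One plausible reason the two patterns behave so differently: because `(<h1>)?` can match the empty string, a match may begin at any character, so `(.*)` runs from the start of the text rather than from an `<h1>` tag, and the captured group also picks up whatever precedes the headline. The sketch below shows the capture difference with `java.util.regex`; that Jakarta Regexp behaves identically here is an assumption, though both are backtracking matchers that try start positions from left to right. `OptionalPrefixDemo` and `firstCapture` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalPrefixDemo {
    // Returns the given capture group of the first match, or null if nothing matches.
    static String firstCapture(String regex, int group, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String line = "intro <h1>Title</h1>";
        // (<h1>)? matches the empty string at index 0, so the match starts there
        // and group 2 swallows the text before the tag as well.
        System.out.println(firstCapture("(<h1>)?(.*)</h1>", 2, line)); // intro <h1>Title
        // With <h1> required, a match can only start at an actual <h1>.
        System.out.println(firstCapture("<h1>(.*)</h1>", 1, line));    // Title
    }
}
```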

The problem is that I don't even know how to debug this error, which is
why I am asking for help here.

Any help is much appreciated. Thank you!


__________________________________

   Matthew




============================================
Internet communications are not secure and therefore Fortis Banque Luxembourg S.A. does not accept legal responsibility for the contents of this message. The information contained in this e-mail is confidential and may be legally privileged. It is intended solely for the addressee. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. Nothing in the message is capable or intended to create any legally binding obligations on either party and it is not intended to provide legal advice.
============================================


RE: Error : at org.apache.regexp.RE.matchNodes(Unknown Source)

Posted by DECAFFMEYER MATHIEU <MA...@fortis.lu>.
Thank you.

Anyway, this exception no longer occurs now that I use greedy matching.
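The thread does not show which pattern Matt ended up with. For reference, here is a sketch of what "greedy" means for the headline pattern, next to its reluctant (`*?`) counterpart, using `java.util.regex` (`GreedyVsReluctant` and `firstGroup` are hypothetical names): a greedy `.*` extends to the last `</h1>` it can reach, while a reluctant `.*?` stops at the first.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<h1>A</h1><h1>B</h1>";
        // Greedy: .* grabs as much as possible, then backs off to the LAST </h1>.
        System.out.println(firstGroup("<h1>(.*)</h1>", html));  // A</h1><h1>B
        // Reluctant: .*? grabs as little as possible, stopping at the FIRST </h1>.
        System.out.println(firstGroup("<h1>(.*?)</h1>", html)); // A
    }
}
```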

__________________________________
   Matt

    

-----Original Message-----
From: news [mailto:news@sea.gmane.org] On Behalf Of Vadim Gritsenko
Sent: Thursday, March 08, 2007 4:29 AM
To: regexp-user@jakarta.apache.org
Subject: Re: Error : at org.apache.regexp.RE.matchNodes(Unknown Source)


DECAFFMEYER MATHIEU wrote:
> For some Html pages this regular expression works,
> but for some Html pages, it gives the following errors :
> 
> [...]
>         at org.apache.regexp.RE.matchNodes(Unknown Source)
>         at org.apache.regexp.RE.matchNodes(Unknown Source)
>         at org.apache.regexp.RE.matchNodes(Unknown Source)

Matthew,

You are hitting the stack size limit. Read more here:
   http://issues.apache.org/bugzilla/show_bug.cgi?id=764


Vadim


---------------------------------------------------------------------
To unsubscribe, e-mail: regexp-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: regexp-user-help@jakarta.apache.org





Re: Error : at org.apache.regexp.RE.matchNodes(Unknown Source)

Posted by Vadim Gritsenko <va...@reverycodes.com>.
DECAFFMEYER MATHIEU wrote:
> For some Html pages this regular expression works,
> but for some Html pages, it gives the following errors :
> 
> [...]
>         at org.apache.regexp.RE.matchNodes(Unknown Source)
>         at org.apache.regexp.RE.matchNodes(Unknown Source)
>         at org.apache.regexp.RE.matchNodes(Unknown Source)

Matthew,

You are hitting the stack size limit. Read more here:
   http://issues.apache.org/bugzilla/show_bug.cgi?id=764
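Vadim's diagnosis points at recursion depth: the stack trace is one long chain of `RE.matchNodes` frames. If changing the pattern is not an option, one possible workaround (an assumption, not something suggested in the thread) is to run the match on a dedicated thread created with the `Thread(ThreadGroup, Runnable, String, long stackSize)` constructor, which requests a larger stack; the JDK documentation treats that size as a hint that some platforms ignore. Raising the default for every thread with the JVM's `-Xss` option is the blunter alternative. The sketch uses `java.util.regex` only so it runs with the JDK alone; the same wrapper would go around a Jakarta Regexp `match(...)` call. `LargeStackMatcher` and `findWithStack` are hypothetical names, and 64 MB is an arbitrary example size.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.regex.Pattern;

class LargeStackMatcher {
    // Hypothetical workaround: run find() on a thread that requests a larger
    // stack. The JVM may treat stackBytes as a suggestion or ignore it.
    static boolean findWithStack(Pattern pattern, String input, long stackBytes) {
        AtomicBoolean found = new AtomicBoolean();
        AtomicReference<StackOverflowError> overflow = new AtomicReference<>();
        Thread worker = new Thread(null, () -> {
            try {
                found.set(pattern.matcher(input).find());
            } catch (StackOverflowError e) { // an Error, so catch (Exception e) would miss it
                overflow.set(e);
            }
        }, "regex-worker", stackBytes);
        worker.start();
        try {
            worker.join(); // join() makes the worker's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while matching", e);
        }
        if (overflow.get() != null) {
            throw new IllegalStateException(
                    "match overflowed a " + stackBytes + "-byte stack", overflow.get());
        }
        return found.get();
    }

    public static void main(String[] args) {
        Pattern headline = Pattern.compile("<h1>(.*?)</h1>", Pattern.DOTALL);
        System.out.println(findWithStack(headline, "<p>x</p><h1>Title</h1>", 64L * 1024 * 1024));
    }
}
```

A larger stack only moves the limit; a long enough input can still overflow it, which is why the overflow is reported rather than silently treated as "no match".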


Vadim

