Posted to dev@nutch.apache.org by "Stefan Grroschupf (JIRA)" <ji...@apache.org> on 2005/02/20 23:17:53 UTC

[jira] Created: (NUTCH-4) Serious bug: OutOfMemoryError: Java heap space

Serious bug: OutOfMemoryError: Java heap space
----------------------------------------------

         Key: NUTCH-4
         URL: http://issues.apache.org/jira/browse/NUTCH-4
     Project: Nutch
        Type: Bug
    Reporter: Stefan Grroschupf


posted by: msashnikov
http://sourceforge.net/tracker/index.php?func=detail&aid=1110947&group_id=59548&atid=491356


Serious bug: OutOfMemoryError: Java heap space

Nutch 0.6 throws the following exception when the 
search phrase contains a single, unbalanced double quote, 
for example "java or ja"va.

Here is the exception:

javax.servlet.ServletException: Java heap space
org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:845)
org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:778)
org.apache.jsp.search_jsp._jspService(org.apache.jsp.search_jsp:685)
org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:99)
javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:325)
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:295)
org.apache.jasper.servlet.JspServlet.service(JspServlet.java:245)
javax.servlet.http.HttpServlet.service(HttpServlet.java:802)


root cause 

java.lang.OutOfMemoryError: Java heap space


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira


[jira] Closed: (NUTCH-4) Serious bug: OutOfMemoryError: Java heap space

Posted by "Sami Siren (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-4?page=all ]
     
Sami Siren closed NUTCH-4:
--------------------------





[jira] Commented: (NUTCH-4) Serious bug: OutOfMemoryError: Java heap space

Posted by "Stefan Grroschupf (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-4?page=comments#action_59465 ]
     
Stefan Grroschupf commented on NUTCH-4:
---------------------------------------

Date: 2005-01-27 12:09
Sender: msashnikov

I found that it happens because of a bug in the following loop 
in the NutchAnalysis class, which never terminates. 

label_4:
      while (true) {
        switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
        case 0:
        case PLUS:
        case MINUS:
        case COLON:
        case SLASH:
        case DOT:
        case ATSIGN:
        case APOSTROPHE:
        case WHITE:
          ;
          break;
        default:
          jj_la1[6] = jj_gen;
          break label_4;
        }
        nonTerm();
      }

It seems that this class was generated using JavaCC. Could 
anybody who knows how to use this tool fix the problem?
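
To make the failure mode concrete, here is a minimal, self-contained sketch. It is not the generated NutchAnalysis code and not the fix that was later committed. It relies on two assumptions: that token kind 0 is EOF (JavaCC reserves kind 0 for it), and that the token source keeps answering EOF once the input is exhausted. The class ToyLoop, the kinds WHITE and QUOTE, and the iteration cap are hypothetical names introduced only for this demo.

```java
public class ToyLoop {
    public static final int EOF = 0;    // JavaCC reserves token kind 0 for end of input
    public static final int WHITE = 1;  // hypothetical kind for this demo
    public static final int QUOTE = 2;  // hypothetical kind for this demo

    /** Toy token source: once the input is used up it answers EOF forever. */
    public static final class Tokens {
        private final int[] kinds;
        private int pos = 0;

        public Tokens(int... kinds) { this.kinds = kinds; }

        public int peek() { return pos < kinds.length ? kinds[pos] : EOF; }

        public void consume() { if (pos < kinds.length) pos++; }
    }

    /**
     * Same shape as the label_4 loop: EOF is among the kinds that keep the
     * loop going. Returns the number of iterations; the cap exists only so
     * that the demo terminates.
     */
    public static int buggyLoop(Tokens in, int cap) {
        int iterations = 0;
        while (iterations < cap) {
            int kind = in.peek();
            if (kind != EOF && kind != WHITE) break; // only another kind exits
            in.consume(); // stands in for nonTerm(); at EOF nothing is left to consume
            iterations++;
        }
        return iterations;
    }

    /** Terminating variant: EOF is no longer in the set that continues the loop. */
    public static int boundedLoop(Tokens in, int cap) {
        int iterations = 0;
        while (iterations < cap) {
            int kind = in.peek();
            if (kind != WHITE) break; // EOF now ends the loop
            in.consume();
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println(buggyLoop(new Tokens(WHITE, WHITE), 1000));   // runs until the cap
        System.out.println(boundedLoop(new Tokens(WHITE, WHITE), 1000)); // stops when input ends
    }
}
```

Without the cap, buggyLoop would spin at end of input, which would be consistent with the 100% CPU symptom. Where the memory goes in the real parser is not shown by this sketch.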




[jira] Resolved: (NUTCH-4) Serious bug: OutOfMemoryError: Java heap space

Posted by "Sami Siren (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-4?page=history ]
     
Sami Siren resolved NUTCH-4:
----------------------------

    Resolution: Fixed

Just committed this. Thanks, Piotr.




Re: [jira] Commented: (NUTCH-4) Serious bug: OutOfMemoryError: Java heap space

Posted by Sami Siren <s....@sonera.inet.fi>.
OK, sorry about that; I must have missed your patch. Please attach it 
to JIRA.

--
  Sami Siren


Piotr Kosiorowski (JIRA) wrote:
>      [ http://issues.apache.org/jira/browse/NUTCH-4?page=comments#action_62272 ]
>      
> Piotr Kosiorowski commented on NUTCH-4:
> ---------------------------------------
> 
> Some time ago I sent a patch for this, with a JUnit test, to the list, but it used the old package naming. I had planned to resubmit it last week, but I postponed it due to the amount of other work. Anyway, there is one difference between your solution and mine:
> I have added one additional method:
> void nonTermOrEOF() :
> {}
> {
>   nonTerm() | <EOF>
> }
> and changed:
> /** Parse anything but a term or an operator (plus or minus or quote). */
> void nonOpOrTerm() :
> {}
> {
>   (LOOKAHEAD(2) (<WHITE> | nonOpInfix() | ((<PLUS>|<MINUS>) nonTerm())))*
> }
> 
> into
> 
> /** Parse anything but a term or an operator (plus or minus or quote). */
> void nonOpOrTerm() :
> {}
> {
>   (LOOKAHEAD(2) (<WHITE> | nonOpInfix() | ((<PLUS>|<MINUS>) nonTermOrEOF())))*
> }
> 
> 
> My solution parses the query "test +" as "test";
> in your version such a query throws a parse exception.
> 
> I can send the patch again if you wish, so we can merge our versions. I will have a look at the JUnit tests too, to see if they can/should be merged.
> 
> 

[jira] Commented: (NUTCH-4) Serious bug: OutOfMemoryError: Java heap space

Posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-4?page=comments#action_62272 ]
     
Piotr Kosiorowski commented on NUTCH-4:
---------------------------------------

Some time ago I sent a patch for this, with a JUnit test, to the list, but it used the old package naming. I had planned to resubmit it last week, but I postponed it due to the amount of other work. Anyway, there is one difference between your solution and mine:
I have added one additional method:
void nonTermOrEOF() :
{}
{
  nonTerm() | <EOF>
}
and changed:
/** Parse anything but a term or an operator (plus or minus or quote). */
void nonOpOrTerm() :
{}
{
  (LOOKAHEAD(2) (<WHITE> | nonOpInfix() | ((<PLUS>|<MINUS>) nonTerm())))*
}

into

/** Parse anything but a term or an operator (plus or minus or quote). */
void nonOpOrTerm() :
{}
{
  (LOOKAHEAD(2) (<WHITE> | nonOpInfix() | ((<PLUS>|<MINUS>) nonTermOrEOF())))*
}


My solution parses the query "test +" as "test";
in your version such a query throws a parse exception.

I can send the patch again if you wish, so we can merge our versions. I will have a look at the JUnit tests too, to see if they can/should be merged.





[jira] Assigned: (NUTCH-4) Serious bug: OutOfMemoryError: Java heap space

Posted by "Sami Siren (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-4?page=history ]

Sami Siren reassigned NUTCH-4:
------------------------------

    Assign To: Sami Siren




[jira] Updated: (NUTCH-4) Serious bug: OutOfMemoryError: Java heap space

Posted by "Sami Siren (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-4?page=history ]

Sami Siren updated NUTCH-4:
---------------------------

    Attachment: query_parser_unbalanced_fix.tar.gz

Changed as described by Piotr Kosiorowski. Please follow up with the additional unit tests/comments.




[jira] Commented: (NUTCH-4) Serious bug: OutOfMemoryError: Java heap space

Posted by "Piotr Kosiorowski (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-4?page=comments#action_62296 ]
     
Piotr Kosiorowski commented on NUTCH-4:
---------------------------------------

Thanks for integrating my suggestion. After reviewing my tests, I think two really minor things might be added to the JUnit test:
 assertQueryEquals("\" abc def \" \" def ghi ","\"abc def\" \"def ghi\""); //missing fourth double quote
 assertQueryEquals("\"",""); //empty query

The rest works perfectly for me. 





[jira] Commented: (NUTCH-4) Serious bug: OutOfMemoryError: Java heap space

Posted by "Stefan Grroschupf (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-4?page=comments#action_59466 ]
     
Stefan Grroschupf commented on NUTCH-4:
---------------------------------------

Date: 2005-01-27 11:07
Sender: msashnikov


Just wanted to add that when the search phrase includes 
just one quote symbol then Nutch takes 100% of CPU for a 
few seconds, eats all available RAM, and then throws this 
OutOfMemoryError exception.




Re: Highlighting query words in cached html

Posted by Jack Tang <hi...@gmail.com>.
Hi Ferenc

This is not the answer to your question, but I have made the highlighting
of query words in summaries configurable. Here is what I have done:


---------- Configuration file(nutch-site.xml)------------------
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="nutch-conf.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<nutch-conf>

	<!-- terms highlight style -->
	<property>
	  <name>summary.fragment.highlight.exp</name>
	  <value>token</value>
	  <description>Default is '%token%'. The term name(expression) that 
	   will be used in highlight style.</description>
	</property>
	<property>
	  <name>summary.fragment.highlight.style</name>
	  <value>&lt;font color="red"&gt;token&lt;/font&gt;</value>
	  <description>Default is '&lt;b&gt;%token%&lt;/b&gt;'. The style of 
	  highlight in summary. </description>
	</property>

</nutch-conf>
 
-------- Change code in Summary.java-------------
toString() method in inner class Highlight:
- return "<b>" + super.toString() + "</b>";

+ NutchConf conf = NutchConf.get();
+ String termName = conf.get("summary.fragment.highlight.exp","%token%");
+ String style    = conf.get("summary.fragment.highlight.style","<b>%token%</b>");
+  	
+ return style.replaceAll(termName,super.toString());
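
One caveat with the replaceAll call above: String.replaceAll treats its first argument as a regular expression and gives '$' and '\' a special meaning in the replacement string, so a highlighted term containing either character can be mangled or cause an exception. A minimal sketch of a literal substitution follows, assuming the same '%token%' placeholder convention. HighlightStyle, apply and applyQuoted are hypothetical names for illustration, not part of Nutch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HighlightStyle {
    /** Substitutes term for every literal occurrence of placeholder in style. */
    public static String apply(String style, String placeholder, String term) {
        // String.replace works on literal character sequences, not regular expressions.
        return style.replace(placeholder, term);
    }

    /** Same result via replaceAll, with both arguments quoted explicitly. */
    public static String applyQuoted(String style, String placeholder, String term) {
        return style.replaceAll(Pattern.quote(placeholder),
                                Matcher.quoteReplacement(term));
    }

    public static void main(String[] args) {
        String style = "<b>%token%</b>";
        // "US$1" contains '$', which an unquoted replaceAll replacement would
        // read as a reference to capture group 1.
        System.out.println(apply(style, "%token%", "US$1"));
        System.out.println(applyQuoted(style, "%token%", "US$1"));
    }
}
```

Either variant could stand in for the replaceAll line in the diff. Neither escapes HTML in the term, which is a separate concern.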

In future, you can write some JavaScript to turn on/off the highlight:)

Regards 
  
/Jack
 



On Apr 7, 2005 5:14 PM, yoursoft@freemail.hu <yo...@freemail.hu> wrote:
> Dear Guys,
> 
> I would like to highlight the searched words in cached HTML content (as
> Google does).
> I have a problem with it:
> If the query is e.g. window, and the HTML contains JavaScript with e.g.
> window.open, and I replace every occurrence of "window" in the content,
> this will break the cached content. I need to change only the 'text'
> content of the cached HTML.
> Does anyone have an idea how to do this?
> 
> Best Regards,
>    Ferenc
>

Highlighting query words in cached html

Posted by "yoursoft@freemail.hu" <yo...@freemail.hu>.
Dear Guys,

I would like to highlight the searched words in cached HTML content (as 
Google does).
I have a problem with it:
If the query is e.g. window, and the HTML contains JavaScript with e.g. 
window.open, and I replace every occurrence of "window" in the content, 
this will break the cached content. I need to change only the 'text' 
content of the cached HTML.
Does anyone have an idea how to do this?

Best Regards,
    Ferenc

[jira] Updated: (NUTCH-4) Serious bug: OutOfMemoryError: Java heap space

Posted by "Sami Siren (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-4?page=history ]

Sami Siren updated NUTCH-4:
---------------------------

    Attachment: query_parser_unbalanced_fix.tar.gz

The attached file contains a JUnit test for the query parser and a proposed fix for the unbalanced quote bug.

To apply:
- copy the test case into the right directory
- apply the patch

To generate the new parser:
ant generate-src

...and compile normally.


