Posted to user@nutch.apache.org by Emmanuel <jo...@gmail.com> on 2007/12/17 15:02:24 UTC

JSParser

I ran into an issue when I tried to parse a few sites that contain
JavaScript.
I got the following error:

java.lang.StackOverflowError
        at java.lang.Character.toUpperCase(Character.java:4278)
        at java.lang.String.regionMatches(String.java:1384)
        at java.lang.String.equalsIgnoreCase(String.java:1120)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:138)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
        at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)

Has anybody had the same problem? Any idea how I could solve it?

Thanks
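
The trace shows walk() calling itself repeatedly at line 150, so one
plausible reading (an assumption, not something confirmed in this
thread) is that a recursive DOM traversal runs out of call stack on a
very deep tree. A minimal sketch of the usual alternative, a traversal
driven by an explicit stack, is below. IterativeDomWalk, countElements
and deepDocument are hypothetical names for illustration; this is not
the JSParseFilter code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of Nutch: walks a DOM tree with an explicit
// stack, so tree depth uses heap memory instead of JVM call-stack frames.
class IterativeDomWalk {

  /** Counts elements whose name matches tagName (case-insensitive). */
  static int countElements(Node root, String tagName) {
    int count = 0;
    Deque<Node> pending = new ArrayDeque<>();
    pending.push(root);
    while (!pending.isEmpty()) {
      Node node = pending.pop();
      if (node.getNodeType() == Node.ELEMENT_NODE
          && tagName.equalsIgnoreCase(node.getNodeName())) {
        count++;
      }
      // Queue the children instead of recursing into them.
      for (Node child = node.getFirstChild(); child != null;
           child = child.getNextSibling()) {
        pending.push(child);
      }
    }
    return count;
  }

  /** Builds a chain of nested div elements with one SCRIPT at the bottom. */
  static Document deepDocument(int depth) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
    Element current = doc.createElement("div");
    doc.appendChild(current);
    for (int i = 1; i < depth; i++) {
      Element next = doc.createElement("div");
      current.appendChild(next);
      current = next;
    }
    current.appendChild(doc.createElement("SCRIPT"));
    return doc;
  }

  public static void main(String[] args) {
    Document doc = deepDocument(5000);
    System.out.println("script elements: " + countElements(doc, "script"));
  }
}
```

Raising the thread stack size with the JVM's -Xss option is another
common workaround, though it only moves the limit rather than removing
it.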

Re: Retrieving a Hit Object from a HitDetails Instance

Posted by Trey Spiva <tr...@spiva.com>.
Yes, but I am not iterating over the Hits from the search.  I am
passing the Hits data (actually an array of HitDetails that is created
from the Hits) to the OnlineClusterer to create an array of
HitsCluster objects.  So what I am iterating over is the collection of
HitsCluster objects.  A HitsCluster only returns HitDetails.
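
Since the HitDetails array is created from the Hits, one way to get
back to a Hit without patching anything is to record the pairing at
that point. The sketch below uses hypothetical stand-in classes (Hit,
HitDetails and HitLookupSketch here are NOT the Nutch classes) and
rests on two assumptions: the details array is in the same order as
the hits, and the clusters hand back the very same HitDetails
instances that were passed in.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Sketch only: Hit and HitDetails below are simplified stand-ins,
// not the real Nutch classes.
class HitLookupSketch {

  static final class Hit {
    final int indexNo;
    final int indexDocNo;
    Hit(int indexNo, int indexDocNo) {
      this.indexNo = indexNo;
      this.indexDocNo = indexDocNo;
    }
  }

  static final class HitDetails {
    final String url;
    HitDetails(String url) {
      this.url = url;
    }
  }

  /**
   * Pairs details[i] with hits[i]. The map is keyed by object identity, so
   * a lookup only succeeds for the exact instances that were paired here.
   */
  static Map<HitDetails, Hit> pair(Hit[] hits, HitDetails[] details) {
    if (hits.length != details.length) {
      throw new IllegalArgumentException("arrays must have the same length");
    }
    Map<HitDetails, Hit> byDetails = new IdentityHashMap<>();
    for (int i = 0; i < hits.length; i++) {
      byDetails.put(details[i], hits[i]);
    }
    return byDetails;
  }

  public static void main(String[] args) {
    Hit[] hits = { new Hit(0, 7), new Hit(1, 42) };
    HitDetails[] details = {
      new HitDetails("http://example.com/a"),
      new HitDetails("http://example.com/b")
    };
    Map<HitDetails, Hit> byDetails = pair(hits, details);
    Hit hit = byDetails.get(details[1]);
    System.out.println(hit.indexNo + " " + hit.indexDocNo);
  }
}
```

If the clusterer copies the HitDetails rather than returning the same
instances, an identity map will not find them and a key such as the
URL would be needed instead.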

On Jan 22, 2008, at 9:18 AM, Dennis Kubes wrote:

> A Hits object is returned from the search.  The Hits object has an
> iterator method that returns a HitIterator that in turn returns Hit
> objects.
>
> Dennis Kubes
>
> Trey Spiva wrote:
>> I am using the clustering plugin in Nutch.  The clustering plugin
>> returns the set of HitDetails grouped by cluster.  When using
>> anchors.jsp and cached.jsp you need to pass the indexNo and
>> indexDocNo properties to the pages.  However, these properties
>> only seem to be present on the Hit class, and I cannot find a
>> way of retrieving a Hit instance from a HitDetails instance.
>> Does anyone have any ideas?
>> What I have done is modify the getDetails method in IndexSearcher
>> so that the returned HitDetails has the index number and document
>> index number as field values.  Does this sound like a good patch
>> idea?
>> Thanks


Re: Retrieving a Hit Object from a HitDetails Instance

Posted by Dennis Kubes <ku...@apache.org>.
A Hits object is returned from the search.  The Hits object has an
iterator method that returns a HitIterator that in turn returns Hit objects.

Dennis Kubes

Trey Spiva wrote:
> I am using the clustering plugin in Nutch.  The clustering plugin
> returns the set of HitDetails grouped by cluster.  When using
> anchors.jsp and cached.jsp you need to pass the indexNo and
> indexDocNo properties to the pages.  However, these properties only
> seem to be present on the Hit class, and I cannot find a way of
> retrieving a Hit instance from a HitDetails instance.
>
> Does anyone have any ideas?
>
> What I have done is modify the getDetails method in IndexSearcher so
> that the returned HitDetails has the index number and document index
> number as field values.  Does this sound like a good patch idea?
> 
> Thanks

Retrieving a Hit Object from a HitDetails Instance

Posted by Trey Spiva <tr...@spiva.com>.
I am using the clustering plugin in Nutch.  The clustering plugin
returns the set of HitDetails grouped by cluster.  When using
anchors.jsp and cached.jsp you need to pass the indexNo and
indexDocNo properties to the pages.  However, these properties only
seem to be present on the Hit class, and I cannot find a way of
retrieving a Hit instance from a HitDetails instance.

Does anyone have any ideas?

What I have done is modify the getDetails method in IndexSearcher so
that the returned HitDetails has the index number and document index
number as field values.  Does this sound like a good patch idea?

Thanks
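
For illustration, the patch idea above might look roughly like the
sketch below: the index number and document number are copied into the
details as extra field values, so a page that only has the details can
still read them back. Details, withIndexFields and the field names
"indexNo" and "indexDocNo" are hypothetical stand-ins chosen for this
example; this is not the actual IndexSearcher or HitDetails code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: Details is a simplified stand-in, not the Nutch HitDetails.
class DetailsWithIndexSketch {

  static final class Details {
    private final Map<String, String> fields = new LinkedHashMap<>();

    void put(String field, String value) {
      fields.put(field, value);
    }

    String getValue(String field) {
      return fields.get(field);
    }
  }

  /**
   * Returns a copy of the stored details with the two index positions added
   * as ordinary string fields (field names are assumptions for this sketch).
   */
  static Details withIndexFields(Details stored, int indexNo, int indexDocNo) {
    Details copy = new Details();
    copy.fields.putAll(stored.fields);
    copy.put("indexNo", Integer.toString(indexNo));
    copy.put("indexDocNo", Integer.toString(indexDocNo));
    return copy;
  }

  public static void main(String[] args) {
    Details stored = new Details();
    stored.put("url", "http://example.com/");
    Details patched = withIndexFields(stored, 0, 123);
    // A page holding only the details can now recover both numbers.
    int indexNo = Integer.parseInt(patched.getValue("indexNo"));
    int indexDocNo = Integer.parseInt(patched.getValue("indexDocNo"));
    System.out.println(indexNo + " " + indexDocNo);
  }
}
```

The trade-off in this sketch is that the positions travel as strings
next to the stored document fields, so a stored field that happened to
use the same name would be overwritten.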