Posted to user@nutch.apache.org by Emmanuel <jo...@gmail.com> on 2007/12/17 15:02:24 UTC
JSParser
I ran into a problem when I tried to parse a few sites that contain
JavaScript.
I got the following error:
java.lang.StackOverflowError
at java.lang.Character.toUpperCase(Character.java:4278)
at java.lang.String.regionMatches(String.java:1384)
at java.lang.String.equalsIgnoreCase(String.java:1120)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:138)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:150)
Has anybody had the same problem? Any idea how I could solve it?
Thanks
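The trace above shows walk() calling itself over and over until the thread stack runs out, which is what a recursive DOM traversal does on a very deeply nested page. Besides raising the thread stack size with the JVM's -Xss option, one possible approach is to traverse with an explicit stack instead of recursion. The following is only a minimal sketch of that technique using the standard org.w3c.dom API; it is not the actual JSParseFilter code, and the class name IterativeDomWalk and the methods collectScripts and deepDocument are hypothetical names chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration, not part of Nutch.
public class IterativeDomWalk {

  /** Visits every node under root without recursion; returns the text of each script element. */
  static List<String> collectScripts(Node root) {
    List<String> scripts = new ArrayList<>();
    Deque<Node> pending = new ArrayDeque<>(); // explicit stack lives on the heap
    pending.push(root);
    while (!pending.isEmpty()) {
      Node n = pending.pop();
      if (n instanceof Element && n.getNodeName().equalsIgnoreCase("script")) {
        scripts.add(n.getTextContent());
        continue; // do not descend into the script element itself
      }
      // Push children last-to-first so they are popped in document order.
      for (Node c = n.getLastChild(); c != null; c = c.getPreviousSibling()) {
        pending.push(c);
      }
    }
    return scripts;
  }

  /** Builds a test document: html, then depth nested divs, then one script at the bottom. */
  static Document deepDocument(int depth) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
    Element current = doc.createElement("html");
    doc.appendChild(current);
    for (int i = 0; i < depth; i++) {
      Element div = doc.createElement("div");
      current.appendChild(div);
      current = div;
    }
    Element script = doc.createElement("script");
    script.setTextContent("window.location = 'next.html';");
    current.appendChild(script);
    return doc;
  }

  public static void main(String[] args) {
    List<String> scripts = collectScripts(deepDocument(3000));
    System.out.println(scripts.size() + " script(s) found");
  }
}
```

Because the pending nodes are kept in a heap-allocated deque, the nesting depth of the page no longer affects how much thread stack the traversal uses.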
Re: Retrieving a Hit Object from a HitDetails Instance
Posted by Trey Spiva <tr...@spiva.com>.
Yes, but I am not iterating over the Hits from the search. I am passing
the Hits data (actually an array of HitDetails that is created from the
Hits) to the OnlineClusterer to create an array of HitsCluster objects.
So what I am iterating over is the collection of HitsCluster objects,
and a HitsCluster only returns HitDetails.
On Jan 22, 2008, at 9:18 AM, Dennis Kubes wrote:
> A Hits object is returned from the search. The Hits object has an
> iterator method that returns a HitIterator that in turn returns Hit
> objects.
>
> Dennis Kubes
>
> Trey Spiva wrote:
>> I am using the clustering plugin in Nutch. The Clustering plugin
>> returns the set of HitDetails that are grouped by clusters. When
>> using the anchors.jsp and cached.jsp you need to pass the indexNo
>> and indexDocNo properties to the pages. However, the properties
>> only seem to be present on the Hit classes, and I can not find a
>> way of retrieving a Hit instance from a HitDetails instance.
>> Does anyone have any ideas?
>> What I have done is modify the getDetails method in IndexSearcher
>> so that the returned HitDetails has the index number and document
>> index number as field values. Does this sound like a good patch
>> idea?
>> Thanks
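One way to get from a HitDetails back to its Hit without patching IndexSearcher is to remember the pairing at the moment the HitDetails array is created from the Hits. The sketch below only illustrates that idea. Its Hit and HitDetails classes are simplified stand-ins, not the real Nutch classes, and the names HitLookupSketch and indexByDetails are hypothetical. It also assumes that the details array is parallel to the hits array and that the clusters hand back the same HitDetails instances that were passed in; neither assumption is confirmed in this thread.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical illustration; the nested classes are stand-ins, not Nutch's own.
public class HitLookupSketch {

  /** Stand-in for a search hit that carries the two values the JSP pages need. */
  static final class Hit {
    final int indexNo;
    final int indexDocNo;
    Hit(int indexNo, int indexDocNo) {
      this.indexNo = indexNo;
      this.indexDocNo = indexDocNo;
    }
  }

  /** Stand-in for the per-hit details object that the clusters return. */
  static final class HitDetails {
    final String url;
    HitDetails(String url) {
      this.url = url;
    }
  }

  /**
   * Pairs details[i] with hits[i]. Identity comparison is used, so a lookup
   * only succeeds for the very same HitDetails instances (an assumption).
   */
  static Map<HitDetails, Hit> indexByDetails(Hit[] hits, HitDetails[] details) {
    if (hits.length != details.length) {
      throw new IllegalArgumentException("hits and details must be parallel arrays");
    }
    Map<HitDetails, Hit> byDetails = new IdentityHashMap<>();
    for (int i = 0; i < hits.length; i++) {
      byDetails.put(details[i], hits[i]);
    }
    return byDetails;
  }

  public static void main(String[] args) {
    Hit[] hits = { new Hit(0, 17), new Hit(1, 4) };
    HitDetails[] details = {
      new HitDetails("http://example.org/a"),
      new HitDetails("http://example.org/b")
    };
    Map<HitDetails, Hit> lookup = indexByDetails(hits, details);

    // Later, while iterating over the HitDetails of a cluster:
    Hit hit = lookup.get(details[1]);
    System.out.println("indexNo=" + hit.indexNo + " indexDocNo=" + hit.indexDocNo);
  }
}
```

If the clusterer copies the HitDetails rather than returning the original instances, the identity lookup would miss, and storing the two numbers as field values, as proposed in the quoted message, may be the simpler route.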
Re: Retrieving a Hit Object from a HitDetails Instance
Posted by Dennis Kubes <ku...@apache.org>.
A Hits object is returned from the search. The Hits object has an
iterator method that returns a HitIterator that in turn returns Hit objects.
Dennis Kubes
Trey Spiva wrote:
> I am using the clustering plugin in Nutch. The Clustering plugin
> returns the set of HitDetails that are grouped by clusters. When using
> the anchors.jsp and cached.jsp you need to pass the indexNo and
> indexDocNo properties to the pages. However, the properties only seem
> to be present on the Hit classes, and I can not find a way of retrieving
> a Hit instance from a HitDetails instance.
>
> Does anyone have any ideas?
>
> What I have done is modify the getDetails method in IndexSearcher so
> that the returned HitDetails has the index number and document index
> number as field values. Does this sound like a good patch idea?
>
> Thanks
Retrieving a Hit Object from a HitDetails Instance
Posted by Trey Spiva <tr...@spiva.com>.
I am using the clustering plugin in Nutch. The Clustering plugin
returns the set of HitDetails that are grouped by clusters. When
using the anchors.jsp and cached.jsp you need to pass the indexNo and
indexDocNo properties to the pages. However, the properties only
seem to be present on the Hit classes, and I can not find a way of
retrieving a Hit instance from a HitDetails instance.
Does anyone have any ideas?
What I have done is modify the getDetails method in IndexSearcher so
that the returned HitDetails has the index number and document index
number as field values. Does this sound like a good patch idea?
Thanks