Posted to user@nutch.apache.org by MilleBii <mi...@gmail.com> on 2011/03/07 10:27:57 UTC

Urgent:FetchedSegments.getSummary generates NullPointerException

I now randomly get this error in production, where it had been working
fine for more than a year:

java.lang.NullPointerException
    at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:248)
    at org.apache.nutch.searcher.FetchedSegments$SummaryTask.call(FetchedSegments.java:63)
    at org.apache.nutch.searcher.FetchedSegments$SummaryTask.call(FetchedSegments.java:53)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)

+ For some queries, the first pages of hits are fine, then it suddenly stops
and I get a blank page; for others I get the blank page right on the query.
+ I checked the query with Luke. It looked fine.
+ The preceding bean call in search.jsp, bean.search(query, start +
hitsToRetrieve, hitsPerSite, "site", sort, reverse), did not generate any
exception as far as I can judge.

What could be the cause, and how can I debug it?

I'm using Nutch 1.0.
-- 
-MilleBii-

Re: Urgent:FetchedSegments.getSummary generates NullPointerException

Posted by MilleBii <mi...@gmail.com>.
Yes, I found two corrupted segments, but not with Luke, which did not give
any help on this one. Even the faulty segments could be opened nicely.
I logged the HitDetails to find out which segments were causing the
error.

An improvement could be to catch the exception and log the segment id so
that it is found quickly.
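To illustrate the idea, here is a minimal sketch in plain Java. It does not use the real Nutch classes: SummaryGuard, SegmentReader and safeSummary are hypothetical names standing in for the summary lookup, and the map of open segments is an assumption about how segments are held. The point is only that a failed lookup gets logged with the segment name instead of surfacing as a bare NullPointerException.

```java
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical helper, not part of Nutch: wraps a per-segment summary lookup.
final class SummaryGuard {
    private static final Logger LOG = Logger.getLogger(SummaryGuard.class.getName());

    /** Hypothetical stand-in for whatever reads a summary out of an open segment. */
    public interface SegmentReader {
        String summarize(String url);
    }

    /**
     * Returns the summary for the given URL from the named segment.
     * If the segment is not open, or reading it fails, the segment name
     * and URL are logged and an empty summary is returned.
     */
    static String safeSummary(Map<String, SegmentReader> segments,
                              String segmentName, String url) {
        try {
            SegmentReader reader = segments.get(segmentName);
            if (reader == null) {
                // The index refers to a segment that is not available.
                LOG.warning("No open segment '" + segmentName + "' for url " + url);
                return "";
            }
            return reader.summarize(url);
        } catch (RuntimeException e) {
            // The segment exists but reading it failed: record which one.
            LOG.warning("Summary failed for segment '" + segmentName
                    + "', url " + url + ": " + e);
            return "";
        }
    }
}
```

With something like this around the summary call, the search page would show a hit without a summary rather than going blank, and the log would name the segment to look at.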

Thx anyway.

-- 
-MilleBii-

Re: Urgent:FetchedSegments.getSummary generates NullPointerException

Posted by Andrzej Bialecki <ab...@getopt.org>.
One of your segments may be corrupt - usually this means it's either not
fetched, or not parsed, or truly corrupt (or missing). The expected list
of valid segments is the list of segment names that was used to produce
the index - segment names are recorded in Lucene indexes. You could
open all indexes (e.g. with Luke) and look at the top terms in the
"segment" field.
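
Once you have the segment names from the index, a check along these lines could compare them with what is on disk. This is only a sketch under some assumptions: the segments live on the local filesystem (not HDFS), the list of names is collected by hand (e.g. from Luke), and SegmentCheck and findProblems are made-up names. It only spots segments that are missing or lack the crawl_fetch, parse_data or parse_text parts; it cannot tell whether the data inside them is corrupt.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of Nutch.
final class SegmentCheck {
    // Segment parts checked here: crawl_fetch (fetched), parse_data and
    // parse_text (parsed). Assumption: adjust to what your setup relies on.
    static final List<String> REQUIRED =
            Arrays.asList("crawl_fetch", "parse_data", "parse_text");

    /**
     * For each expected segment name (in sorted order), reports whether the
     * segment directory is missing or which required part it lacks.
     */
    static List<String> findProblems(Path segmentsDir, Collection<String> expected) {
        List<String> problems = new ArrayList<>();
        for (String name : new TreeSet<>(expected)) {
            Path segment = segmentsDir.resolve(name);
            if (!Files.isDirectory(segment)) {
                problems.add(name + ": missing");
                continue;
            }
            for (String part : REQUIRED) {
                if (!Files.isDirectory(segment.resolve(part))) {
                    problems.add(name + ": no " + part);
                }
            }
        }
        return problems;
    }
}
```

An empty result means every segment named in the index is present with those parts; it does not rule out a segment that opens fine but is damaged inside.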


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com