Posted to user@nutch.apache.org by Lauren Massa Lochridge <la...@ieee.org> on 2007/04/23 00:58:27 UTC

0.9 ClassCastException: org.apache.hadoop.io.Text

Hello,

Any opinions on a problem we are having with 0.9 would be appreciated.
Invoking NutchBean from the command line finds hits (333 in this small
test case), but the result set comes back with zero hits because of the
exception below. Luke accesses these same indexes just fine.

I downloaded the Hadoop patch that was referred to in the archives,
because its description seemed applicable, but it appears to be the same
version of hadoop-core (0.12.2) that came with Nutch 0.9. Is that patch
already integrated into the most recent Nutch 0.9 release, or is it
otherwise not applicable? Can someone tell me what the problem is, given
the exception in the log below?

Thanks.
Lauren Massa-Lochridge
eXlr8, Inc.

    $ bin/nutch org.apache.nutch.searcher.NutchBean news
    Total hits: 333
    Exception in thread "main" java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text
            at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:204)
            at org.apache.nutch.searcher.NutchBean.getSummary(NutchBean.java:344)
            at org.apache.nutch.searcher.NutchBean.main(NutchBean.java:395)
    Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text
            at org.apache.hadoop.io.UTF8.compareTo(UTF8.java:123)
            at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:107)
            at org.apache.hadoop.io.MapFile$Reader.binarySearch(MapFile.java:369)
            at org.apache.hadoop.io.MapFile$Reader.seek(MapFile.java:338)
            at org.apache.hadoop.io.MapFile$Reader.get(MapFile.java:392)
            at org.apache.hadoop.mapred.MapFileOutputFormat.getEntry(MapFileOutputFormat.java:86)
            at org.apache.nutch.searcher.FetchedSegments$Segment.getEntry(FetchedSegments.java:95)
            at org.apache.nutch.searcher.FetchedSegments$Segment.getParseText(FetchedSegments.java:86)
            at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:159)
            at org.apache.nutch.searcher.FetchedSegments$SummaryThread.run(FetchedSegments.java:177)






Re: 0.9 ClassCastException: org.apache.hadoop.io.Text

Posted by Lauren Massa Lochridge <la...@ieee.org>.
Ken,
Thanks very much - you were right. I had never made this mistake before:
I copied the newly created (0.9) /crawl over the existing one, which
added the new segments to the existing 0.8.1 segments, rather than
deleting all of the old 0.8.1 data and replacing /crawl entirely. Your
response prompted me to look at that again, and sure enough, that's what
it was!
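
For anyone who hits the same thing, one rough way to spot leftover
segments is to compare the segment directory names against the time of
the upgrade. The sketch below assumes the segment directories carry the
usual yyyyMMddHHmmss timestamp names and that you know roughly when you
switched to 0.9. The class name, method name, default path, and default
cutoff are all hypothetical and not part of Nutch.

```java
import java.io.File;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StaleSegmentCheck {

    // Assumption: segment directories are named yyyyMMddHHmmss.
    private static final DateTimeFormatter SEGMENT_NAME =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Returns the segment names whose timestamp is earlier than the cutoff.
    static List<String> staleSegments(List<String> segmentNames, LocalDateTime cutoff) {
        List<String> stale = new ArrayList<>();
        for (String name : segmentNames) {
            try {
                if (LocalDateTime.parse(name, SEGMENT_NAME).isBefore(cutoff)) {
                    stale.add(name);
                }
            } catch (DateTimeParseException e) {
                // Not a timestamp-style name; ignore it.
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        // Hypothetical defaults; pass the real directory and cutoff as arguments.
        File segmentsDir = new File(args.length > 0 ? args[0] : "crawl/segments");
        LocalDateTime cutoff = LocalDateTime.parse(
                args.length > 1 ? args[1] : "20070401000000", SEGMENT_NAME);

        String[] names = segmentsDir.list();
        if (names == null) {
            System.err.println("Not a readable directory: " + segmentsDir);
            return;
        }
        for (String name : staleSegments(Arrays.asList(names), cutoff)) {
            System.out.println("Possibly written by the older version: " + name);
        }
    }
}
```

A segment's age does not prove which format it is in, so treat anything
this lists only as a candidate to inspect, remove, or re-crawl.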
Thanks.
Lauren Massa-Lochridge
eXlr8, Inc.

Ken Krugler wrote:

>> Any opinions on a problem we are having with 0.9 would be
>> appreciated. Invoking NutchBean from the command line finds hits
>> (333 in this small test case), but the result set comes back with
>> zero hits because of the exception below. Luke accesses these same
>> indexes just fine.
>>
>> I downloaded the Hadoop patch that was referred to in the archives,
>> because its description seemed applicable, but it appears to be the
>> same version of hadoop-core (0.12.2) that came with Nutch 0.9. Is
>> that patch already integrated into the most recent Nutch 0.9 release,
>> or is it otherwise not applicable? Can someone tell me what the
>> problem is, given the exception in the log below?
>
>
> This looks similar to a problem I had when I was trying to use an 
> older crawl (one generated by a version of Nutch in between 0.8.1 and 
> 0.9) with the 0.9 distribution.
>
> E.g. if the page content was saved using an older version of Nutch, 
> then when the summarizer tries to load the content, you can run into 
> this exception.
>
> -- Ken
>
>
>> Thanks.
>> Lauren Lochridge
>> eXlr8, Inc.


Re: 0.9 ClassCastException: org.apache.hadoop.io.Text

Posted by Ken Krugler <kk...@transpac.com>.
>Any opinions on a problem we are having with 0.9 would be
>appreciated. Invoking NutchBean from the command line finds hits
>(333 in this small test case), but the result set comes back with
>zero hits because of the exception below. Luke accesses these same
>indexes just fine.
>
>I downloaded the Hadoop patch that was referred to in the archives,
>because its description seemed applicable, but it appears to be the
>same version of hadoop-core (0.12.2) that came with Nutch 0.9. Is
>that patch already integrated into the most recent Nutch 0.9 release,
>or is it otherwise not applicable? Can someone tell me what the
>problem is, given the exception in the log below?

This looks similar to a problem I had when I was trying to use an 
older crawl (one generated by a version of Nutch in between 0.8.1 and 
0.9) with the 0.9 distribution.

E.g. if the page content was saved using an older version of Nutch, 
then when the summarizer tries to load the content, you can run into 
this exception.
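
A minimal sketch of the likely mechanism, not actual Hadoop code: the
stack trace suggests that the keys stored in the older segment data are
UTF8 objects, while the 0.9 code looks them up with a Text key, so the
stored key's compareTo() fails on its cast during the binary search.
OldKey and NewKey below are hypothetical stand-ins for those two
classes, and the search loop is a simplified illustration rather than
the MapFile implementation.

```java
import java.util.Arrays;
import java.util.List;

public class KeyClassMismatchDemo {

    // Hypothetical stand-in for the older key type (UTF8 in the trace).
    static final class OldKey implements Comparable<Object> {
        final String value;

        OldKey(String value) { this.value = value; }

        @Override
        public int compareTo(Object other) {
            // Pre-generics style cast: throws for any key that is not an OldKey.
            return value.compareTo(((OldKey) other).value);
        }
    }

    // Hypothetical stand-in for the newer key type (Text in the trace).
    static final class NewKey {
        final String value;

        NewKey(String value) { this.value = value; }
    }

    // Binary search that compares each stored key against the lookup key.
    static int binarySearch(List<OldKey> storedKeys, Object lookupKey) {
        int low = 0;
        int high = storedKeys.size() - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = storedKeys.get(mid).compareTo(lookupKey);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -(low + 1); // not found
    }

    // Describes the outcome of a lookup instead of letting the exception escape.
    static String tryLookup(List<OldKey> storedKeys, Object lookupKey) {
        try {
            return "index " + binarySearch(storedKeys, lookupKey);
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        List<OldKey> stored = Arrays.asList(new OldKey("http://a/"), new OldKey("http://b/"));
        System.out.println(tryLookup(stored, new OldKey("http://b/"))); // index 1
        System.out.println(tryLookup(stored, new NewKey("http://b/"))); // ClassCastException
    }
}
```

If that reading is right, the index itself can still be searched (hence
the 333 hits), and only the lookup of the stored page text for the
summary fails.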

-- Ken


>Thanks.
>Lauren Massa-Lochridge
>eXlr8, Inc.


-- 
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"