
Posted to user@nutch.apache.org by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/04/20 08:09:50 UTC

Can anybody tell me how the Nutch-0.9 is different than nutch-0.8.1

Hi,
Can anybody explain what is new in nutch-0.9 compared with nutch-0.8.1? I
have used nutch-0.8.1, and
I am keen to know how nutch-0.9 differs from the older version.

I would appreciate a prompt reply to my query.

Thanks
"Ratnesh,V2Solutions India"

Re: 0.9 ClassCastException: org.apache.hadoop.io.Text

Posted by Lauren Massa Lochridge <la...@ieee.org>.

Ken,
Thanks very much, you were right. I made a mistake I had never made 
before: I copied in the newly created (0.9) /crawl, which added it to 
the existing 0.8.1 segments, rather than deleting all of the old 0.8.1 
data and thereby replacing /crawl entirely. Your response prompted me to 
look at that again, and sure enough, that's what it was!
Thanks.
Lauren Massa-Lochridge
eXlr8, Inc.
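
The fix described above (replacing /crawl entirely rather than copying the
new crawl into the old one) could be sketched roughly as follows. This is
only an illustration: the function name and the paths are hypothetical and
are not part of Nutch.

```shell
#!/bin/sh
# Sketch only: the function name and paths are hypothetical, not part of Nutch.
# Replace an existing crawl directory with a freshly generated one, moving the
# old data aside so that segments from two Nutch versions are never mixed.
replace_crawl() {
    new_crawl=$1    # e.g. the crawl directory produced by the 0.9 run
    live_crawl=$2   # e.g. the crawl directory the searcher reads

    [ -d "$new_crawl" ] || { echo "no such directory: $new_crawl" >&2; return 1; }

    # Move the old crawl out of the way instead of copying on top of it;
    # copying into an existing directory would merge old and new segments.
    if [ -e "$live_crawl" ]; then
        rm -rf "$live_crawl.old"
        mv "$live_crawl" "$live_crawl.old" || return 1
    fi

    # The target no longer exists, so this creates a clean copy.
    cp -r "$new_crawl" "$live_crawl"
}
```

It would be called as, for example, `replace_crawl crawl-0.9 crawl`. The old
data is kept under `crawl.old` so that it can be inspected or deleted later.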

Ken Krugler wrote:

>> Any opinions about a problem we have with 0.9 are appreciated.
>> The problem is that a command-line NutchBean invocation finds hits 
>> (333 in this small test case), but the result set comes back with 
>> zero hits because of the exception. Luke also accesses these same 
>> indexes just fine.
>>
>> I got the Hadoop patch that was referred to in the archives, because 
>> its description seemed applicable. However, it appears to be the same 
>> version of hadoop-core, 12.2.2, that came with nutch 0.9. Is that 
>> patch already integrated into the most recent nutch 0.9 release, or is 
>> it otherwise not applicable? Can someone tell me what the problem is, 
>> given the exception in the log below?
>
>
> This looks similar to a problem I had when I was trying to use an 
> older crawl (one generated by a version of Nutch in between 0.8.1 and 
> 0.9) with the 0.9 distribution.
>
> E.g. if the page content was saved using an older version of Nutch, 
> then when the summarizer tries to load the content, you can run into 
> this exception.
>
> -- Ken
>
>
>> Thanks.
>> Lauren Lochridge
>> eXlr8, Inc.
>>
>>    $ bin/nutch org.apache.nutch.searcher.NutchBean news
>>    Total hits: 333
>>    Exception in thread "main" java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text
>>            at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:204)
>>            at org.apache.nutch.searcher.NutchBean.getSummary(NutchBean.java:344)
>>            at org.apache.nutch.searcher.NutchBean.main(NutchBean.java:395)
>>    Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text
>>            at org.apache.hadoop.io.UTF8.compareTo(UTF8.java:123)
>>            at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:107)
>>            at org.apache.hadoop.io.MapFile$Reader.binarySearch(MapFile.java:369)
>>            at org.apache.hadoop.io.MapFile$Reader.seek(MapFile.java:338)
>>            at org.apache.hadoop.io.MapFile$Reader.get(MapFile.java:392)
>>            at org.apache.hadoop.mapred.MapFileOutputFormat.getEntry(MapFileOutputFormat.java:86)
>>            at org.apache.nutch.searcher.FetchedSegments$Segment.getEntry(FetchedSegments.java:95)
>>            at org.apache.nutch.searcher.FetchedSegments$Segment.getParseText(FetchedSegments.java:86)
>>            at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:159)
>>            at org.apache.nutch.searcher.FetchedSegments$SummaryThread.run(FetchedSegments.java:177)
>
>
>

Re: 0.9 ClassCastException: org.apache.hadoop.io.Text

Posted by Ken Krugler <kk...@transpac.com>.

>Any opinions about a problem we have with 0.9 are appreciated.
>The problem is that a command-line NutchBean invocation finds hits 
>(333 in this small test case), but the result set comes back with 
>zero hits because of the exception. Luke also accesses these same 
>indexes just fine.
>
>I got the Hadoop patch that was referred to in the archives, because 
>its description seemed applicable. However, it appears to be the same 
>version of hadoop-core, 12.2.2, that came with nutch 0.9. Is that 
>patch already integrated into the most recent nutch 0.9 release, or 
>is it otherwise not applicable? Can someone tell me what the problem 
>is, given the exception in the log below?

This looks similar to a problem I had when I was trying to use an 
older crawl (one generated by a version of Nutch in between 0.8.1 and 
0.9) with the 0.9 distribution.

E.g. if the page content was saved using an older version of Nutch, 
then when the summarizer tries to load the content, you can run into 
this exception.
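
One possible way to check for such older data is sketched below. In the
stack trace, UTF8.compareTo receives a Text key, which suggests (though the
thread does not confirm it) that the affected segment files were written
with org.apache.hadoop.io.UTF8 keys. A SequenceFile stores its key and value
class names as readable text in its header, so the first bytes of a
segment's data file can be inspected. The function name and the segment path
layout used here are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical. It relies on the
# SequenceFile header, which starts with "SEQ" and stores the key and value
# class names as readable text near the start of the file.
uses_old_key_class() {
    data_file=$1   # assumed layout: crawl/segments/<segment>/parse_text/part-00000/data
    # Look only at the first 256 bytes, where the header lives.
    head -c 256 "$data_file" | LC_ALL=C grep -a -q 'org\.apache\.hadoop\.io\.UTF8'
}

# Report every parse_text data file that still declares the old key class.
# The "crawl/segments" location is an assumption; adjust it as needed.
for f in crawl/segments/*/parse_text/part-*/data; do
    [ -f "$f" ] || continue
    if uses_old_key_class "$f"; then
        echo "old key class: $f"
    fi
done
```

Any segment reported by the loop was presumably written by the older
version and would need to be regenerated or removed before searching with
0.9.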

-- Ken


>Thanks.
>Lauren Massa-Lochridge
>eXlr8, Inc.
>
>    $ bin/nutch org.apache.nutch.searcher.NutchBean news
>    Total hits: 333
>    Exception in thread "main" java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text
>            at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:204)
>            at org.apache.nutch.searcher.NutchBean.getSummary(NutchBean.java:344)
>            at org.apache.nutch.searcher.NutchBean.main(NutchBean.java:395)
>    Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text
>            at org.apache.hadoop.io.UTF8.compareTo(UTF8.java:123)
>            at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:107)
>            at org.apache.hadoop.io.MapFile$Reader.binarySearch(MapFile.java:369)
>            at org.apache.hadoop.io.MapFile$Reader.seek(MapFile.java:338)
>            at org.apache.hadoop.io.MapFile$Reader.get(MapFile.java:392)
>            at org.apache.hadoop.mapred.MapFileOutputFormat.getEntry(MapFileOutputFormat.java:86)
>            at org.apache.nutch.searcher.FetchedSegments$Segment.getEntry(FetchedSegments.java:95)
>            at org.apache.nutch.searcher.FetchedSegments$Segment.getParseText(FetchedSegments.java:86)
>            at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:159)
>            at org.apache.nutch.searcher.FetchedSegments$SummaryThread.run(FetchedSegments.java:177)


-- 
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"

0.9 ClassCastException: org.apache.hadoop.io.Text

Posted by Lauren Massa Lochridge <la...@ieee.org>.

Hello,

Any opinions about a problem we have with 0.9 are appreciated.
The problem is that a command-line NutchBean invocation finds hits 
(333 in this small test case), but the result set comes back with zero 
hits because of the exception. Luke also accesses these same indexes 
just fine.

I got the Hadoop patch that was referred to in the archives, because its 
description seemed applicable. However, it appears to be the same version 
of hadoop-core, 12.2.2, that came with nutch 0.9. Is that patch already 
integrated into the most recent nutch 0.9 release, or is it otherwise not 
applicable? Can someone tell me what the problem is, given the exception 
in the log below?

Thanks.
Lauren Massa-Lochridge
eXlr8, Inc.

    $ bin/nutch org.apache.nutch.searcher.NutchBean news
    Total hits: 333
    Exception in thread "main" java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text
            at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:204)
            at org.apache.nutch.searcher.NutchBean.getSummary(NutchBean.java:344)
            at org.apache.nutch.searcher.NutchBean.main(NutchBean.java:395)
    Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text
            at org.apache.hadoop.io.UTF8.compareTo(UTF8.java:123)
            at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:107)
            at org.apache.hadoop.io.MapFile$Reader.binarySearch(MapFile.java:369)
            at org.apache.hadoop.io.MapFile$Reader.seek(MapFile.java:338)
            at org.apache.hadoop.io.MapFile$Reader.get(MapFile.java:392)
            at org.apache.hadoop.mapred.MapFileOutputFormat.getEntry(MapFileOutputFormat.java:86)
            at org.apache.nutch.searcher.FetchedSegments$Segment.getEntry(FetchedSegments.java:95)
            at org.apache.nutch.searcher.FetchedSegments$Segment.getParseText(FetchedSegments.java:86)
            at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:159)
            at org.apache.nutch.searcher.FetchedSegments$SummaryThread.run(FetchedSegments.java:177)

Re: Can anybody tell me how the Nutch-0.9 is different than nutch-0.8.1

Posted by Sami Siren <ss...@gmail.com>.

Ratnesh,V2Solutions India wrote:
> Hi,
> Can anybody explain what is new in nutch-0.9 compared with nutch-0.8.1? I
> have used nutch-0.8.1, and
> I am keen to know how nutch-0.9 differs from the older version.

I think the best place to study the changes since 0.8.1 is JIRA:

http://issues.apache.org/jira/secure/BrowseProject.jspa?id=10680&subset=3

where most of the changes are listed.

--
 Sami Siren