Posted to user@nutch.apache.org by Chris Anderson <jc...@grabb.it> on 2008/05/23 23:43:27 UTC

svn nutch with hadoop 0.17

Hey all,

We're experimenting with Nutch on a Hadoop cluster. Hadoop is version
0.17, launched using the Hadoop public EC2 AMI, using the instructions
here: http://wiki.apache.org/hadoop/AmazonEC2

When running Nutch, our method is to build a nutch.jar that leaves out
the Hadoop classes, based on the advice here:
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg10225.html
- we're doing this by modifying build.xml (I can post our version if
it will help)

The one part of the advice we skipped is matching versions - Nutch is
currently built against Hadoop 0.16, but we are using Hadoop 0.17 for
its clean EC2 support. Our version of Nutch is the most recent svn
trunk (r659263).

We're getting java.lang.AbstractMethodError on crawl - here's the
first error line:

java.lang.AbstractMethodError:
org.apache.nutch.crawl.Injector$InjectMapper.map(Ljava/lang/Object;Ljava/lang/Object;Lorg/apache/hadoop/mapred/OutputCollector;Lorg/apache/hadoop/mapred/Reporter;)V
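
For anyone reading the error line above: the text in parentheses is a JVM
method descriptor, not Java source. A minimal sketch of a decoder follows;
the class name DescriptorDecoder and its methods are made up for
illustration and are not part of Nutch or Hadoop. It only handles plain
descriptors like the one in the error.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns a JVM method descriptor into readable Java types.
final class DescriptorDecoder {

    /** Decodes e.g. "(I[Ljava/lang/String;)Z" into "boolean (int, java.lang.String[])". */
    static String decode(String descriptor) {
        int close = descriptor.indexOf(')');
        if (!descriptor.startsWith("(") || close < 0) {
            throw new IllegalArgumentException("not a method descriptor: " + descriptor);
        }
        List<String> params = new ArrayList<>();
        int[] pos = {1}; // cursor, starting just after '('
        while (pos[0] < close) {
            params.add(readType(descriptor, pos));
        }
        pos[0] = close + 1; // the return type follows ')'
        String returnType = readType(descriptor, pos);
        return returnType + " (" + String.join(", ", params) + ")";
    }

    /** Reads one field type at the cursor and advances the cursor past it. */
    private static String readType(String d, int[] pos) {
        int dims = 0;
        while (d.charAt(pos[0]) == '[') { // each '[' is one array dimension
            dims++;
            pos[0]++;
        }
        char tag = d.charAt(pos[0]++);
        String name;
        switch (tag) {
            case 'L': { // object type: L<internal/class/name>;
                int semi = d.indexOf(';', pos[0]);
                name = d.substring(pos[0], semi).replace('/', '.');
                pos[0] = semi + 1;
                break;
            }
            case 'V': name = "void"; break;
            case 'Z': name = "boolean"; break;
            case 'B': name = "byte"; break;
            case 'C': name = "char"; break;
            case 'S': name = "short"; break;
            case 'I': name = "int"; break;
            case 'J': name = "long"; break;
            case 'F': name = "float"; break;
            case 'D': name = "double"; break;
            default: throw new IllegalArgumentException("unknown type tag: " + tag);
        }
        StringBuilder sb = new StringBuilder(name);
        for (int i = 0; i < dims; i++) {
            sb.append("[]");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The descriptor copied from the AbstractMethodError in this message.
        System.out.println(decode(
            "(Ljava/lang/Object;Ljava/lang/Object;"
            + "Lorg/apache/hadoop/mapred/OutputCollector;"
            + "Lorg/apache/hadoop/mapred/Reporter;)V"));
    }
}
```

Read this way, the JVM is looking for a map method whose key and value
parameters are plain java.lang.Object, and the Injector$InjectMapper class
in the jar does not provide one.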

And the full console output is here: http://pastie.caboo.se/202517

Question: is it worth pressing on with this version mismatch, or
should we fall back to Hadoop 0.16?

If Hadoop 0.17 support is on the Nutch roadmap, we're willing to help
close tickets / work to make this happen.

Thanks in advance for helping us find our bearings!

-- 
Chris Anderson
http://jchris.mfdz.com

Re: svn nutch with hadoop 0.17

Posted by Bradford Stephens <br...@gmail.com>.
Greetings,

I've actually tried to do something similar and ran into some of the
same issues as you. If there's a plan to migrate to Hadoop 0.17, I'll
chip in as well.

Re: svn nutch with hadoop 0.17

Posted by Chris Anderson <jc...@grabb.it>.
On Sat, May 24, 2008 at 10:52 AM, Andrzej Bialecki <ab...@getopt.org> wrote:
> Chris Anderson wrote:
>>
>> Andrzej,
>>
>> Thanks for the reply. We don't really need it. But Hadoop 0.17 seems
>> to be a big step forward, and if bringing Nutch to compatibility is
>> doable in a short code sprint, we might be those sprinters.
>>
>> The current plan is to attempt to build Nutch against Hadoop 0.17, and
>> then plow through the compiler errors one at a time until we get a
>> clean build. Is there something more structured we should be doing
>> instead?
>
> Not really, but you also need to run the JUnit tests to make sure nothing
> broke (even though things may compile cleanly).

Thanks - glad to see there's a test suite.

>
>>
>> If anyone wants to join in on the project we could keep our work on
>> Github so everyone can contribute.
>
> If you have some spare CPU cycles to produce a clean svn diff patch against
> Nutch trunk/ we will certainly welcome your help ;) Please create a JIRA
> issue and attach a patch, I'm sure we will process it before the release.

Sure thing, I'll do that once we have a complete patch. Once we've got
it mostly working we'll throw it up somewhere so people can poke it
and help with whatever test cases are still failing.


-- 
Chris Anderson
http://jchris.mfdz.com

Re: svn nutch with hadoop 0.17

Posted by Andrzej Bialecki <ab...@getopt.org>.
Chris Anderson wrote:
> Andrzej,
> 
> Thanks for the reply. We don't really need it. But Hadoop 0.17 seems
> to be a big step forward, and if bringing Nutch to compatibility is
> doable in a short code sprint, we might be those sprinters.
> 
> The current plan is to attempt to build Nutch against Hadoop 0.17, and
> then plow through the compiler errors one at a time until we get a
> clean build. Is there something more structured we should be doing
> instead?

Not really, but you also need to run the JUnit tests to make sure nothing
broke (even though things may compile cleanly).

> 
> If anyone wants to join in on the project we could keep our work on
> Github so everyone can contribute.

If you have some spare CPU cycles to produce a clean svn diff patch 
against Nutch trunk/ we will certainly welcome your help ;) Please 
create a JIRA issue and attach a patch, I'm sure we will process it 
before the release.


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: svn nutch with hadoop 0.17

Posted by Chris Anderson <jc...@grabb.it>.
Andrzej,

Thanks for the reply. We don't really need it. But Hadoop 0.17 seems
to be a big step forward, and if bringing Nutch to compatibility is
doable in a short code sprint, we might be those sprinters.

The current plan is to attempt to build Nutch against Hadoop 0.17, and
then plow through the compiler errors one at a time until we get a
clean build. Is there something more structured we should be doing
instead?

If anyone wants to join in on the project we could keep our work on
Github so everyone can contribute.

Chris

On Sat, May 24, 2008 at 10:32 AM, Andrzej Bialecki <ab...@getopt.org> wrote:
> Chris Anderson wrote:
>
>>
>> Question: is it worth pressing on with this version mismatch, or
>> should we fall back to Hadoop 0.16?
>
> If you need this now - yes, you should go back to Hadoop 0.16. These two
> versions are incompatible.
>
>> If Hadoop 0.17 support is on the Nutch roadmap, we're willing to help
>> close tickets / work to make this happen.
>
> Yes, the plan is to upgrade to the latest official Hadoop release before we
> release Nutch 1.0.

-- 
Chris Anderson
http://jchris.mfdz.com

Re: svn nutch with hadoop 0.17

Posted by Andrzej Bialecki <ab...@getopt.org>.
Chris Anderson wrote:

> 
> Question: is it worth pressing on with this version mismatch, or
> should we fall back to Hadoop 0.16?

If you need this now - yes, you should go back to Hadoop 0.16. These two 
versions are incompatible.
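
One plausible mechanism for the incompatibility, offered as an assumption
rather than something confirmed in this thread: if the type-parameter
bounds on a generic interface are relaxed between releases, the erased
signature of its methods changes, and classes compiled against the old
jar no longer contain the method the new runtime calls. The sketch below
uses two made-up interfaces (BoundedMapper and UnboundedMapper, not the
real Hadoop types) and reflection to show the erased signatures.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical demo of how generic bounds affect the erased method signature.
final class ErasureDemo {

    // Stand-in for an "old" interface: key and value types carry bounds.
    interface BoundedMapper<K extends Comparable<K>, V extends CharSequence> {
        void map(K key, V value);
    }

    // Stand-in for a "new" interface: the bounds have been removed.
    interface UnboundedMapper<K, V> {
        void map(K key, V value);
    }

    /** Returns the method name plus its erased parameter types, as the JVM sees them. */
    static String erasedSignature(Class<?> iface) {
        Method m = iface.getDeclaredMethods()[0]; // each demo interface has one method
        String params = Arrays.stream(m.getParameterTypes())
                .map(Class::getName)
                .collect(Collectors.joining(", "));
        return m.getName() + "(" + params + ")";
    }

    public static void main(String[] args) {
        System.out.println(erasedSignature(BoundedMapper.class));
        System.out.println(erasedSignature(UnboundedMapper.class));
    }
}
```

The bounded interface erases to map(java.lang.Comparable,
java.lang.CharSequence) and the unbounded one to map(java.lang.Object,
java.lang.Object). An implementation compiled against the first only
carries the first form, so a caller linked against the second fails at
run time with AbstractMethodError until the implementation is recompiled.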

> If Hadoop 0.17 support is on the Nutch roadmap, we're willing to help
> close tickets / work to make this happen.

Yes, the plan is to upgrade to the latest official Hadoop release before 
we release Nutch 1.0.


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com