Posted to user@nutch.apache.org by Shirley Cohen <sc...@cs.utexas.edu> on 2009/01/04 18:44:56 UTC

problem running fetcher using hadoop jar nutch*.job command

Hi,

I'm new to Nutch and am trying to run it on an existing Hadoop 0.19.0 
install. I'm using the command "hadoop jar 
nutch-2008-12-02_04-01-57.job", as suggested by Dennis Kubes in an 
earlier post. I've been able to inject URLs and generate segments 
successfully using the following commands:

hadoop dfs -put dmoz dmoz
bin/hadoop jar nutch-2008-12-02_04-01-57.job org.apache.nutch.crawl.Injector crawl/crawldb dmoz
bin/hadoop jar nutch-2008-12-02_04-01-57.job org.apache.nutch.crawl.Generator crawl/crawldb crawl/segments

However, when I try to run the fetcher using the command:

bin/hadoop jar nutch-2008-12-02_04-01-57.job org.apache.nutch.fetcher.Fetcher crawl/segments/20090104094558

I get the following error:

09/01/04 10:20:31 INFO fetcher.Fetcher: Fetcher: starting
09/01/04 10:20:31 INFO fetcher.Fetcher: Fetcher: segment: crawl/segments/20090104094558
****calling init JobTracker*****
java.lang.NoSuchMethodError: org.apache.nutch.fetcher.Fetcher$InputFormat.listPaths(Lorg/apache/hadoop/mapred/JobConf;)[Lorg/apache/hadoop/fs/Path;
        at org.apache.nutch.fetcher.Fetcher$InputFormat.getSplits(Fetcher.java:61)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:783)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1128)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:530)
        at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:565)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:537)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

Note: The subdirectory "20090104094558" was created by the generator.

I'm running the 0.9 release of nutch downloaded from: 
http://mirrors.24-7-solutions.net/pub/apache/lucene/nutch/

Does anyone know what is going on?

Thanks in advance,

Shirley


Re: problem running fetcher using hadoop jar nutch*.job command

Posted by Shirley Cohen <sc...@cs.utexas.edu>.
Yes, that's what I did and it worked :))

Thanks,

Shirley

Dennis Kubes wrote:
> It looks to me like you have a mismatch in the version of hadoop you 
> are using with Nutch.  Nutch trunk is on 0.19.  You might want to try 
> building from SVN and then retrying.
>
> Dennis

Re: problem running fetcher using hadoop jar nutch*.job command

Posted by Dennis Kubes <ku...@apache.org>.
It looks to me like you have a mismatch in the version of hadoop you are 
using with Nutch.  Nutch trunk is on 0.19.  You might want to try 
building from SVN and then retrying.

Dennis
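
A note on the diagnosis above: a NoSuchMethodError generally means the calling code was compiled against a class that had the method, while the class loaded at run time does not. The descriptor in the trace reads as a method named listPaths that takes a JobConf and returns a Path[]. One way to check whether a given Hadoop jar still provides such a method is a small reflection probe. The sketch below is only an illustration: the class name MethodCheck and its hasMethod helper are hypothetical and are not part of Nutch or Hadoop.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Nutch or Hadoop. */
public class MethodCheck {

    /**
     * Returns true if the class, or any of its superclasses, declares a
     * method with the given name and exact parameter types (any visibility).
     */
    public static boolean hasMethod(Class<?> type, String name, Class<?>... params) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            for (Method m : c.getDeclaredMethods()) {
                if (m.getName().equals(name)
                        && Arrays.equals(m.getParameterTypes(), params)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: MethodCheck <class> <method> [<param class>...]");
            System.exit(2);
        }
        ClassLoader loader = MethodCheck.class.getClassLoader();
        // Load without running static initializers, so fewer dependencies are needed.
        Class<?> type = Class.forName(args[0], false, loader);
        Class<?>[] params = new Class<?>[args.length - 2];
        for (int i = 2; i < args.length; i++) {
            params[i - 2] = Class.forName(args[i], false, loader);
        }
        System.out.println(hasMethod(type, args[1], params) ? "present" : "missing");
    }
}
```

Assuming the Hadoop core jar is on the classpath (the exact jar path depends on the install, and other jars from Hadoop's lib directory may be needed as well), it could be run with the arguments org.apache.hadoop.mapred.FileInputFormat listPaths org.apache.hadoop.mapred.JobConf. A result of "missing" would be consistent with a Nutch build compiled against a different Hadoop version than the one installed.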

Shirley Cohen wrote:
> Hi,
> 
> I found a workaround to this problem. I was able to run the fetcher with 
> the nutch*.job command using the latest working nightly build from 
> 12-28-2008.
> 
> Shirley

Re: problem running fetcher using hadoop jar nutch*.job command

Posted by Shirley Cohen <sc...@cs.utexas.edu>.
Hi,

I found a workaround to this problem. I was able to run the fetcher with 
the nutch*.job command using the latest working nightly build from 
12-28-2008.

Shirley
