Posted to user@nutch.apache.org by Shirley Cohen <sc...@cs.utexas.edu> on 2009/01/04 18:44:56 UTC
problem running fetcher using hadoop jar nutch*.job command
Hi,
I'm new to nutch and am trying to run it on an existing hadoop 0.19.0
install. I'm using the command "hadoop jar
nutch-2008-12-02_04-01-57.job", as suggested by Dennis Kubes in an
earlier post. I've been able to crawl and generate segments successfully
using the following commands:
hadoop dfs -put dmoz dmoz
bin/hadoop jar nutch-2008-12-02_04-01-57.job org.apache.nutch.crawl.Injector crawl/crawldb dmoz
bin/hadoop jar nutch-2008-12-02_04-01-57.job org.apache.nutch.crawl.Generator crawl/crawldb crawl/segments
However, when I try to run the fetcher using the command:
bin/hadoop jar nutch-2008-12-02_04-01-57.job org.apache.nutch.fetcher.Fetcher crawl/segments/20090104094558
I get the following error:
09/01/04 10:20:31 INFO fetcher.Fetcher: Fetcher: starting
09/01/04 10:20:31 INFO fetcher.Fetcher: Fetcher: segment: crawl/segments/20090104094558
****calling init JobTracker*****
java.lang.NoSuchMethodError: org.apache.nutch.fetcher.Fetcher$InputFormat.listPaths(Lorg/apache/hadoop/mapred/JobConf;)[Lorg/apache/hadoop/fs/Path;
at org.apache.nutch.fetcher.Fetcher$InputFormat.getSplits(Fetcher.java:61)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:783)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1128)
at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:530)
at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:565)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:537)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Note: The subdirectory "20090104094558" was created by the generator.
I'm running the 0.9 release of nutch downloaded from:
http://mirrors.24-7-solutions.net/pub/apache/lucene/nutch/
Does anyone know what is going on?
Thanks in advance,
Shirley
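A note on reading the error above: the text that follows "listPaths" in the NoSuchMethodError is a JVM method descriptor, which encodes the parameter and return types of the method the compiled Nutch code expected to find at runtime. The sketch below is a minimal Python decoder for such descriptors, so the missing signature can be read as ordinary Java types. The function names are made up for this illustration and are not part of Nutch or Hadoop.

```python
# Minimal sketch: decode a JVM method descriptor into readable Java types.
# decode_descriptor and _read_type are hypothetical helper names.

PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def _read_type(desc, i):
    """Read one type starting at index i; return (java_type, next_index)."""
    dims = 0
    while desc[i] == "[":          # each '[' adds one array dimension
        dims += 1
        i += 1
    if desc[i] == "L":             # object type: L<class/name>;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    else:                          # single-letter primitive or void
        name = PRIMITIVES[desc[i]]
        i += 1
    return name + "[]" * dims, i


def decode_descriptor(desc):
    """Return (parameter_types, return_type) for a method descriptor."""
    if not desc.startswith("("):
        raise ValueError("method descriptor must start with '('")
    i = 1
    params = []
    while desc[i] != ")":
        java_type, i = _read_type(desc, i)
        params.append(java_type)
    return_type, _ = _read_type(desc, i + 1)
    return params, return_type


params, ret = decode_descriptor(
    "(Lorg/apache/hadoop/mapred/JobConf;)[Lorg/apache/hadoop/fs/Path;"
)
print(ret, "listPaths(" + ", ".join(params) + ")")
```

Read this way, the JVM was looking for a listPaths method that takes a JobConf and returns a Path array, and did not find one on the classes loaded at runtime.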
Re: problem running fetcher using hadoop jar nutch*.job command
Posted by Shirley Cohen <sc...@cs.utexas.edu>.
Yes, that's what I did and it worked :))
Thanks,
Shirley
Dennis Kubes wrote:
> It looks to me like you have a mismatch in the version of hadoop you
> are using with Nutch. Nutch trunk is on 0.19. You might want to try
> building from SVN and then retrying.
>
> Dennis
Re: problem running fetcher using hadoop jar nutch*.job command
Posted by Dennis Kubes <ku...@apache.org>.
It looks to me like you have a mismatch in the version of hadoop you are
using with Nutch. Nutch trunk is on 0.19. You might want to try
building from SVN and then retrying.
Dennis
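One way to look for the kind of version mismatch described above is to check which Hadoop jar, if any, is packaged inside the Nutch .job file and compare it with the version of the cluster. The Python sketch below assumes the .job file is an ordinary zip-format archive that carries its dependency jars as entries. That layout is an assumption, and the function name is made up for this illustration.

```python
# Minimal sketch: list entries in a .job archive that look like Hadoop jars.
# Assumes (not confirmed in this thread) that the .job file is a zip-format
# archive with its dependency jars stored inside it.
import zipfile
from pathlib import PurePosixPath


def bundled_hadoop_jars(job_file):
    """Return sorted archive entry names whose file name looks like hadoop*.jar."""
    with zipfile.ZipFile(job_file) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if PurePosixPath(name).name.startswith("hadoop")
            and name.endswith(".jar")
        )


# Example call, using the job file name from this thread:
# print(bundled_hadoop_jars("nutch-2008-12-02_04-01-57.job"))
```

If a listed jar names a Hadoop version other than the one installed on the cluster, that would be consistent with the mismatch suggested here, though the jar name alone does not prove which classes are loaded at runtime.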
Re: problem running fetcher using hadoop jar nutch*.job command
Posted by Shirley Cohen <sc...@cs.utexas.edu>.
Hi,
I found a workaround to this problem. I was able to run the fetcher with
the nutch*.job command using the latest working nightly build from
12-28-2008.
Shirley