Posted to user@nutch.apache.org by Rajani Maski <ra...@gmail.com> on 2012/12/18 10:27:46 UTC
Run Nutch in Eclipse- Wiki documentation -Query step 1.4.3
Hi Team,
Initially I followed the steps in the Nutch wiki
tutorial <http://wiki.apache.org/nutch/NutchTutorial>
to set up Nutch from the binary distribution. That worked: I was able to
crawl and index successfully.
Now I am trying to set up Nutch in Eclipse and I am stuck at step 1.4.3
(Link <http://wiki.apache.org/nutch/RunNutchInEclipse#Configure_Nutch>),
quoted below:
- 1. see the Tutorial and follow all configuration steps, ensure that
you DO NOT undertake any crawling. The directory structure for Nutch trunk
enables us to edit nutch-site.xml.template, nutch-default.xml and
regex-urlfilter.txt.template in our /conf directory, these properties will
then be automatically built into our /runtime build folder.
- 2. ensure that you change the property "plugin.folders" to
"./src/plugin" on $NUTCH_HOME/conf/nutch-site.xml.
Step 1 points to the same tutorial that I followed earlier when I used the
Nutch binary distribution. My question is whether I should reuse that
existing setup (if so, where in the Eclipse Nutch project do I specify the
location of NUTCH_HOME?) or repeat the same configuration steps in the
trunk folder of the Eclipse workspace.
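For reference, the plugin.folders setting from step 2 is an ordinary
<property> entry inside the <configuration> element of nutch-site.xml. The
following is a minimal sketch, not part of the wiki or of Nutch itself: it
embeds an example nutch-site.xml as a string and reads the property back
with the JDK XML parser, only to show the expected shape of the entry. The
class name PluginFoldersCheck and the helper readProperty are hypothetical;
Nutch loads its configuration through Hadoop's configuration classes, not
through code like this.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PluginFoldersCheck {

    // Example nutch-site.xml content with the value the wiki step asks for.
    public static final String SITE_XML =
        "<?xml version=\"1.0\"?>\n"
      + "<configuration>\n"
      + "  <property>\n"
      + "    <name>plugin.folders</name>\n"
      + "    <value>./src/plugin</value>\n"
      + "  </property>\n"
      + "</configuration>\n";

    /** Returns the value of the named property, or null if it is absent. */
    public static String readProperty(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                String n = p.getElementsByTagName("name").item(0).getTextContent().trim();
                if (n.equals(name)) {
                    return p.getElementsByTagName("value").item(0).getTextContent().trim();
                }
            }
            return null; // no <property> with that name
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse configuration", e);
        }
    }

    public static void main(String[] args) {
        // Print the value found in the example configuration.
        System.out.println("plugin.folders = " + readProperty(SITE_XML, "plugin.folders"));
    }
}
```

A null result from readProperty would mean the <property> entry is missing
from the XML being read.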
I am getting a job failed message, with this error:
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
at org.apache.nutch.crawl.Injector.inject(Injector.java:281)
at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
Regards
Rajani
Re: Run Nutch in Eclipse- Wiki documentation -Query step 1.4.3
Posted by Rajani Maski <ra...@gmail.com>.
Hi Lewis,
In the tutorial
<http://wiki.apache.org/nutch/RunNutchInEclipse#Configure_Nutch> there
is a step about configuring Nutch: "*see the Tutorial and follow
all configuration steps*".
Where does this configuration need to be done? Is it in the Eclipse setup,
whose trunk/conf directory lets us edit
nutch-site.xml.template, nutch-default.xml and
regex-urlfilter.txt.template?
Also, after the step "Establish the Eclipse environment for Nutch"
<http://wiki.apache.org/nutch/RunNutchInEclipse#Establish_the_Eclipse_environment_for_Nutch>,
I see that two jar files are not referenced, which causes *import errors*
in these classes:
*org.apache.nutch.parse.html.TestDOMContentUtils*
import org.cyberneko.html.parsers.*;
*org.apache.nutch.parse.feed.FeedParser*
import com.sun.syndication.feed.synd.SyndCategory;
import com.sun.syndication.feed.synd.SyndContent;
import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.feed.synd.SyndPerson;
import com.sun.syndication.io.SyndFeedInput;
*Should we download and add them separately?*
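One way to see whether the libraries behind those imports are visible to
the Eclipse project is to try loading a class from each. This is only a
sketch under assumptions: ClasspathCheck and isOnClasspath are hypothetical
names, org.cyberneko.html.parsers.DOMParser is picked here as a
representative class of the wildcard-imported package, and
com.sun.syndication.io.SyndFeedInput is taken from the imports listed
above. It says nothing about where the jars should come from.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be located by this class's loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but unusable because a dependency is missing
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.cyberneko.html.parsers.DOMParser",  // assumed representative class
            "com.sun.syndication.io.SyndFeedInput"   // from the failing imports
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it from the same Eclipse project so that it uses the project's build
path; a MISSING line means no jar on that build path provides the class.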
If I remove the plugins and build the project, the build succeeds, but
when I run the application I get this error:
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
at org.apache.nutch.crawl.Injector.inject(Injector.java:281)
at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
Regards
Rajani
On Wed, Dec 19, 2012 at 7:00 PM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:
> Hi Rajani,
>
> I'm slightly confused here.
>
> Can you explain in a summary, what is actually wrong. Do you think there is
> something wrong with the wording of the tutorial?
>
> Lewis
Re: Run Nutch in Eclipse- Wiki documentation -Query step 1.4.3
Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Rajani,
I'm slightly confused here.
Can you summarize what is actually wrong? Do you think there is
something wrong with the wording of the tutorial?
Lewis
--
*Lewis*