Posted to dev@nutch.apache.org by "david.stuart@progressivealliance.co.uk" <da...@progressivealliance.co.uk> on 2009/11/14 16:40:26 UTC
Plugin Help
Hi,
I am trying to write a plugin for Nutch and am having real trouble getting it
registered in the system. I created it in src/plugin and added it to both the
build.xml in src/plugin and to nutch-site.xml. It seems to build OK, but when I
try to run a basic crawl urls -dir crawl -depth 3 -topN 2, I see the plugin
registered in hadoop.log:
2009-11-14 14:57:45,739 INFO plugin.PluginRepository - Html Filter Parse Plug-in (parse-htmlfilter)
But then I get the error message below. I have followed all of the tutorials, but
they are mostly for Nutch 0.9 and have errors in them, which I have worked through.
Thanks for your help
regards,
Dave
java.lang.RuntimeException: org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.parse.htmlfilter.HtmlfilterIndexer
    at org.apache.nutch.indexer.IndexingFilters.<init>(IndexingFilters.java:100)
    at org.apache.nutch.indexer.IndexerMapReduce.configure(IndexerMapReduce.java:61)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
Caused by: org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.parse.htmlfilter.HtmlfilterIndexer
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
    at org.apache.nutch.indexer.IndexingFilters.<init>(IndexingFilters.java:70)
    ... 8 more
Caused by: java.lang.ClassNotFoundException: org.apache.nutch.parse.htmlfilter.HtmlfilterIndexer
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:254)
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
Re: Plugin Help
Posted by "david.stuart@progressivealliance.co.uk" <da...@progressivealliance.co.uk>.
Right, I fixed the problem by explicitly adding my plugin jar to the
classpath. Is this right? Is there some place I can add it in the build scripts
to do this automatically?
Regards
Re: Plugin Help
Posted by Dennis Kubes <ku...@apache.org>.
It depends on how you are building and on your classpath. Let's call your
plugin myhtmlfilter. If you are running on a single server and you added it to
your src/plugin/build.xml under the deploy section, a myhtmlfilter
folder with the plugin should show up under the build/plugins folder
upon build. Then you would just have to copy that myhtmlfilter
folder over to your deployment plugins directory.
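One rough way to confirm that the built plugin folder really contains the class named in the ClassNotFoundException is to scan the jars in that folder for the matching .class entry. The sketch below uses only the JDK. The class name PluginJarCheck and its method names are made up for illustration and are not part of Nutch, and the folder path and class name are passed in as arguments rather than assumed.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Nutch: looks for a class inside the
// jars that sit directly in one plugin folder.
public class PluginJarCheck {

    // Converts a fully qualified class name into the entry path used inside a jar.
    public static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns every jar directly inside pluginDir that contains the given class.
    public static List<Path> jarsContaining(Path pluginDir, String className)
            throws IOException {
        String entry = toEntryName(className);
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(pluginDir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jarFile = new JarFile(jar.toFile())) {
                    if (jarFile.getEntry(entry) != null) {
                        hits.add(jar);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PluginJarCheck <plugin-folder> <class-name>");
            return;
        }
        List<Path> hits = jarsContaining(Paths.get(args[0]), args[1]);
        if (hits.isEmpty()) {
            System.out.println("No jar in " + args[0] + " contains " + args[1]);
        } else {
            for (Path jar : hits) {
                System.out.println("Found in " + jar);
            }
        }
    }
}
```

It could be run against both the build output and the deployed copy, for example with build/plugins/myhtmlfilter and org.apache.nutch.parse.htmlfilter.HtmlfilterIndexer as the two arguments. If no jar is reported for the deployed folder, the copy step above is the likely culprit. This only checks the jar contents; it does not check what the plugin's plugin.xml declares.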
If you are running on a cluster, even in pseudo-distributed mode, you would need
to copy over the nutch-*.job file. It has the plugins inside of it and
it gets distributed out to the cluster. If referencing from a webapp or
the nutch war file, you would need to copy the plugin folder to
WEB-INF/classes/plugins.
Dennis