Posted to dev@nutch.apache.org by David Weiser <da...@gmail.com> on 2008/07/11 06:59:16 UTC

Injector fails due to missing plugin

Summary: after compiling and configuring the 0.9 trunk, the injector job
fails, apparently because plugins are not being found (that's what I gather
from the logs).

Settings: Ubuntu running Java 1.6. nutch-site.xml and nutch-default.xml have
the full path to the src/plugin folder.

Steps to reproduce:
1. Get the latest trunk of 0.9.
2. Edit the "plugin.folders" property in {nutch-site.xml,
nutch-default.xml} to have the value "./src/plugin".
3. Build with ant.
4. Add "http://lucene.apache.org/nutch" to the urls/nutch file.
5. Edit crawl-urlfilter.txt to allow crawling of *.apache.org.
6. Run the crawl.
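
For reference, step 2 amounts to an entry along these lines in nutch-site.xml. This is only a sketch of the property as described above, wrapped in the usual Hadoop-style configuration element; the description text and any other properties in the file are left out, so the actual file will differ.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Sketch only: the plugin.folders property from step 2,
       using the relative value quoted there. -->
  <property>
    <name>plugin.folders</name>
    <value>./src/plugin</value>
  </property>
</configuration>
```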

here's a trace of my log:
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
topN = 50
Injector: starting
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
    at org.apache.nutch.net.URLNormalizers.getURLNormalizers(URLNormalizers.java:170)
    at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:128)
    at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:57)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:36)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:204)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:132)
Caused by: java.lang.ClassNotFoundException: org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
    ... 10 more
org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
    at org.apache.nutch.net.URLNormalizers.getURLNormalizers(URLNormalizers.java:170)
    at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:128)
    at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:57)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:36)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:204)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:132)
Caused by: java.lang.ClassNotFoundException: org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
    ... 10 more
org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
    at org.apache.nutch.net.URLNormalizers.getURLNormalizers(URLNormalizers.java:170)
    at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:128)
    at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:57)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:36)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:204)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:132)
Caused by: java.lang.ClassNotFoundException: org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
    ... 10 more
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:894)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:157)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:113)


Could someone give me some pointers, please?

thanks,
dave