Posted to dev@nutch.apache.org by "mhammons (sent by Nabble.com)" <li...@nabble.com> on 2005/10/07 04:09:22 UTC

Noob Questions

Hi Gang,
   First, let me say I've been very impressed so far.  This is fantastic.  I've gotten Nutch to crawl my intranet, pulling HTML, plain text, MS Word, PDF, and MS PowerPoint documents.  Things look good so far.

  Here's where I'm stuck.  There are two issues.  One, I need basic authentication.  I have no problem hacking around, so I was more than willing to look into what's in http-client so far and see what I would need to tweak to make it work for me.  Two, tweaking means I'll need to recompile, and I don't seem to be able to do that.

I've pulled down both 0.7.1 and the nightly build and get roughly the same results with each.  Just running ant from the root directory gives the following:

...
compile:
     [echo] Compiling plugin: nutch-extensionpoints

BUILD FAILED
/mnt/capacity/hammons/nutch-nightly/build.xml:76: The following error occurred while executing this line:
/mnt/capacity/hammons/nutch-nightly/src/plugin/build.xml:18: The following error occurred while executing this line:
/mnt/capacity/hammons/nutch-nightly/src/plugin/build-plugin.xml:88: srcdir "/mnt/capacity/hammons/nutch-nightly/src/plugin/nutch-extensionpoints/src/java" does not exist!

Total time: 7 seconds

Since there doesn't really seem to be anything in nutch-extensionpoints, I commented it out in src/plugin/build.xml.  I am a little wary of that, given that the distribution seems to include an empty nutch-extensionpoints.jar file.  However, when I do this, compile, and then try to crawl, I get the following error:
051006 205927 Plugins: looking in: /mnt/capacity/hammons/nutch-nightly/build/plugins
051006 205928 Missing dependency nutch-extensionpoints for plugin query-url
051006 205928 Missing dependency nutch-extensionpoints for plugin parse-mspowerpoint
051006 205928 Missing dependency nutch-extensionpoints for plugin query-site
051006 205928 Missing dependency nutch-extensionpoints for plugin parse-msword
051006 205928 Missing dependency nutch-extensionpoints for plugin protocol-httpclient
051006 205928 Missing dependency nutch-extensionpoints for plugin parse-html
051006 205928 Missing dependency nutch-extensionpoints for plugin parse-pdf
051006 205928 Missing dependency nutch-extensionpoints for plugin index-basic
051006 205928 Missing dependency nutch-extensionpoints for plugin parse-text
051006 205928 Missing dependency nutch-extensionpoints for plugin query-basic
051006 205928 Missing dependency nutch-extensionpoints for plugin urlfilter-regex
051006 205928 Plugin Auto-activation mode: [true]
051006 205928 Registered Plugins:
051006 205928   NONE
051006 205928 Registered Extension-Points:
051006 205928   NONE
Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.apache.nutch.db.WebDBInjector.addPage(WebDBInjector.java:437)
        at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:378)
        at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
Caused by: java.lang.RuntimeException: org.apache.nutch.net.URLFilter not found.
        at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:44)
        ... 4 more

Could someone comment on this?  I would certainly appreciate any pointers, however simple they might be.

Regards,
Marc

--
Sent from the Nutch - Dev forum at Nabble.com:
http://www.nabble.com/Noob-Questions-t383479.html#a1058622

Re: Noob Questions

Posted by "mhammons (sent by Nabble.com)" <li...@nabble.com>.
w00t
Okay, that was simple: add the missing directory src/plugin/nutch-extensionpoints/src/java, so the srcdir that build-plugin.xml complains about exists.
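
For anyone else who hits this, here is a rough sketch of that fix as shell commands. NUTCH_HOME is just a hypothetical variable standing in for the root of your Nutch checkout (it defaults to the current directory). I'm also assuming that, if you commented nutch-extensionpoints out of src/plugin/build.xml as in my first post, you restore that entry before rebuilding, since the other plugins declare it as a dependency.

```shell
#!/bin/sh
# Sketch only: NUTCH_HOME is a hypothetical variable for the Nutch source root.
NUTCH_HOME="${NUTCH_HOME:-.}"

# The directory the build reported as missing ("srcdir ... does not exist!").
SRCDIR="$NUTCH_HOME/src/plugin/nutch-extensionpoints/src/java"

# Create it (empty is fine); -p makes this a no-op if it already exists.
mkdir -p "$SRCDIR"

# Rebuild from the root, but only if ant and a build.xml are actually present.
if command -v ant >/dev/null 2>&1 && [ -f "$NUTCH_HOME/build.xml" ]; then
    (cd "$NUTCH_HOME" && ant)
fi
```

The directory only needs to exist; nothing has to be put in it for the compile step to get past the srcdir check.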
