Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2006/03/11 18:49:11 UTC
[Nutch Wiki] Update of "PluginCentral" by ThomasRichter
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by ThomasRichter:
http://wiki.apache.org/nutch/PluginCentral
The comment on the change is:
moved some plugins between 0.7 and 0.8 to correctly show what is in each branch
------------------------------------------------------------------------------
To get Nutch to use any of these plugins, edit your conf/nutch-site.xml file and add the plugin's name to the plugin.includes property, a regular expression that names the plugins to load.
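Here is a minimal sketch of that property in conf/nutch-site.xml. It assumes a Nutch 0.8-style file whose root element is `<configuration>`. Releases in the 0.7 line use a different root element (`<nutch-conf>`), so match whatever root your own conf/nutch-default.xml uses. The value is a regular expression matched against plugin directory names, which means related plugins can be grouped with alternation such as `parse-(text|html|pdf)`. The plugin selection shown is only an illustration. Start from the default value in your nutch-default.xml and extend it rather than copying this line verbatim.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: properties set here override conf/nutch-default.xml.
     Assumes the 0.8-style <configuration> root element. -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- Illustrative selection only: a regular expression matched against
         plugin directory names. It adds parse-pdf, parse-msword, index-more
         and languageidentifier to a commonly used base set of plugins. -->
    <value>protocol-http|urlfilter-regex|parse-(text|html|js|pdf|msword)|index-(basic|more)|query-(basic|site|url)|languageidentifier</value>
  </property>
</configuration>
```

Any plugin whose name does not match the expression is left out. A plugin you add to the list must also be present in your installation's plugins directory.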
+ * '''clustering-carrot2'''
+ * '''creativecommons'''
* '''index-basic''' - Adds url, content and anchor fields to the index.
* '''index-more''' - Adds date, content-length, contentType, primaryType and subtype fields to the index.
* '''languageidentifier''' - Adds a lang field to the index and allows you to query against it.
- * '''microformats-reltag''' - Adds [http://www.microformats.org/wiki/Rel-Tag rel-tag] fields to the index and runs queries against them.
* '''[wiki:OntologyPlugin ontology]''' - Helps refine queries based on owl files.
 * '''parse-ext''' - A wrapper that invokes an external command to do the actual parsing.
* '''parse-html''' - Parses HTML documents
* '''parse-js''' - Parses Java``Script
* '''parse-mp3''' - Parses MP3s
- * '''parse-msexcel''' - Parses MS Excel documents
- * '''parse-mspowerpoint''' - Parses MS Powerpoint documents
* '''parse-msword''' - Parses MS Word documents
* '''parse-pdf''' - Parses PDFs
* '''parse-rss''' - Parses RSS feeds
* '''parse-rtf''' - Parses RTF files
- * '''parse-swf''' - Parses Flash SWF files
* '''parse-text''' - Parses text documents
 * '''protocol-file''' - Retrieves documents from the filesystem
 * '''protocol-ftp''' - Retrieves documents through ftp
@@ -45, +43 @@
* '''analysis-de'''
* '''analysis-fr'''
- * '''clustering-carrot2'''
- * '''creativecommons'''
* '''lib-commons-httpclient'''
* '''lib-http'''
* '''lib-jakarta-poi'''
@@ -54, +50 @@
* '''lib-lucene-analyzers'''
* '''lib-nekohtml'''
* '''lib-parsems'''
+ * '''parse-msexcel''' - Parses MS Excel documents
+ * '''parse-mspowerpoint''' - Parses MS Powerpoint documents
+ * '''parse-swf''' - Parses Flash SWF files
+ * '''microformats-reltag''' - Adds [http://www.microformats.org/wiki/Rel-Tag rel-tag] fields to the index and runs queries against them.
* '''parse-zip'''
== Plugins You can Download ==