Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2006/03/03 23:46:29 UTC
[Nutch Wiki] Update of "PluginCentral" by JeromeCharron
The following page has been changed by JeromeCharron:
http://wiki.apache.org/nutch/PluginCentral
------------------------------------------------------------------------------
To get Nutch to use any of these plugins, edit your conf/nutch-site.xml file and add the plugin's name to the value of the plugin.includes property.
- * index-basic - Adds url, content and anchor fields to the index.
+ * '''index-basic''' - Adds url, content and anchor fields to the index.
- * index-more - Adds date, content-length, contentType, primaryType and subtype fields to the index.
+ * '''index-more''' - Adds date, content-length, contentType, primaryType and subtype fields to the index.
- * languageidentifier - Adds a lang field to the index and allows you to query against it.
+ * '''languageidentifier''' - Adds a lang field to the index and allows you to query against it.
+ * '''microformats-reltag''' - Adds [http://www.microformats.org/wiki/Rel-Tag rel-tag] fields to the index and runs queries against them.
- * [wiki:OntologyPlugin ontology] - Helps refine queries based on owl files.
+ * '''[wiki:OntologyPlugin ontology]''' - Helps refine queries based on owl files.
- * parse-ext - A wrapper that invokes an external command to do the actual parsing.
+ * '''parse-ext''' - A wrapper that invokes an external command to do the actual parsing.
- * parse-html - Parses HTML documents
+ * '''parse-html''' - Parses HTML documents
- * parse-js - Parses Java``Script
+ * '''parse-js''' - Parses Java``Script
- * parse-mp3 - Parses MP3s
+ * '''parse-mp3''' - Parses MP3s
- * parse-msexcel - Parses MS Excel documents
+ * '''parse-msexcel''' - Parses MS Excel documents
- * parse-mspowerpoint - Parses MS Powerpoint documents
+ * '''parse-mspowerpoint''' - Parses MS Powerpoint documents
- * parse-msword - Parses MS Word documents
+ * '''parse-msword''' - Parses MS Word documents
- * parse-pdf - Parses PDFs
+ * '''parse-pdf''' - Parses PDFs
- * parse-rss - Parses RSS feeds
+ * '''parse-rss''' - Parses RSS feeds
- * parse-rtf - Parses RTF files
+ * '''parse-rtf''' - Parses RTF files
- * parse-swf - Parses Flash SWF files
+ * '''parse-swf''' - Parses Flash SWF files
- * parse-text - Parses text documents
+ * '''parse-text''' - Parses text documents
- * protocol-file - Retrieves documents from the filesystem
+ * '''protocol-file''' - Retrieves documents from the filesystem
- * protocol-ftp - Retrieves documents through ftp
+ * '''protocol-ftp''' - Retrieves documents through ftp
- * protocol-http - Retrieves documents through http
+ * '''protocol-http''' - Retrieves documents through http
- * protocol-httpclient - Retrieves documents through http and https
+ * '''protocol-httpclient''' - Retrieves documents through http and https
- * query-basic - Runs queries against content, url and anchor fields
+ * '''query-basic''' - Runs queries against content, url and anchor fields
- * query-more - Runs queries against date, content-length, contentType, primaryType and subType fields.
+ * '''query-more''' - Runs queries against date, content-length, contentType, primaryType and subType fields.
- * query-site - Runs queries against site field
+ * '''query-site''' - Runs queries against site field
- * query-url - Runs queries against url field.
+ * '''query-url''' - Runs queries against url field.
- * urlfilter-prefix
+ * '''urlfilter-prefix'''
- * urlfilter-regex
+ * '''urlfilter-regex'''
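
As a rough illustration of enabling plugins from the list above through plugin.includes, the sketch below embeds a hypothetical conf/nutch-site.xml fragment and checks which plugin names it would select. It assumes that the property value is treated as a regular expression that must match the whole plugin name; the root element name, the particular selection of plugins, and the helper functions plugin_includes and is_included are illustrative assumptions, not taken from the page.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical nutch-site.xml content. The root element name and the chosen
# plugin selection are assumptions made for this example only.
NUTCH_SITE_XML = """<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(text|html)|index-(basic|more)|query-(basic|more|site|url)</value>
  </property>
</configuration>
"""


def plugin_includes(xml_text):
    """Return the value of the plugin.includes property, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == "plugin.includes":
            return prop.findtext("value")
    return None


def is_included(plugin_name, includes):
    """Assumption: a plugin is enabled when the whole name matches the pattern."""
    return re.fullmatch(includes, plugin_name) is not None


if __name__ == "__main__":
    includes = plugin_includes(NUTCH_SITE_XML)
    for name in ["index-basic", "index-more", "parse-pdf", "query-site"]:
        state = "enabled" if is_included(name, includes) else "not enabled"
        print(name, state)
```

Under these assumptions, adding another plugin from the list (for example parse-pdf) would mean extending the value with one more alternative, such as parse-(text|html|pdf).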
== Plugins You can Download ==