Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2006/01/29 20:03:43 UTC

[Nutch Wiki] Update of "AboutPlugins" by JakeVanderdray

The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/AboutPlugins

New page:
Nutch's plugin system is based on the one used in Eclipse 2.x. Plugins are central to how Nutch works: all of the parsing, indexing and searching that Nutch does is carried out by plugins.

When you write a plugin, you provide one or more ''extensions'' of the existing ''extension-points''. The core Nutch ''extension-points'' are themselves defined in a plugin, the [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/plugin/ExtensionPoint.html NutchExtensionPoints] plugin (they are listed in the !NutchExtensionPoints [http://svn.apache.org/viewcvs.cgi/lucene/nutch/trunk/src/plugin/nutch-extensionpoints/plugin.xml?view=markup plugin.xml] file). Each ''extension-point'' defines an interface that the ''extension'' must implement. The core extension points are:

 * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/clustering/OnlineClusterer.html OnlineClusterer] -- An extension point interface for online search results clustering algorithms (from javadoc).
 * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/indexer/IndexingFilter.html IndexingFilter] -- Permits one to add metadata to the indexed fields. All plugins found which implement this extension point are run sequentially on the parse (from javadoc).
 * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/ontology/Ontology.html Ontology]
 * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/parse/Parser.html Parser] -- Parser implementations read through fetched documents in order to extract data to be indexed.  This is what you need to implement if you want Nutch to be able to parse a new type of content, or extract more data from currently parseable content.
 * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/parse/HtmlParseFilter.html HtmlParseFilter] -- Permits one to add additional metadata to HTML parses (from javadoc).
 * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/protocol/Protocol.html Protocol] -- Protocol implementations allow Nutch to use different protocols (ftp, http, etc.) to fetch documents.
 * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/searcher/QueryFilter.html QueryFilter] -- Extension point for query translation. Permits one to add metadata to a query (from javadoc).
 * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/net/URLFilter.html URLFilter] -- URLFilter implementations limit the URLs that Nutch attempts to fetch. The [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/net/RegexURLFilter.html RegexURLFilter] distributed with Nutch provides a great deal of control over which URLs Nutch crawls. If your rules about which URLs to crawl are too complicated for it, you can write your own implementation.
 * [http://svn.apache.org/viewcvs.cgi/lucene/nutch/trunk/src/java/org/apache/nutch/analysis/NutchAnalyzer.java?view=markup NutchAnalyzer] -- An extension point that provides language-specific analyzers (see the MultiLingualSupport proposal). ''Since it is still in development, it is not yet in the released javadoc''.
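
The relationship between an ''extension-point'' (an interface) and an ''extension'' (an implementation of it) can be illustrated with a small standalone sketch. The names below (`SimpleUrlFilter`, `DomainUrlFilter`, `UrlFilterDemo`) are hypothetical stand-ins made up for this example and are not the real Nutch classes; the sketch only assumes the common convention that a URL filter returns the URL when it is accepted and null when it is rejected. Check the URLFilter javadoc for the actual interface before writing a real plugin.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an extension-point interface (NOT the real Nutch API).
interface SimpleUrlFilter {
    // Returns the URL unchanged if it may be fetched, or null if it is rejected.
    String filter(String url);
}

// Hypothetical extension: accepts only URLs whose host is one of the allowed
// domains or a subdomain of one of them.
class DomainUrlFilter implements SimpleUrlFilter {
    private final List<String> allowedDomains;

    DomainUrlFilter(List<String> allowedDomains) {
        this.allowedDomains = allowedDomains;
    }

    @Override
    public String filter(String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null; // unparseable URLs are rejected
        }
        if (host == null) {
            return null; // no host component, e.g. a relative reference
        }
        for (String domain : allowedDomains) {
            if (host.equals(domain) || host.endsWith("." + domain)) {
                return url;
            }
        }
        return null;
    }
}

class UrlFilterDemo {
    public static void main(String[] args) {
        SimpleUrlFilter filter = new DomainUrlFilter(Arrays.asList("apache.org"));
        // Prints the URL when accepted and "null" when rejected.
        System.out.println(filter.filter("http://lucene.apache.org/nutch/"));
        System.out.println(filter.filter("http://example.com/"));
    }
}
```

In a real plugin the interface would come from Nutch itself, and the implementing class would be declared as an extension in the plugin's plugin.xml rather than instantiated by hand as it is here.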

== Source Files ==

You'll find the following inside of a plugin source directory:

 * A plugin.xml file that tells Nutch about your plugin.
 * A build.xml file that tells Ant how to build your plugin.
 * The source code of your plugin.

== Getting Nutch to Use a Plugin ==

To get Nutch to use a given plugin, edit your conf/nutch-site.xml file and add the name of the plugin to the plugin.includes property.
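
To make the effect of plugin.includes more concrete, here is a minimal standalone sketch. It assumes the property value is treated as a regular expression that must match a plugin's whole name, which is how the default value shipped with Nutch is written; the class `PluginIncludesSketch`, the method `selectPlugins` and the plugin name `my-plugin` are hypothetical and exist only for this illustration. It does not read nutch-site.xml or call any Nutch code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: shows which plugin names a
// plugin.includes-style regular expression would select.
class PluginIncludesSketch {

    // Keeps only the names that the expression matches in full.
    static List<String> selectPlugins(String includesRegex, List<String> pluginNames) {
        Pattern includes = Pattern.compile(includesRegex);
        List<String> selected = new ArrayList<>();
        for (String name : pluginNames) {
            if (includes.matcher(name).matches()) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Assumed example value; "my-plugin" is the hypothetical plugin being added.
        String includes = "protocol-http|urlfilter-regex|parse-(text|html)|my-plugin";
        List<String> available = Arrays.asList(
                "protocol-http", "parse-html", "parse-pdf", "my-plugin");
        System.out.println(selectPlugins(includes, available));
    }
}
```

Under that assumption, adding a plugin means appending its name as another alternative in the expression, as `my-plugin` is appended above.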

<<< See also: WritingPluginExample

<<< See also: HowToContribute

<<< PluginCentral