Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2014/07/28 12:35:22 UTC

[Nutch Wiki] Trivial Update of "AboutPlugins" by SebastianNagel

The "AboutPlugins" page has been changed by SebastianNagel:
https://wiki.apache.org/nutch/AboutPlugins?action=diff&rev1=11&rev2=12

Comment:
fix URL path to Apidocs

  Nutch's plugin system is based on the one used in [[http://www.eclipse.org/articles/Article-Plug-in-architecture/plugin_architecture.html|Eclipse 2.x]].  Plugins are central to how Nutch works.  All of the parsing, indexing and searching that Nutch does is actually accomplished by various plugins.
  
- In writing a plugin, you're actually providing one or more ''extensions'' of the existing ''extension-points''. The core Nutch ''extension-points'' are themselves defined in a plugin, the [[http://nutch.apache.org/apidocs-1.8/org/apache/nutch/plugin/ExtensionPoint.html|NutchExtensionPoints]] plugin (they are listed in the !NutchExtensionPoints [[http://svn.apache.org/viewcvs.cgi/nutch/trunk/src/plugin/nutch-extensionpoints/plugin.xml?view=markup|plugin.xml]] file). Each ''extension-point'' defines an interface that must be implemented by the ''extension''. The core extension points are:
+ In writing a plugin, you're actually providing one or more ''extensions'' of the existing ''extension-points''. The core Nutch ''extension-points'' are themselves defined in a plugin, the [[http://nutch.apache.org/apidocs/apidocs-1.8/org/apache/nutch/plugin/ExtensionPoint.html|NutchExtensionPoints]] plugin (they are listed in the !NutchExtensionPoints [[http://svn.apache.org/viewcvs.cgi/nutch/trunk/src/plugin/nutch-extensionpoints/plugin.xml?view=markup|plugin.xml]] file). Each ''extension-point'' defines an interface that must be implemented by the ''extension''. The core extension points are:
  
-  * [[http://nutch.apache.org/apidocs-1.8/org/apache/nutch/indexer/IndexWriter.html|IndexWriter]] -- Writes crawled data to a specific indexing backend (Solr, ElasticSearch, a CSV file, etc.).
+  * [[http://nutch.apache.org/apidocs/apidocs-1.8/org/apache/nutch/indexer/IndexWriter.html|IndexWriter]] -- Writes crawled data to a specific indexing backend (Solr, ElasticSearch, a CSV file, etc.).
-  * [[http://nutch.apache.org/apidocs-1.8/org/apache/nutch/indexer/IndexingFilter.html|IndexingFilter]] -- Permits one to add metadata to the indexed fields. All plugins found which implement this extension point are run sequentially on the parse (from javadoc).
+  * [[http://nutch.apache.org/apidocs/apidocs-1.8/org/apache/nutch/indexer/IndexingFilter.html|IndexingFilter]] -- Permits one to add metadata to the indexed fields. All plugins found which implement this extension point are run sequentially on the parse (from javadoc).
-  * [[http://nutch.apache.org/apidocs-1.8/org/apache/nutch/parse/Parser.html|Parser]] -- Parser implementations read through fetched documents in order to extract data to be indexed.  This is what you need to implement if you want Nutch to be able to parse a new type of content, or extract more data from currently parseable content.
+  * [[http://nutch.apache.org/apidocs/apidocs-1.8/org/apache/nutch/parse/Parser.html|Parser]] -- Parser implementations read through fetched documents in order to extract data to be indexed.  This is what you need to implement if you want Nutch to be able to parse a new type of content, or extract more data from currently parseable content.
-  * [[http://nutch.apache.org/apidocs-1.8/org/apache/nutch/parse/HtmlParseFilter.html|HtmlParseFilter]] -- Permits one to add additional metadata to HTML parses (from javadoc).
+  * [[http://nutch.apache.org/apidocs/apidocs-1.8/org/apache/nutch/parse/HtmlParseFilter.html|HtmlParseFilter]] -- Permits one to add additional metadata to HTML parses (from javadoc).
-  * [[http://nutch.apache.org/apidocs-1.8/org/apache/nutch/protocol/Protocol.html|Protocol]] -- Protocol implementations allow Nutch to use different protocols (ftp, http, etc.) to fetch documents.
+  * [[http://nutch.apache.org/apidocs/apidocs-1.8/org/apache/nutch/protocol/Protocol.html|Protocol]] -- Protocol implementations allow Nutch to use different protocols (ftp, http, etc.) to fetch documents.
-  * [[http://nutch.apache.org/apidocs-1.8/org/apache/nutch/net/URLFilter.html|URLFilter]] -- URLFilter implementations limit the URLs that Nutch attempts to fetch.  The [[http://nutch.apache.org/apidocs-1.8/org/apache/nutch/net/RegexURLFilter.html|RegexURLFilter]] distributed with Nutch provides a great deal of control over which URLs Nutch crawls; however, if you have very complicated rules about which URLs you want to crawl, you can write your own implementation.
+  * [[http://nutch.apache.org/apidocs/apidocs-1.8/org/apache/nutch/net/URLFilter.html|URLFilter]] -- URLFilter implementations limit the URLs that Nutch attempts to fetch.  The [[http://nutch.apache.org/apidocs/apidocs-1.8/org/apache/nutch/net/RegexURLFilter.html|RegexURLFilter]] distributed with Nutch provides a great deal of control over which URLs Nutch crawls; however, if you have very complicated rules about which URLs you want to crawl, you can write your own implementation.
-  * [[http://nutch.apache.org/apidocs-1.8/org/apache/nutch/net/URLNormalizer.html|URLNormalizer]] -- Interface used to convert URLs to normal form and optionally perform substitutions.
+  * [[http://nutch.apache.org/apidocs/apidocs-1.8/org/apache/nutch/net/URLNormalizer.html|URLNormalizer]] -- Interface used to convert URLs to normal form and optionally perform substitutions.
-  * [[http://nutch.apache.org/apidocs-1.8/org/apache/nutch/scoring/ScoringFilter.html|ScoringFilter]] -- A contract defining behavior of scoring plugins. A scoring filter will manipulate scoring variables in CrawlDatum and in resulting search indexes. Filters can be chained in a specific order, to provide multi-stage scoring adjustments. 
+  * [[http://nutch.apache.org/apidocs/apidocs-1.8/org/apache/nutch/scoring/ScoringFilter.html|ScoringFilter]] -- A contract defining behavior of scoring plugins. A scoring filter will manipulate scoring variables in CrawlDatum and in resulting search indexes. Filters can be chained in a specific order, to provide multi-stage scoring adjustments. 
-  * [[http://nutch.apache.org/apidocs-1.8/org/apache/nutch/segment/SegmentMergeFilter.html|SegmentMergeFilter]] -- Interface used to filter segments during segment merge. It allows filtering on more sophisticated criteria than just URLs. In particular, it allows filtering based on metadata collected while parsing a page.
+  * [[http://nutch.apache.org/apidocs/apidocs-1.8/org/apache/nutch/segment/SegmentMergeFilter.html|SegmentMergeFilter]] -- Interface used to filter segments during segment merge. It allows filtering on more sophisticated criteria than just URLs. In particular, it allows filtering based on metadata collected while parsing a page.
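
To make the idea of an ''extension'' implementing an ''extension-point'' interface more concrete, here is a minimal sketch of a custom URL filter. It does not depend on Nutch: `UrlFilterSketch` is a hypothetical stand-in that is assumed to mirror the shape of the URLFilter interface (one method that returns the URL to keep it, or null to reject it), and `HostSuffixFilter` and `FilterChain` are illustrative names only. A real plugin would implement the Nutch interface itself and be declared in a plugin.xml file.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for the URLFilter extension point (an assumption, not the Nutch class). */
interface UrlFilterSketch {
    /** Returns the URL to keep it, or null to reject it. */
    String filter(String urlString);
}

/** Illustrative filter: keeps only URLs whose host equals or ends with the given domain. */
class HostSuffixFilter implements UrlFilterSketch {
    private final String suffix;

    HostSuffixFilter(String suffix) {
        this.suffix = suffix.toLowerCase(Locale.ROOT);
    }

    @Override
    public String filter(String urlString) {
        try {
            String host = new URI(urlString).getHost();
            if (host == null) {
                return null; // no host component, reject
            }
            host = host.toLowerCase(Locale.ROOT);
            boolean allowed = host.equals(suffix) || host.endsWith("." + suffix);
            return allowed ? urlString : null;
        } catch (URISyntaxException e) {
            return null; // unparseable input is rejected
        }
    }
}

/** Illustrative helper: runs filters in order and stops at the first rejection. */
class FilterChain {
    static String apply(List<UrlFilterSketch> filters, String url) {
        for (UrlFilterSketch f : filters) {
            if (url == null) {
                return null;
            }
            url = f.filter(url);
        }
        return url;
    }
}
```

With this sketch, `new HostSuffixFilter("apache.org")` would keep a URL on nutch.apache.org and reject one on example.com. Returning null on rejection lets several filters be applied one after another, with any single filter able to drop a URL.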
  
  
- Updated to [[http://nutch.apache.org/apidocs-1.8/index.html | Nutch apidocs version 1.8]]
+ Updated to [[http://nutch.apache.org/apidocs/apidocs-1.8/index.html | Nutch apidocs version 1.8]]
  
  == Source Files ==