Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2007/08/22 15:02:44 UTC

[Nutch Wiki] Update of "ClusteringPlugin" by DawidWeiss

The following page has been changed by DawidWeiss:
http://wiki.apache.org/nutch/ClusteringPlugin

The comment on the change is:
Updated the info about clustering plugin and instructions.

------------------------------------------------------------------------------
- -- Main.DawidWeiss - 01 Dec 2004
+ = Clustering Plugin =
  
- * plugin name: Online Search Results Clustering using Carrot2's Lingo component
+  plugin name:: Online Search Results Clustering using Carrot2 components
- * plugin version: 0.9.0
+  plugin version:: 1.0.3
  
+ == Plugin Info ==
- * provider: Dawid Weiss, The Carrot2 project
- * plugin home url: Included in Nutch CVS. Home WWW of the project: http://carrot2.sourceforge.net
- * plugin download url: A binary is included in Nutch CVS. The plugin builds together with Nutch.
- * license: BSD-style
  
- * short description: Search results clustering plugin.
+  * provider: The Carrot2 project, [http://www.carrot2.org]
+  * plugin home url: Plugin is included in Nutch codebase.
+  * plugin download url: Binaries included with Nutch.
+  * license: BSD-style
+  * short description: Plugin for clustering search results at query-time.
- * long description: A plugin that clusters search results into groups of (related, hopefully) documents.
+  * long description: This plugin organizes search results into groups of documents that are, ideally, related.
- * configureable parameters: Take a look at the defaults defined in nutch-default.xml (search for 'clustering').
+  * configurable parameters: See the defaults defined in nutch-default.xml (search for 'clustering').
- * meta data added to index: None. Clustering is performed dynamically for each result set.
+  * meta data added to index: None. Clustering is performed dynamically for each result set.
+  * required jars: The entire `lib` folder in the plugin must be on the classpath. Additional JARs from the Carrot2 project may be needed if other algorithms or languages are to be used.
- * required jars: Many - the entire lib folder in the plugin must be present in classpath.
- * plugin extension points:
- 
- * plugin extension point interface: net.nutch.clustering.OnlineClusterer
+  * plugin extension point interface: net.nutch.clustering.OnlineClusterer
- * plugin extension point xml snippet: ?
  
  
- = Installation guide
+ == Installation guide ==
  
- * Create some index using the instructions provided in Nutch documentation,
+  * Create a search index using the instructions provided in the Nutch documentation.
- * Deploy Nutch Web application and make sure the index is found and works (type a query and see if you
+  * Deploy the Nutch web application and verify that it finds the index and that searching works (type a query and check that you get results).
- get any results).
+  * Stop the web server (Tomcat, Jetty or anything you like).
+  * Edit the `WEB-INF/classes/nutch-default.xml` file and enable the clustering plugin (it is ignored by default) by adding `clustering-carrot2` to the `plugin.includes` property.
+  * Restart your web server and reload the search page. You should see the `clustering` checkbox next to the `search` button. Enable it and rerun your query. Cluster labels and documents should appear to the right of the search results.
  
- * Stop Web container (Tomcat)
- * You must modify =WEB-INF/classes/nutch-default.xml= file and include the clustering plugin (it is by default
- ignored).
- 
- plugin.includes
- 
- protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)|clustering-carrot2
- Regular expression naming plugin directory names to
- 
- include.  Any plugin not matching this expression is excluded.  By
- default Nutch includes crawling just HTML and plain text via HTTP,
- and basic indexing and search plugins.
- 
- * Restart Tomcat.
- 
- * Reload the search page of Nutch. You should see the =clustering= checkbox next to =search= button.
- Enable it and rerun your query. Clustered results should appear to the right.
-
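
For the `plugin.includes` step in the installation guide, the property in `WEB-INF/classes/nutch-default.xml` might look like the sketch below. The value is the one quoted in the earlier revision of this page. The default plugin list differs between Nutch releases, so treat it as an assumption and append `|clustering-carrot2` to whatever value your copy already contains rather than copying this one verbatim.

```xml
<!-- Sketch only: the plugin list comes from the earlier page revision
     and may not match the defaults shipped with your Nutch release. -->
<property>
  <name>plugin.includes</name>
  <!-- The trailing |clustering-carrot2 alternative enables the clustering plugin. -->
  <value>protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)|clustering-carrot2</value>
  <description>Regular expression naming plugin directory names to
  include.  Any plugin not matching this expression is excluded.</description>
</property>
```

Because the value is a regular expression matched against plugin directory names, the new entry is added as one more `|` alternative and the existing alternatives stay unchanged.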