Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2007/08/22 15:02:44 UTC
[Nutch Wiki] Update of "ClusteringPlugin" by DawidWeiss
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by DawidWeiss:
http://wiki.apache.org/nutch/ClusteringPlugin
The comment on the change is:
Updated the info about clustering plugin and instructions.
------------------------------------------------------------------------------
- -- Main.DawidWeiss - 01 Dec 2004
+ = Clustering Plugin =
- * plugin name: Online Search Results Clustering using Carrot2's Lingo component
+ plugin name:: Online Search Results Clustering using Carrot2 components
- * plugin version: 0.9.0
+ plugin version:: 1.0.3
+ == Plugin Info ==
- * provider: Dawid Weiss, The Carrot2 project
- * plugin home url: Included in Nutch CVS. Home WWW of the project: http://carrot2.sourceforge.net
- * plugin download url: A binary is included in Nutch CVS. The plugin builds together with Nutch.
- * license: BSD-style
- * short description: Search results clustering plugin.
+ * provider: The Carrot2 project, [http://www.carrot2.org]
+ * plugin home url: Plugin is included in Nutch codebase.
+ * plugin download url: Binaries included with Nutch.
+ * license: BSD-style
+ * short description: Plugin for clustering search results at query-time.
- * long description: A plugin that clusters search results into groups of (related, hopefully) documents.
+ * long description: This plugin organizes search results into groups of (related, hopefully) documents.
- * configurable parameters: Take a look at the defaults defined in nutch-default.xml (search for 'clustering').
+ * configurable parameters: Take a look at the defaults defined in nutch-default.xml (search for 'clustering').
- * meta data added to index: None. Clustering is performed dynamically for each result set.
+ * meta data added to index: None. Clustering is performed dynamically for each result set.
+ * required jars: The entire `lib` folder in the plugin must be present on the classpath. More JARs from the Carrot2 project might be needed if additional algorithms or languages are to be used.
- * required jars: Many - the entire lib folder in the plugin must be present in classpath.
- * plugin extension points:
-
- * plugin extension point interface: net.nutch.clustering.OnlineClusterer
+ * plugin extension point interface: net.nutch.clustering.OnlineClusterer
- * plugin extension point xml snippet: ?
- = Installation guide
+ == Installation guide ==
- * Create some index using the instructions provided in Nutch documentation,
+ * Create a search index using the instructions provided in Nutch documentation.
- * Deploy Nutch Web application and make sure the index is found and works (type a query and see if you
+ * Deploy the Nutch web application and make sure the index is found and searching works (type a query and check that you get results).
- get any results).
+ * Stop the web server (Tomcat, Jetty or anything you like).
+ * Edit the `WEB-INF/classes/nutch-default.xml` file and enable the clustering plugin (it is excluded by default) by adding `clustering-carrot2` to the `plugin.includes` property.
+ * Restart your web server and reload the search page. You should see the `clustering` checkbox next to the `search` button. Enable it and rerun your query. Cluster labels and documents should appear to the right of the search results.
- * Stop Web container (Tomcat)
- * You must modify =WEB-INF/classes/nutch-default.xml= file and include the clustering plugin (it is by default
- ignored).
-
- plugin.includes
-
- protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)|clustering-carrot2
- Regular expression naming plugin directory names to
-
- include. Any plugin not matching this expression is excluded. By
- default Nutch includes crawling just HTML and plain text via HTTP,
- and basic indexing and search plugins.
-
- * Restart Tomcat.
-
- * Reload the search page of Nutch. You should see the =clustering= checkbox next to =search= button.
- Enable it and rerun your query. Clustered results should appear to the right.
-
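
For reference, the `plugin.includes` change described in the installation guide might look roughly like the fragment below. This is only a sketch: the value is the one quoted in the earlier revision of the page, and the default plugin list in a given Nutch release may differ. It is probably safest to append `|clustering-carrot2` to whatever value is already present rather than copy this one verbatim.

```xml
<!-- Sketch of the plugin.includes property in WEB-INF/classes/nutch-default.xml.
     Assumption: the plugin list before "|clustering-carrot2" is taken from the
     earlier revision of this page and may not match your Nutch release. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)|clustering-carrot2</value>
  <description>Regular expression naming plugin directory names to
  include. Any plugin not matching this expression is excluded.</description>
</property>
```

Because the value is a regular expression over plugin directory names, the new entry is simply one more `|`-separated alternative. The web server has to be restarted for the change to be picked up.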