Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2009/04/24 10:26:20 UTC
[Solr Wiki] Update of "ClusteringComponent" by StanislawOsinski
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by StanislawOsinski:
http://wiki.apache.org/solr/ClusteringComponent
The comment on the change is:
Fine tuning of Carrot2 clustering section added
------------------------------------------------------------------------------
See also ClusteringFullResultsExample.
+ == Tuning Carrot2 clustering ==
+
+ The easiest way to tune Carrot2 clustering for your specific data is to use the Document Clustering Workbench, a dedicated Carrot2 tool.
+
+ 1. [http://project.carrot2.org/download.html Download Carrot2 Document Clustering Workbench] for your platform.
+ 2. [http://download.carrot2.org/head/manual/#section.getting-started.solr Attach] your Solr instance as a document source in the Workbench.
+ 3. [http://download.carrot2.org/head/manual/#section.advanced-topics.fine-tuning Fine tune] stop words, stop labels and possibly [http://download.carrot2.org/head/manual/#section.component.lingo other attributes] of the clustering algorithms to suit your needs.
+ 4. To transfer the modified stopwords.* and stoplabels.* files to your Solr instance, make them available on the classpath. If you're using the Solr example scripts, try:
+
+ {{{
+ java -cp <dir-with-your-modified-stopwords> -Dsolr.solr.home=./clustering/solr -jar start.jar
+ }}}
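Before launching, it can help to confirm that the directory you intend to put on the classpath actually contains the modified resource files. The following is a minimal sketch, not part of Solr or Carrot2: the function name `list_carrot2_resources` and the `RES_DIR` default are hypothetical, and the script only checks that the files exist. It does not confirm that Solr loads them.

```shell
#!/bin/sh
# List stopwords.* and stoplabels.* files found directly in a directory.
# The function name is hypothetical; it is not provided by Solr or Carrot2.
list_carrot2_resources() {
  find "$1" -maxdepth 1 -type f \( -name 'stopwords.*' -o -name 'stoplabels.*' \) | sort
}

# RES_DIR is an assumed location for the files saved from the Workbench.
RES_DIR="${RES_DIR:-./carrot2-resources}"

if [ -d "$RES_DIR" ]; then
  # Print one path per line so missing files are easy to spot.
  list_carrot2_resources "$RES_DIR"
else
  echo "No such directory: $RES_DIR" >&2
fi
```

If the listing is empty, the directory is probably not the one the Workbench saved the modified files to.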
+
+
= Document Clustering =
<!> THIS IS NOT FULLY IMPLEMENTED YET.
@@ -196, +210 @@
<!> TODO <!> We will likely also need a way of returning the status of all clustering tasks, that is, if we support more than one task at a time.
-
See also Mahout (http://lucene.apache.org/mahout), which implements several clustering algorithms.