Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2009/04/24 10:26:20 UTC

[Solr Wiki] Update of "ClusteringComponent" by StanislawOsinski

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by StanislawOsinski:
http://wiki.apache.org/solr/ClusteringComponent

The comment on the change is:
Fine tuning of Carrot2 clustering section added

------------------------------------------------------------------------------
  
  See also ClusteringFullResultsExample.
  
+ == Tuning Carrot2 clustering ==
+ 
+ The easiest way to tune Carrot2 clustering for your specific data is to use a dedicated Carrot2 tool called the Document Clustering Workbench.
+ 
+  1. [http://project.carrot2.org/download.html Download Carrot2 Document Clustering Workbench] for your platform.
+  2. [http://download.carrot2.org/head/manual/#section.getting-started.solr Attach] your Solr instance as a document source in the Workbench.
+  3. [http://download.carrot2.org/head/manual/#section.advanced-topics.fine-tuning Fine tune] stop words, stop labels and possibly [http://download.carrot2.org/head/manual/#section.component.lingo other attributes] of the clustering algorithms to suit your needs.
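+  4. To transfer the modified stopwords.* and stoplabels.* files to your Solr instance, put the directory that contains them on Solr's classpath. If you're using the Solr example scripts, try the command below. The standard java launcher ignores -cp when -jar is also given, so if the modified files are not picked up, add the directory to your servlet container's own classpath instead: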
+ 
+ {{{
+ java -cp <dir-with-your-modified-stopwords> -Dsolr.solr.home=./clustering/solr -jar start.jar
+ }}}
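Step 4 boils down to collecting your edited files in one directory that can then go on the classpath. The following is a minimal shell sketch, assuming the files follow Carrot2's stopwords.<lang> / stoplabels.<lang> naming and that you know where your edited copies live. The function name copy_carrot2_resources and both directory arguments are hypothetical, not part of Solr or Carrot2.

```shell
# Hypothetical helper (not part of Solr or Carrot2): copy edited Carrot2
# stop word and stop label files from SRC_DIR into DST_DIR, a directory
# you then put on Solr's classpath.
# ASSUMPTION: files are named stopwords.<lang> and stoplabels.<lang>.
copy_carrot2_resources() {
  src_dir=$1
  dst_dir=$2
  mkdir -p "$dst_dir" || return 1
  copied=0
  for f in "$src_dir"/stopwords.* "$src_dir"/stoplabels.*; do
    # An unmatched glob stays literal, so skip names that do not exist.
    [ -f "$f" ] || continue
    cp "$f" "$dst_dir"/ || return 1
    copied=$((copied + 1))
  done
  echo "copied $copied file(s) to $dst_dir"
  # Return non-zero if nothing was found, which usually means SRC_DIR is wrong.
  [ "$copied" -gt 0 ]
}
```

For example, `copy_carrot2_resources ./edited-resources ./carrot2-resources` gathers the files, after which ./carrot2-resources is the directory to pass to -cp in the java command above. The helper copies only files matching the two name patterns, so unrelated files in the source directory are left behind.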
+ 
+ 
  = Document Clustering =
  
  <!> THIS IS NOT FULLY IMPLEMENTED YET.
@@ -196, +210 @@

  
  <!> TODO <!> We likely also need a way of returning the status of all clustering tasks, that is, if we support more than one task at a time.
  
- 
  See also Mahout: http://lucene.apache.org/mahout, which has several clustering algorithms implemented.