Posted to commits@mahout.apache.org by je...@apache.org on 2009/03/18 18:14:49 UTC

svn commit: r755656 - /lucene/mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/dirichlet/README.txt

Author: jeastman
Date: Wed Mar 18 17:14:49 2009
New Revision: 755656

URL: http://svn.apache.org/viewvc?rev=755656&view=rev
Log:
Adding README.txt file to explain how to run the Dirichlet examples

Added:
    lucene/mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/dirichlet/README.txt

Added: lucene/mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/dirichlet/README.txt
URL: http://svn.apache.org/viewvc/lucene/mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/dirichlet/README.txt?rev=755656&view=auto
==============================================================================
--- lucene/mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/dirichlet/README.txt (added)
+++ lucene/mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/dirichlet/README.txt Wed Mar 18 17:14:49 2009
@@ -0,0 +1,28 @@
+The following classes can be run without parameters to generate a sample data set and
+run the reference Dirichlet Process Clustering implementation over it:
+
+DisplayDirichlet - generates 1000 samples from three symmetric distributions. This is the same
+    data set that is used by the following clustering programs. It displays the points on a screen
+    and superimposes the model parameters that were used to generate the points. You can edit the
+    generateSamples() method to change the sample points used by these programs.
+  * DisplayNDirichlet - clusters the above sample points using the NormalModelDistribution
+  * DisplaySNDirichlet - clusters the above sample points using the SampledNormalDistribution
+  * DisplayASNDirichlet - clusters the above sample points using the AsymmetricSampledNormalDistribution
+  * Display2dASNDirichlet - clusters a set of asymmetric sample points (generated by DisplayDirichlet's
+    generate2dSamples() method) using the AsymmetricSampledNormalDistribution.
+  * NOTE: each of these programs displays the sample points and then superimposes all of the clusters
+    from each iteration. The last iteration's clusters are in bold red, the previous several are
+    colored (orange, yellow, green, blue, magenta) in order, and all earlier clusters are in
+    light grey. This helps to visualize how the clusters converge upon a solution over multiple
+    iterations.
+  * NOTE: by changing the UncommonDistributions.init(...) call in DisplayDirichlet, you can get
+    different behaviors. If the initialization is removed altogether, each run uses a random seed.
+    
+DisplayOutputState - this program can be run after any of the SampledNormalDistribution M/R Dirichlet test 
+  cases in TestMapReduce. It draws the points and the resulting clusters from the output directory in 
+  a manner similar to the above. By changing the initialization seed in TestMapReduce you can get 
+  different data points.
+DisplayASNOutputState - similar to the above but uses the AsymmetricSampledNormalDistribution.
+
+  
+    
\ No newline at end of file
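
The color scheme described in the README's first NOTE can be sketched as a small mapping from an iteration's age to a color. This is only an illustrative sketch, not the code in the Display* classes: the class name IterationColors and the method colorForAge are hypothetical, and it assumes "in order" means the most recent of the previous iterations is orange, the one before it yellow, and so on. The bold stroke used for the last iteration is not modeled here.

```java
import java.awt.Color;

public class IterationColors {
    // Colors for the iterations just before the last one.
    // Assumption: most recent first, in the order listed in the README.
    private static final Color[] RECENT = {
        Color.ORANGE, Color.YELLOW, Color.GREEN, Color.BLUE, Color.MAGENTA
    };

    /**
     * Returns the display color for clusters of a given age, where
     * age 0 is the last iteration, 1 the one before it, and so on.
     */
    static Color colorForAge(int age) {
        if (age < 0) {
            throw new IllegalArgumentException("age must be non-negative");
        }
        if (age == 0) {
            return Color.RED;            // last iteration
        }
        if (age <= RECENT.length) {
            return RECENT[age - 1];      // previous several iterations
        }
        return Color.LIGHT_GRAY;         // all earlier iterations
    }

    public static void main(String[] args) {
        int iterations = 10;
        // Walk from the oldest iteration to the newest, as a display
        // would when drawing older clusters underneath newer ones.
        for (int i = 0; i < iterations; i++) {
            int age = iterations - 1 - i;
            System.out.println("iteration " + i + " -> " + colorForAge(age));
        }
    }
}
```

Drawing from oldest to newest, as main() does, would keep the red clusters of the last iteration on top of the grey history.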