Posted to commits@mahout.apache.org by je...@apache.org on 2009/03/18 18:14:49 UTC
svn commit: r755656 - /lucene/mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/dirichlet/README.txt
Author: jeastman
Date: Wed Mar 18 17:14:49 2009
New Revision: 755656
URL: http://svn.apache.org/viewvc?rev=755656&view=rev
Log:
Adding README.txt file to explain how to run the Dirichlet examples
Added:
lucene/mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/dirichlet/README.txt
Added: lucene/mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/dirichlet/README.txt
URL: http://svn.apache.org/viewvc/lucene/mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/dirichlet/README.txt?rev=755656&view=auto
==============================================================================
--- lucene/mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/dirichlet/README.txt (added)
+++ lucene/mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/dirichlet/README.txt Wed Mar 18 17:14:49 2009
@@ -0,0 +1,28 @@
+The following classes can be run without parameters to generate a sample data set and
+run the reference Dirichlet Process Clustering implementation over it:
+
+DisplayDirichlet - generates 1000 samples from three symmetric distributions. This is the same
+ data set that is used by the following clustering programs. It displays the points on screen
+ and superimposes the model parameters that were used to generate them. You can edit the
+ generateSamples() method to change the sample points used by these programs.
+ * DisplayNDirichlet - clusters the above sample points using the NormalModelDistribution
+ * DisplaySNDirichlet - clusters the above sample points using the SampledNormalDistribution
+ * DisplayASNDirichlet - clusters the above sample points using the AsymmetricSampledNormalDistribution
+ * Display2dASNDirichlet - clusters a set of asymmetric sample points (generated by DisplayDirichlet's
+ generate2dSamples() method) using the AsymmetricSampledNormalDistribution.
+ * NOTE: each of these programs displays the sample points and then superimposes the clusters
+ from every iteration. The last iteration's clusters are drawn in bold red, the clusters from the
+ preceding iterations are colored orange, yellow, green, blue and magenta, in that order, and all
+ earlier clusters are drawn in light grey. This helps to visualize how the clusters converge on a
+ solution over multiple iterations.
+ * NOTE: by changing the seed passed to the UncommonDistributions.init(...) call in DisplayDirichlet,
+ you get different sample data and clustering results. Removing the call altogether uses a random seed on each run.
+
+DisplayOutputState - this program can be run after any of the SampledNormalDistribution Map/Reduce
+ Dirichlet test cases in TestMapReduce. It draws the points and the resulting clusters from the
+ output directory in a manner similar to the above. By changing the initialization seed in
+ TestMapReduce, you can get different data points.
+DisplayASNOutputState - similar to the above, but uses the AsymmetricSampledNormalDistribution.
+
+
+
\ No newline at end of file
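The README describes DisplayDirichlet as drawing 1000 samples from three symmetric distributions, with the UncommonDistributions.init(...) seed deciding whether runs are reproducible. The standalone sketch below illustrates that idea using only java.util.Random; it does not call Mahout. It assumes "symmetric" means the same standard deviation in every dimension, and the class name SampleSketch, the means, the standard deviations and the 400/300/300 split are illustrative values that may not match what generateSamples() actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical standalone sketch; not Mahout's generateSamples(). */
public class SampleSketch {

  /** Adds n 2-d points drawn from a normal with mean (mx, my) and the same sd in x and y. */
  static void addSamples(List<double[]> out, Random rng, int n,
                         double mx, double my, double sd) {
    for (int i = 0; i < n; i++) {
      out.add(new double[] {mx + sd * rng.nextGaussian(), my + sd * rng.nextGaussian()});
    }
  }

  /**
   * Generates 1000 points from three symmetric distributions. A fixed seed reproduces
   * the same data on every run; a null seed falls back to new Random(), which picks a
   * fresh seed each time.
   */
  static List<double[]> generateSamples(Long seed) {
    Random rng = (seed == null) ? new Random() : new Random(seed);
    List<double[]> samples = new ArrayList<>();
    // Illustrative parameters only: 400 + 300 + 300 = 1000 points.
    addSamples(samples, rng, 400, 1.0, 1.0, 3.0);
    addSamples(samples, rng, 300, 1.0, 0.0, 0.5);
    addSamples(samples, rng, 300, 0.0, 2.0, 0.1);
    return samples;
  }

  public static void main(String[] args) {
    List<double[]> a = generateSamples(42L);
    List<double[]> b = generateSamples(42L);
    boolean same = a.get(0)[0] == b.get(0)[0] && a.get(0)[1] == b.get(0)[1];
    System.out.println(a.size() + " samples; same seed reproduces first point: " + same);
  }
}
```

Passing a fixed seed plays the role of the init call; passing null mimics removing it.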
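The first NOTE in the README describes a color scheme keyed by how many iterations ago a set of clusters was produced. Here is a minimal sketch of that mapping. It assumes "in order" means orange marks the most recent of the previous iterations and magenta the oldest of those five. IterationColors, colorFor and isBold are hypothetical names, and the sketch only selects java.awt.Color constants; it is not the drawing code in the Display* classes.

```java
import java.awt.Color;

/** Hypothetical sketch of the per-iteration color scheme described in the README. */
public class IterationColors {

  // Colors for the iterations before the last one, most recent first (assumed order).
  private static final Color[] PREVIOUS = {
      Color.ORANGE, Color.YELLOW, Color.GREEN, Color.BLUE, Color.MAGENTA
  };

  /**
   * @param iteration zero-based index of the iteration whose clusters are drawn
   * @param lastIteration zero-based index of the final iteration
   */
  static Color colorFor(int iteration, int lastIteration) {
    int age = lastIteration - iteration; // 0 means the last iteration
    if (age < 0) {
      throw new IllegalArgumentException("iteration is after the last iteration");
    }
    if (age == 0) {
      return Color.RED; // the caller draws these clusters in bold
    }
    if (age <= PREVIOUS.length) {
      return PREVIOUS[age - 1];
    }
    return Color.LIGHT_GRAY; // every older iteration
  }

  /** Only the last iteration's clusters are drawn bold. */
  static boolean isBold(int iteration, int lastIteration) {
    return iteration == lastIteration;
  }

  public static void main(String[] args) {
    int last = 9;
    for (int i = 0; i <= last; i++) {
      System.out.println("iteration " + i + ": " + colorFor(i, last)
          + (isBold(i, last) ? " (bold)" : ""));
    }
  }
}
```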
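The Display*Dirichlet programs in the README run the reference Dirichlet Process Clustering implementation, whose internals the README does not describe. For orientation, here is a sketch of the point-assignment step used in sampling-based Dirichlet process mixture clustering in general: each point is assigned to model k with probability proportional to the mixture weight of k times model k's density at that point. It assumes a fixed, truncated set of symmetric 2-d normal models, each given as {meanX, meanY, sd}. AssignmentSketch and its methods are hypothetical and this is not Mahout's code; a full iteration would also resample each model's parameters and the mixture weights from the resulting assignments, which is omitted here.

```java
import java.util.Random;

/** Hypothetical sketch of a DP-mixture assignment step; not Mahout's implementation. */
public class AssignmentSketch {

  /** Density at p of a 2-d normal with mean (mx, my) and the same sd in x and y. */
  static double normalPdf(double[] p, double mx, double my, double sd) {
    double dx = p[0] - mx;
    double dy = p[1] - my;
    double var = sd * sd;
    return Math.exp(-(dx * dx + dy * dy) / (2 * var)) / (2 * Math.PI * var);
  }

  /**
   * Samples a model index k with probability proportional to mixture[k] * pdf_k(p).
   * Each row of models is {meanX, meanY, sd}.
   */
  static int assign(double[] p, double[] mixture, double[][] models, Random rng) {
    double[] weights = new double[mixture.length];
    double total = 0.0;
    for (int k = 0; k < mixture.length; k++) {
      double[] m = models[k];
      weights[k] = mixture[k] * normalPdf(p, m[0], m[1], m[2]);
      total += weights[k];
    }
    if (total == 0.0) {
      // Every weight is zero (e.g. all densities underflowed): choose uniformly.
      return rng.nextInt(weights.length);
    }
    // Inverse-CDF sampling over the unnormalized weights.
    double u = rng.nextDouble() * total;
    for (int k = 0; k < weights.length; k++) {
      u -= weights[k];
      if (u < 0) {
        return k;
      }
    }
    return weights.length - 1; // guards against floating-point rounding at the top end
  }

  public static void main(String[] args) {
    Random rng = new Random(42);
    double[] mixture = {0.5, 0.5};
    double[][] models = {{0, 0, 1}, {3, 3, 1}};
    int[] counts = new int[mixture.length];
    for (int i = 0; i < 1000; i++) {
      counts[assign(new double[] {0.5, 0.5}, mixture, models, rng)]++;
    }
    // The point lies much closer to model 0, so most draws should land there.
    System.out.println("model 0: " + counts[0] + ", model 1: " + counts[1]);
  }
}
```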