Posted to commits@mahout.apache.org by ap...@apache.org on 2015/04/03 03:36:01 UTC

svn commit: r1670995 - /mahout/site/mahout_cms/trunk/content/users/clustering/cluster-dumper.mdtext

Author: apalumbo
Date: Fri Apr  3 01:36:01 2015
New Revision: 1670995

URL: http://svn.apache.org/r1670995
Log:
Removed link to missing Eclipse info. Added CLI usage.

Modified:
    mahout/site/mahout_cms/trunk/content/users/clustering/cluster-dumper.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/clustering/cluster-dumper.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/clustering/cluster-dumper.mdtext?rev=1670995&r1=1670994&r2=1670995&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/clustering/cluster-dumper.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/clustering/cluster-dumper.mdtext Fri Apr  3 01:36:01 2015
@@ -1,7 +1,7 @@
 Title: Cluster Dumper
 
 <a name="ClusterDumper-Introduction"></a>
-# Cluster Dumper - Introduction
+## Cluster Dumper - Introduction
 
 Clustering tasks in Mahout will output data in the format of a SequenceFile
 (Text, Cluster) and the Text is a cluster identifier string. To analyze
@@ -9,39 +9,58 @@ this output we need to convert the seque
 format and this is achieved using the clusterdump utility.
 
 <a name="ClusterDumper-Stepsforanalyzingclusteroutputusingclusterdumputility"></a>
-# Steps for analyzing cluster output using clusterdump utility
+## Steps for analyzing cluster output using clusterdump utility
 
 After you've executed a clustering task (either an example or a real-world job),
-you can run clusterdumper in 2 modes.
+you can run clusterdumper in 2 modes:
+
+
+1. Hadoop Environment
+1. Standalone Java Program 
 
-1. [Hadoop Environment](#hadoopenvironment.html)
-1. [Standalone Java Program ](#standalonejavaprogram.html)
 
 <a name="ClusterDumper-HadoopEnvironment{anchor:HadoopEnvironment}"></a>
 ### Hadoop Environment
 
 If you have set up your HADOOP_HOME environment variable, you can use the
-command line utility "mahout" to execute the ClusterDumper on Hadoop. In
+command line utility `mahout` to execute the ClusterDumper on Hadoop. In
 this case we won't need to get the output clusters to our local machines.
 The utility will read the output clusters present in HDFS and output the
 human-readable cluster values into our local file system. Say you've just
 executed the [synthetic control example](clustering-of-synthetic-control-data.html)
- and want to analyze the output, you can execute
-
-    
-### Standalone Java Program {anchor:StandaloneJavaProgram}
-    
-ClusterDumper can be run using CLI. If your HADOOP_HOME environment
-variable is not set, you can execute ClusterDumper using "mahout" command
-line utility.
+ and want to analyze the output, you can execute the `mahout clusterdump` utility from the command line.
 
-Get the output data from hadoop into your local machine. For example, in
-the case where you've executed a clustering example use
+#### CLI options:
+    --help                               Print out help
+    --input (-i) input                   The directory containing Sequence
+                                           Files for the Clusters
+    --output (-o) output                 The output file.  If not specified,
+                                           dumps to the console.
+    --outputFormat (-of) outputFormat    The optional output format to write
+                                           the results as. Options: TEXT, CSV, or GRAPH_ML
+    --substring (-b) substring           The number of chars of the
+                                           asFormatString() to print
+    --pointsDir (-p) pointsDir           The directory containing points
+                                           sequence files mapping input vectors
+                                           to their cluster.  If specified,
+                                           then the program will output the
+                                           points associated with a cluster
+    --dictionary (-d) dictionary         The dictionary file.
+    --dictionaryType (-dt) dictionaryType    The dictionary file type
+                                               (text|sequencefile)
+    --distanceMeasure (-dm) distanceMeasure  The classname of the DistanceMeasure.
+                                               Default is SquaredEuclidean.
+    --numWords (-n) numWords             The number of top terms to print
+    --tempDir tempDir                    Intermediate output directory
+    --startPhase startPhase              First phase to run
+    --endPhase endPhase                  Last phase to run
+    --evaluate (-e)                      Run ClusterEvaluator and CDbwEvaluator over the
+                                           input. The output will be appended to the rest of
+                                           the output at the end.
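
Editor's note (not part of the committed diff): the options above can be combined into a single invocation. The following is a minimal sketch only. The directory and file names are hypothetical placeholders, not paths taken from this commit, and the driver name `clusterdump` is assumed from the "clusterdump utility" wording used elsewhere in this page. The script prints the assembled command and runs it only when `RUN=1` is set, so it is safe to try without a Mahout installation.

```shell
#!/bin/sh
# Hypothetical locations: replace with wherever your clustering job wrote its output.
CLUSTERS_DIR="output/clusters-10-final"   # assumed name of the final clusters directory
POINTS_DIR="output/clusteredPoints"       # assumed name of the clustered points directory
REPORT_FILE="clusteranalyze.txt"          # assumed local file for the readable dump

# Assemble the command from the --input, --pointsDir, --output and
# --outputFormat options listed in the CLI options above.
CMD="mahout clusterdump --input $CLUSTERS_DIR --pointsDir $POINTS_DIR --output $REPORT_FILE --outputFormat TEXT"

# Always show the command; execute it only when explicitly requested.
echo "$CMD"
if [ "${RUN:-0}" = "1" ]; then
  $CMD
fi
```

Leaving out `--output` should send the dump to the console instead, per the option description above.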
 
-This will create a folder called output inside your $MAHOUT_HOME/examples
-and will have sub-folders for each cluster outputs and ClusteredPoints
+### Standalone Java Program                                          
 
-Run the clusterdump utility as follows as a standalone Java Program through Eclipse - if you are using eclipse, setup mahout-utils as a project as specified in [Working with Maven in Eclipse](../../developers/buildingmahout.html).
+Run the clusterdump utility as a standalone Java program through Eclipse as follows. <!-- - if you are using eclipse, setup mahout-utils as a project as specified in [Working with Maven in Eclipse](../../developers/buildingmahout.html). -->
     To execute ClusterDumper.java,
     
 * Under mahout-utils, Right-Click on ClusterDumper.java