Posted to user@mahout.apache.org by Ernesto Montaldo <mo...@hotmail.com> on 2014/07/03 12:31:07 UTC

How to analyze K-means clustering result with clusterDump

Hi all,
 
I am experimenting with Mahout; in particular, I am trying to get results from clustering algorithms such as K-means.
I am using Hadoop 1.2 on an HDInsight cluster, together with Mahout 0.9.
What I am trying to do is take a set of synthetic data and cluster it.
From the Hadoop command line I run the following command:
 
hadoop jar %mahoutdir%\mahout-examples-0.9-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job --input /user/myuser/simulation --output /user/myuser/simulation-output -k 5 -t1 20 -t2 50 -x 20 -ow
 
The mappers and reducers appear to execute correctly, but when I look at the results by running this command:
 
hadoop jar %mahoutdir%\mahout-examples-0.9-job.jar org.apache.mahout.driver.MahoutDriver clusterdump -i /user/myuser/simulation-output/clusters-5-final/ -of TEXT -o /user/myuser/output/simulation.txt
 
The result I get is a list of centroids, but this is not what I expected. I expected a set of clusters containing all of the data points.
I am obviously making a mistake somewhere, but I do not know how or where.
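
K-means produces two distinct results: the cluster centroids and the assignment of each input point to its nearest centroid. A list of centroids alone does not say which points belong to which cluster. The minimal in-memory sketch below illustrates the split. It is plain Lloyd's algorithm with Euclidean distance and random seeding, not Mahout's MapReduce implementation, and the names `kmeans` and `nearest` as well as the toy data are hypothetical:

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, max_iter=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples.

    Returns (centroids, assignments): centroids holds k points, and
    assignments[i] is the index of the centroid nearest to points[i].
    """
    rng = random.Random(seed)
    # Random seeding: k distinct input points serve as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_assignments = [nearest(p, centroids) for p in points]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster became empty
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    # Final assignment, so the memberships match the returned centroids.
    assignments = [nearest(p, centroids) for p in points]
    return centroids, assignments


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, assignments = kmeans(points, k=2)
    print("centroids:", centroids)  # what a centroid-only dump corresponds to
    # Cluster membership comes from grouping the points by their assignment.
    clusters = {}
    for p, a in zip(points, assignments):
        clusters.setdefault(a, []).append(p)
    print("clusters:", clusters)
```

Mahout's output is assumed to follow the same split. The clusters-*-final directory holds the cluster models (centroids). Per-point membership comes from the separate clustering step, which writes a clusteredPoints directory that clusterdump can read through its -p (--pointsDir) option.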
 
What am I doing wrong?
Why am I not able to specify the -cl option when executing org.apache.mahout.clustering.syntheticcontrol.kmeans.Job? If I do, I get an error.
Is there any other way to execute the k-means algorithm?
 
Thank you in advance for the help.
Regards,
Ernesto