Posted to dev@mahout.apache.org by sirvan paraste <si...@gmail.com> on 2013/06/30 16:09:06 UTC

Fwd: Interpreting the result of StreamingKMeans in mahout 0.8

Hi,


How can we find out which input points belong to a given cluster in the
result of StreamingKMeans? This is needed to evaluate the clustering result,
so I think it should be considered for improvement.

I know how to interpret the KMeans result in Mahout 0.7 using the NamedVector
class and one of the dumpers (such as ClusterDumper). After clustering with
the KMeans driver, a directory named clusteredPoints is created that contains
the clustering result, and with ClusterDumper you can see the created clusters
and the points in each one. The link below has a good solution for this:
How to read Mahout clustering
output<http://stackoverflow.com/questions/11848038/how-to-read-mahout-clustering-output>

But, as I mentioned in the title, I want the same capability for interpreting
the StreamingKMeans result, which is a new feature in Mahout 0.8. This
feature uses a Centroid class to hold the data points and each cluster's
seed. The output of the StreamingKMeans algorithm is only a sequence
file consisting of the centroid vectors plus the key and weight of each
cluster. This output contains no information about the input data points, so
I cannot tell how they are distributed among the clusters. As a result, it is
not possible for me to get a sense of the accuracy of the clustering.
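
The only workaround I can think of is a second pass that assigns every input
point to its nearest centroid. Below is a minimal sketch of that idea in plain
Java. It does not use any Mahout classes: the class name, the method names, the
double[] representation of points and centroids, and the choice of squared
Euclidean distance are all my own assumptions for illustration, and reading
the centroids out of the sequence file is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout: assigns points to nearest centroids.
public class NearestCentroidAssignment {

    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid closest to the given point (ties go to the lower index).
    static int nearestCentroid(double[] point, double[][] centroids) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = squaredDistance(point, centroids[c]);
            if (d < bestDistance) {
                bestDistance = d;
                best = c;
            }
        }
        return best;
    }

    // For each centroid, the list of indices of the input points assigned to it.
    static List<List<Integer>> assign(double[][] points, double[][] centroids) {
        List<List<Integer>> members = new ArrayList<>();
        for (int c = 0; c < centroids.length; c++) {
            members.add(new ArrayList<>());
        }
        for (int p = 0; p < points.length; p++) {
            members.get(nearestCentroid(points[p], centroids)).add(p);
        }
        return members;
    }

    public static void main(String[] args) {
        // Made-up data standing in for the StreamingKMeans centroids and the input.
        double[][] centroids = {{0.0, 0.0}, {10.0, 10.0}};
        double[][] points = {{0.5, 0.2}, {9.8, 10.1}, {0.1, 0.9}, {10.3, 9.7}};

        List<List<Integer>> members = assign(points, centroids);
        for (int c = 0; c < members.size(); c++) {
            System.out.println("cluster " + c + " -> points " + members.get(c));
        }
    }
}
```

This costs one distance computation per point per centroid, and the distance
measure would have to match the one used for clustering, so having it built in
would still be much better.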

By the way, how can I get this information in the clustering output? Is it
already implemented, or is there any plan to implement this feature?