Posted to user@mahout.apache.org by li...@gmail.com on 2012/10/12 11:38:06 UTC

Mahout KMeans generates twice as many clusters as my initial k setting

Hi,

 

I am a beginner with Mahout. I am using Mahout 0.8 and followed the tutorial at
https://cwiki.apache.org/MAHOUT/clustering-of-synthetic-control-data.html

 

First, I ran:

`mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job -i testdata -o output -t1 20 -t2 50 -k 5 -x 20 -ow`

 

Then I used clusterdump to extract the cluster centers:

 

    mahout clusterdump --input output/clusters-20-final --output /media/synthetic_control.center

 

After this, the synthetic_control.center file contains:

 

    VL-585{n=50 c=[29.832, 29.589, 29.405, 28.516, 29.600, ..] r=[3.152, 3.518, 3.292, .]}

    

    VL-591{n=197 c=[29.984, 29.681,.] r=[3.602, 3.558, 3.364,.]}

    

    VL-595{n=203 c=[..] r=[..]}

    

    VL-597{n=61 c=[..] r=[..]}

    

    VL-599{n=43 c=[..] r=[..]}

    

    VL-585{n=1 c=[..] r=[..]}

    

    VL-591{n=27 c=[..] r=[..]}

    

    VL-595{n=1 c=[..] r=[..]}

    

    VL-597{n=1 c=[..] r=[..]}

    

    VL-599{n=16 c=[..] r=[..]}

 

 

It seems that k-means generated 10 clusters, but my initial setting for k was 5.
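
To make the count concrete, here is a minimal sketch that tallies the entries in the dump. It assumes every entry starts with an identifier of the form VL-<number> followed by {n=<count>, as in the output above. The sample lines are copied from that output with all vectors shortened to [..], and the variable names (dump, ids, total, unique, points) are only illustrative.

```shell
#!/bin/sh
# Sample entries copied from the clusterdump output above
# (c=[...] and r=[...] vectors shortened to [..] for brevity).
dump=$(cat <<'EOF'
    VL-585{n=50 c=[..] r=[..]}
    VL-591{n=197 c=[..] r=[..]}
    VL-595{n=203 c=[..] r=[..]}
    VL-597{n=61 c=[..] r=[..]}
    VL-599{n=43 c=[..] r=[..]}
    VL-585{n=1 c=[..] r=[..]}
    VL-591{n=27 c=[..] r=[..]}
    VL-595{n=1 c=[..] r=[..]}
    VL-597{n=1 c=[..] r=[..]}
    VL-599{n=16 c=[..] r=[..]}
EOF
)

# Pull the identifier (e.g. VL-585) from the start of each entry.
ids=$(printf '%s\n' "$dump" | sed -n 's/^[[:space:]]*\(VL-[0-9][0-9]*\){.*/\1/p')

# Number of entries versus number of distinct identifiers.
total=$(printf '%s\n' "$ids" | wc -l | tr -d ' ')
unique=$(printf '%s\n' "$ids" | sort -u | wc -l | tr -d ' ')

# Sum of the n= values over all entries.
points=$(printf '%s\n' "$dump" | sed -n 's/.*{n=\([0-9][0-9]*\).*/\1/p' | awk '{s += $1} END {print s}')

echo "entries: $total, distinct ids: $unique, total n: $points"
# prints: entries: 10, distinct ids: 5, total n: 600
```

So the file has 10 entries but only 5 distinct identifiers, each listed twice with different n values. To run the same check on the real file, replace the sample text with the contents of /media/synthetic_control.center.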

 

I also tried other values of k; it always generates twice as many clusters.

 

Can anyone help me with this? Thanks a lot!