Posted to user@mahout.apache.org by Gourav Khaneja <go...@gmail.com> on 2014/10/24 19:00:53 UTC
Dimensions with value "Zero" (0) are not appearing in the kmeans
cluster output
Hello,
I have a set of 10-dimensional vectors that I want to group into
clusters. I ran the Mahout k-means clustering program as follows:
$ mahout kmeans --input input/ --output output/ --clusters clusters/ -k 20
-xm sequential --maxIter 10000 -ow -cd 0.0000000000005
It produces clusters as follows:
gourav@mustang2:~$ mahout clusterdump -i output/clusters-*-final/ -o dump;
cat dump
VL-422383{n=29
c=[93.241, 0.241, 187383906066.860, 0.070,
0.057, 0.042, 0.000]
r=[237.392, 0.625, 29412153437.220, 0.236,
0.036, 0.049, 0.001]}
VL-344819{n=133921
c=[50.032, 775.298, -0.000, 300288032.310,
-0.043, 0.031, 0.016, 0.000]
r=[233.523, 142338.059, 0.007, 92781073.166,
0.267, 0.026, 0.018, 0.000]}
VL-344939{n=3
c=[2.667, 520677772968.333, 0.017, 0.007,
0.000]
r=[0.471, 184177690037.170, 0.008, 0.002,
0.000]}
VL-68598{n=21089
c=[91.973, 1.022, 1489688386.753, -0.045,
0.032, 0.024, 0.000]
r=[546.717, 62.027, 246594193.663, 0.278,
0.029, 0.026, 0.000]}
As you can see, the number of dimensions in the centroid (c) and radius (r)
differs between clusters, and no cluster shows all 10. I think every
dimension whose value was exactly zero (0) is being omitted from the output.
How can I get output with the original number of dimensions?
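
To make the suspicion concrete, here is a minimal Java sketch of a formatter that skips exact zeros. It is not Mahout's code: the class name, both method names, and the positions of the zeros in the sample vector are assumptions made up for illustration. It only shows how skipping zeros would yield lists of varying length, and why a small non-zero value can still print as 0.000.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ZeroSkippingFormat {

    // Hypothetical formatter: prints every entry except exact zeros,
    // so the printed list can be shorter than the vector itself.
    static String formatSkippingZeros(double[] v) {
        List<String> parts = new ArrayList<>();
        for (double x : v) {
            if (x == 0.0) {
                continue; // exact zeros are dropped, losing their position
            }
            parts.add(String.format(Locale.ENGLISH, "%.3f", x));
        }
        return "[" + String.join(", ", parts) + "]";
    }

    // Dense alternative: prints every entry, zeros included,
    // so the printed list always has v.length values.
    static String formatDense(double[] v) {
        List<String> parts = new ArrayList<>();
        for (double x : v) {
            parts.add(String.format(Locale.ENGLISH, "%.3f", x));
        }
        return "[" + String.join(", ", parts) + "]";
    }

    public static void main(String[] args) {
        // Made-up 10-dimensional vector; which entries are zero is an assumption.
        double[] centroid = {93.241, 0.0, 0.241, 0.0, 0.0004, 0.0, 0.07, 0.0, 0.0, 0.0};
        System.out.println(formatSkippingZeros(centroid));
        System.out.println(formatDense(centroid));
    }
}
```

If clusterdump behaves like the first method, the entries shown as 0.000 in the dump would be small non-zero values rounded to three decimals, while exact zeros vanish along with their positions.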
Thank you,
Gourav