Posted to user@mahout.apache.org by Gourav Khaneja <go...@gmail.com> on 2014/10/24 19:00:53 UTC

Dimensions with value "Zero" (0) are not appearing in the kmeans cluster output

Hello,

I have a set of 10-dimensional vectors that I want to group into
clusters. I ran the Mahout k-means clustering program as follows:

$ mahout kmeans --input input/ --output output/ --clusters clusters/ -k 20 \
    -xm sequential --maxIter 10000 -ow -cd 0.0000000000005


It produces clusters as follows:

gourav@mustang2:~$ mahout clusterdump -i output/clusters-*-final/ -o dump; cat dump


VL-422383{n=29
    c=[93.241, 0.241, 187383906066.860, 0.070, 0.057, 0.042, 0.000]
    r=[237.392, 0.625, 29412153437.220, 0.236, 0.036, 0.049, 0.001]}

VL-344819{n=133921
    c=[50.032, 775.298, -0.000, 300288032.310, -0.043, 0.031, 0.016, 0.000]
    r=[233.523, 142338.059, 0.007, 92781073.166, 0.267, 0.026, 0.018, 0.000]}

VL-344939{n=3
    c=[2.667, 520677772968.333, 0.017, 0.007, 0.000]
    r=[0.471, 184177690037.170, 0.008, 0.002, 0.000]}

VL-68598{n=21089
    c=[91.973, 1.022, 1489688386.753, -0.045, 0.032, 0.024, 0.000]
    r=[546.717, 62.027, 246594193.663, 0.278, 0.029, 0.026, 0.000]}



As you can see, the number of dimensions in the centroid (c) and radius (r)
vectors differs between clusters: 7, 8, 5, and 7 entries instead of 10. I
think all dimensions whose value is zero (0) are being omitted. How can I
get output with the original number of dimensions?
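
To make the suspicion concrete, here is a minimal Python sketch of the
behaviour I think I am seeing. It is not Mahout's code: the function name
`format_nonzero` and the sample vector are made up for illustration, and it
assumes the dump prints only entries that are exactly non-zero, rounded to
three decimals.

```python
def format_nonzero(vector, precision=3):
    """Hypothetical formatter: skip entries that are exactly zero.

    This only mimics the suspected clusterdump behaviour; it is an
    assumption, not the actual Mahout implementation.
    """
    kept = [f"{x:.{precision}f}" for x in vector if x != 0.0]
    return "[" + ", ".join(kept) + "]"


# Made-up 10-dimensional vector: seven exact zeros and one tiny value.
sample = [2.5, 0.0, 0.0, 7.25, 0.0, 0.0001, 0.0, 0.0, 0.0, 0.0]

# Prints [2.500, 7.250, 0.000]
print(format_nonzero(sample))
```

Under that assumption, a value that is merely small still shows up as 0.000,
while an exact zero disappears together with its position, which would
explain both the trailing 0.000 entries and the varying vector lengths above.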
Thank you,
Gourav