Posted to dev@hama.apache.org by "Martin Illecker (JIRA)" <ji...@apache.org> on 2013/12/23 13:50:52 UTC

[jira] [Created] (HAMA-834) Fix KMeans example

Martin Illecker created HAMA-834:
------------------------------------

             Summary: Fix KMeans example
                 Key: HAMA-834
                 URL: https://issues.apache.org/jira/browse/HAMA-834
             Project: Hama
          Issue Type: Bug
          Components: examples, machine learning
    Affects Versions: 0.6.3
            Reporter: Martin Illecker
             Fix For: 0.7.0


Fix problems in KMeans example and revise test case.

1) Typo \[1] and input path issue

2) Wrong *summationCount* in assignCentersInternal
*summationCount* should also be incremented in the branch that stores the first vector for a center \[2]:
{code}
if (clusterCenter == null) {
  newCenterArray[lowestDistantCenter] = key;
}
{code}
Otherwise *summationCount* may stay zero when only one vector is assigned to a center. This zero is then propagated to *incrementSum* \[3] and might cause a division by zero in \[4].

More generally, if three vectors are added but *summationCount* is only two, the result will be wrong, because the summed vector is later divided by the number of increments.
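The effect can be reproduced outside Hama with a minimal sketch. The class below is hypothetical and is not the KMeansBSP code: *sum* and *count* stand in for one entry of newCenterArray and *summationCount*, plain double arrays replace Hama's vector type, and the *countFirst* flag switches between the reported behaviour and the proposed fix.

```java
import java.util.Arrays;

// Hypothetical stand-in for one cluster center's running sum; not the Hama implementation.
class CenterSumSketch {
  private double[] sum;             // plays the role of newCenterArray[i]
  private int count;                // plays the role of summationCount[i]
  private final boolean countFirst; // true = proposed fix, false = reported behaviour

  CenterSumSketch(boolean countFirst) {
    this.countFirst = countFirst;
  }

  void assign(double[] vector) {
    if (sum == null) {
      // First vector for this center: stored directly.
      sum = vector.clone();
      if (countFirst) {
        count++; // the increment that the issue says is missing
      }
    } else {
      for (int i = 0; i < sum.length; i++) {
        sum[i] += vector[i];
      }
      count++;
    }
  }

  int count() {
    return count;
  }

  // Divides the summed vector by the number of counted increments.
  // With doubles, a zero count yields Infinity or NaN instead of an exception.
  double[] mean() {
    double[] m = new double[sum.length];
    for (int i = 0; i < sum.length; i++) {
      m[i] = sum[i] / count;
    }
    return m;
  }

  public static void main(String[] args) {
    double[][] vectors = {{1, 1}, {2, 2}, {3, 3}};
    CenterSumSketch buggy = new CenterSumSketch(false);
    CenterSumSketch fixed = new CenterSumSketch(true);
    for (double[] v : vectors) {
      buggy.assign(v);
      fixed.assign(v);
    }
    System.out.println(Arrays.toString(buggy.mean())); // [3.0, 3.0], sum divided by 2
    System.out.println(Arrays.toString(fixed.mean())); // [2.0, 2.0], sum divided by 3
  }
}
```

With a single assigned vector the uncounted variant keeps a count of zero, which is the case that can reach the division in \[4].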

3) Results depend on *numBspTask*
(the results vary when the number of BSP tasks is changed)

\[1] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L518-519
\[2] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L249
\[3] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L161
\[4] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L172



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)