Posted to dev@hama.apache.org by "Edward J. Yoon (JIRA)" <ji...@apache.org> on 2014/01/06 02:40:50 UTC
[jira] [Commented] (HAMA-834) Fix KMeans example
[ https://issues.apache.org/jira/browse/HAMA-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862716#comment-13862716 ]
Edward J. Yoon commented on HAMA-834:
-------------------------------------
The input is generated incorrectly. There is a bug in either the generateInputText or the prepareInputText method.
{code}
edward@edward-VirtualBox:~/workspace/hama-trunk$ ls -al /tmp/clustering/in/parts/
total 32
drwxr-xr-x 2 edward edward 4096 Jan 6 10:36 .
drwxr-xr-x 3 edward edward 4096 Jan 6 10:36 ..
-rwxrwxrwx 1 edward edward 1059 Jan 6 10:36 part0.seq
-rw-r--r-- 1 edward edward 20 Jan 6 10:36 .part0.seq.crc
-rwxrwxrwx 1 edward edward 1059 Jan 6 10:36 part1.seq
-rw-r--r-- 1 edward edward 20 Jan 6 10:36 .part1.seq.crc
-rwxrwxrwx 1 edward edward 2065 Jan 6 10:36 part2.seq
-rw-r--r-- 1 edward edward 28 Jan 6 10:36 .part2.seq.crc
{code}
> Fix KMeans example
> ------------------
>
> Key: HAMA-834
> URL: https://issues.apache.org/jira/browse/HAMA-834
> Project: Hama
> Issue Type: Bug
> Components: examples, machine learning
> Affects Versions: 0.6.3
> Reporter: Martin Illecker
> Labels: example
> Fix For: 0.7.0
>
> Attachments: HAMA-834.patch
>
>
> Fix problems in KMeans example and revise test case.
> 1) Typo \[1] and input path issue
> 2) Wrong *summationCount* in assignCentersInternal
> *summationCount* should also be incremented if \[2]
> {code}
> if (clusterCenter == null) {
>   newCenterArray[lowestDistantCenter] = key;
> }
> {code}
> Otherwise *summationCount* may stay zero when only one value is assigned. Then this zero will be propagated to *incrementSum* \[3] and might cause a divide by zero in \[4].
> Moreover, if three vectors are added but *summationCount* is only two, the result will be wrong, because the summed vector is later divided by the number of increments.
> 3) Results depend on the value of *numBspTask*
> (results vary if *numBspTask* is changed)
> \[1] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L518-519
> \[2] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L249
> \[3] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L161
> \[4] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L172
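
The *summationCount* problem from item 2 of the quoted description can be illustrated with a small standalone sketch. It is not the KMeansBSP code: the names CenterAccumulator, average and countFirst are hypothetical and exist only for this illustration, and the explicit ArithmeticException stands in for whatever the real division in \[4] would do with a zero count.

```java
import java.util.Arrays;

public class SummationCountSketch {

  /** Running sum of the vectors assigned to one center, plus the number of increments. */
  static final class CenterAccumulator {
    double[] sum; // stays null until the first vector is assigned
    int count;    // plays the role of summationCount (hypothetical stand-in)

    /** Adds a vector. countFirst=false mimics the behaviour described in the report. */
    void add(double[] vector, boolean countFirst) {
      if (sum == null) {
        // First assignment: the vector becomes the initial sum.
        sum = vector.clone();
        if (countFirst) {
          count++; // the increment that the report says is missing
        }
      } else {
        for (int i = 0; i < sum.length; i++) {
          sum[i] += vector[i];
        }
        count++;
      }
    }

    /** Divides the summed vector by the number of recorded increments. */
    double[] mean() {
      if (count == 0) {
        // Assumption of this sketch: fail loudly instead of dividing by zero.
        throw new ArithmeticException("no increments recorded for this center");
      }
      double[] result = new double[sum.length];
      for (int i = 0; i < sum.length; i++) {
        result[i] = sum[i] / count;
      }
      return result;
    }
  }

  /** Averages the given vectors, optionally counting the first one. */
  public static double[] average(double[][] vectors, boolean countFirst) {
    CenterAccumulator acc = new CenterAccumulator();
    for (double[] v : vectors) {
      acc.add(v, countFirst);
    }
    return acc.mean();
  }

  public static void main(String[] args) {
    double[][] vectors = { {1.0, 1.0}, {2.0, 2.0}, {3.0, 3.0} };
    // Three vectors, three increments: prints [2.0, 2.0]
    System.out.println(Arrays.toString(average(vectors, true)));
    // Three vectors, only two increments: prints [3.0, 3.0]
    System.out.println(Arrays.toString(average(vectors, false)));
  }
}
```

With a single vector and countFirst set to false the count stays at zero, so mean() throws in this sketch, which corresponds to the divide-by-zero case mentioned in the description.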
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)