
Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2014/10/02 22:21:35 UTC

[jira] [Commented] (SPARK-3424) KMeans Plus Plus is too slow

    [ https://issues.apache.org/jira/browse/SPARK-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157116#comment-14157116 ] 

Apache Spark commented on SPARK-3424:
-------------------------------------

User 'derrickburns' has created a pull request for this issue:
https://github.com/apache/spark/pull/2634

> KMeans Plus Plus is too slow
> ----------------------------
>
>                 Key: SPARK-3424
>                 URL: https://issues.apache.org/jira/browse/SPARK-3424
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.0.2
>            Reporter: Derrick Burns
>
> The KMeansPlusPlus algorithm is implemented in O(m k^2) time, where m is the number of rounds of the KMeansParallel algorithm and k is the number of clusters.
> This can be dramatically improved by maintaining, for each point, the distance to its closest cluster center from round to round and incrementally updating that value. Each incremental update takes O(1) time, which reduces the running time of KMeansPlusPlus to O(m k). For large k, this is significant.
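
The incremental update described above can be sketched in plain Python. This is only an illustration of the idea on a local list of points, not the code in MLlib or in the pull request; the names kmeans_pp_incremental, sq_dist and min_d2 are hypothetical, and the sketch assumes ordinary sequential k-means++ seeding rather than the KMeansParallel rounds.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_incremental(points, k, seed=0):
    """Pick k seed centers, caching each point's distance to its nearest center.

    Hypothetical helper: illustrates the caching idea only.
    """
    rng = random.Random(seed)
    centers = [points[rng.randrange(len(points))]]
    # min_d2[i] is the squared distance from points[i] to its closest chosen center.
    min_d2 = [sq_dist(p, centers[0]) for p in points]

    while len(centers) < k:
        total = sum(min_d2)
        if total == 0.0:
            # All points coincide with a chosen center; fall back to a uniform pick.
            idx = rng.randrange(len(points))
        else:
            # Sample an index with probability proportional to min_d2.
            r = rng.random() * total
            acc = 0.0
            idx = len(points) - 1
            for i, d in enumerate(min_d2):
                acc += d
                if acc > r:
                    idx = i
                    break
        new_center = points[idx]
        centers.append(new_center)

        # Incremental update: one distance computation per point, against the
        # new center only, instead of recomputing against every chosen center.
        for i, p in enumerate(points):
            d = sq_dist(p, new_center)
            if d < min_d2[i]:
                min_d2[i] = d

    return centers, min_d2
```

Because the cached value is compared only against the newly added center, the work per point per added center does not grow with the number of centers already chosen, which is where the factor of k is saved.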



