
Posted to issues@spark.apache.org by "Nick Pentreath (JIRA)" <ji...@apache.org> on 2017/05/24 19:06:04 UTC

[jira] [Closed] (SPARK-6000) Batch K-Means clusters should support "mini-batch" updates

     [ https://issues.apache.org/jira/browse/SPARK-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Pentreath closed SPARK-6000.
---------------------------------
    Resolution: Duplicate

> Batch K-Means clusters should support "mini-batch" updates
> ----------------------------------------------------------
>
>                 Key: SPARK-6000
>                 URL: https://issues.apache.org/jira/browse/SPARK-6000
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.2.1
>            Reporter: Derrick Burns
>            Priority: Minor
>
> One way to improve the performance of the K-means clustering algorithm is to sample the points on each round of Lloyd's algorithm and to use only those samples to update the cluster centers. (Note that this is similar to the update algorithm of streaming K-means.) The Spark K-Means clusterer should support the mini-batch algorithm for large data sets.
> The K-Means implementation at 
> https://github.com/derrickburns/generalized-kmeans-clustering supports the mini-batch algorithm.
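
For readers unfamiliar with the idea, here is a minimal sketch of mini-batch K-means in plain Python. It is an illustration only: it is not Spark's API and not the code in the repository linked above. The function names (mini_batch_kmeans, nearest_center, squared_distance) and all parameters are hypothetical. It assumes the common variant in which each center is kept as a running mean of the sampled points assigned to it so far, which is also roughly how a streaming update behaves when there is no decay.

```python
import random


def squared_distance(a, b):
    # Squared Euclidean distance between two equal-length points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    # Index of the center closest to the given point.
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def mini_batch_kmeans(points, k, batch_size, rounds, seed=0,
                      initial_centers=None):
    """Illustrative mini-batch K-means (hypothetical helper, not Spark code).

    Each round samples a batch of points, assigns the batch to the current
    centers, and moves each center toward its assigned samples using a
    per-center running mean.
    """
    rng = random.Random(seed)
    if initial_centers is None:
        # Assumption: seed the centers with k distinct input points.
        centers = [list(p) for p in rng.sample(points, k)]
    else:
        centers = [list(c) for c in initial_centers]
    counts = [0] * k  # number of samples each center has absorbed so far

    for _ in range(rounds):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch against the centers as they stand now.
        assignments = [nearest_center(p, centers) for p in batch]
        for p, j in zip(batch, assignments):
            counts[j] += 1
            step = 1.0 / counts[j]  # running-mean step size
            centers[j] = [c + step * (x - c) for c, x in zip(centers[j], p)]
    return centers
```

Each round costs work proportional to batch_size times k rather than to the full data set, which is where the speedup over a full Lloyd's pass comes from. The trade-off is that the centers are noisier estimates than those from full passes.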



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
