
Posted to issues@spark.apache.org by "zhengruifeng (JIRA)" <ji...@apache.org> on 2019/05/08 10:35:01 UTC

[jira] [Resolved] (SPARK-14174) Implement the Mini-Batch KMeans

     [ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengruifeng resolved SPARK-14174.
----------------------------------
    Resolution: Not A Problem

> Implement the Mini-Batch KMeans
> -------------------------------
>
>                 Key: SPARK-14174
>                 URL: https://issues.apache.org/jira/browse/SPARK-14174
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: zhengruifeng
>            Priority: Major
>         Attachments: MBKM.xlsx
>
>
> MiniBatchKMeans is a variant of the KMeans algorithm that uses mini-batches to reduce computation time while still attempting to optimise the same objective function. Mini-batches are subsets of the input data, randomly sampled in each training iteration. These mini-batches drastically reduce the amount of computation required to converge to a local solution. In contrast to other algorithms that reduce the convergence time of k-means, mini-batch k-means produces results that are generally only slightly worse than the standard algorithm.
> Comparison of KMeans and MiniBatchKMeans in scikit-learn: http://scikit-learn.org/stable/auto_examples/cluster/plot_mini_batch_kmeans.html#example-cluster-plot-mini-batch-kmeans-py
> Since MiniBatchKMeans with fraction=1.0 is not equivalent to KMeans, I have made it a separate estimator.

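One reason fraction=1.0 does not reduce to KMeans can be shown on a toy 1-D dataset. The sketch below assumes Sculley-style per-center counts that persist across iterations; the helper names and the data are hypothetical. It runs two iterations of a standard Lloyd step and two full-batch mini-batch steps from the same starting centers.

```python
def nearest(x, centers):
    # Index of the closest center (1-D data, so distance is |x - c|).
    return min(range(len(centers)), key=lambda j: abs(x - centers[j]))


def lloyd_step(data, centers):
    """One standard k-means (Lloyd) iteration: each center becomes the
    mean of the points currently assigned to it (empty clusters keep theirs)."""
    groups = [[] for _ in centers]
    for x in data:
        groups[nearest(x, centers)].append(x)
    return [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]


def mini_batch_step(batch, centers, counts):
    """One mini-batch iteration with per-center step size 1/count.
    `counts` carries over between iterations, unlike anything in Lloyd's step."""
    assignments = [nearest(x, centers) for x in batch]  # cached before updating
    centers, counts = list(centers), list(counts)
    for x, j in zip(batch, assignments):
        counts[j] += 1
        eta = 1.0 / counts[j]
        centers[j] = (1 - eta) * centers[j] + eta * x
    return centers, counts


data = [0, 1, 2, 10, 11, 12]
init = [1.0, 2.0]

lloyd = init
for _ in range(2):
    lloyd = lloyd_step(data, lloyd)

mb, counts = init, [0, 0]
for _ in range(2):
    # fraction=1.0: every iteration uses the full dataset as the batch.
    mb, counts = mini_batch_step(data, mb, counts)

print(lloyd)                       # [1.0, 11.0]
print([round(c, 3) for c in mb])   # [0.8, 9.714]
```

Both methods agree after the first pass ([0.5, 8.75]), because the first point assigned to a center replaces it outright. On the second pass Lloyd's step recomputes the means of the current assignment, whereas the mini-batch centers are running means over every point seen so far, including the stale assignment of 2 to the second center, so they lag behind.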


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)