Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/01/23 14:14:27 UTC

[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

    [ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15834626#comment-15834626 ] 

ASF GitHub Bot commented on FLINK-1731:
---------------------------------------

GitHub user sachingoel0101 opened a pull request:

    https://github.com/apache/flink/pull/3192

    [FLINK-1731][ml] Add KMeans clustering(Lloyd's algorithm)

    This is split off from https://github.com/apache/flink/pull/757 so that Lloyd's algorithm can be added first.
    I will follow up with the initialization schemes in the PR linked above.
    
    To address a few comments from the previous PR:
    We cannot use `DataSet[LabeledVector]` instead of `DataSet[Seq[LabeledVector]]` because the model here is of type `Seq[LabeledVector]`, and the pipeline semantics require it.
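
For readers unfamiliar with Lloyd's algorithm, here is a minimal standalone sketch in plain Python. It is not the Flink implementation from this PR; the function names `closest` and `lloyd`, the tuple-based points, and the `max_iterations` parameter are assumptions made only for illustration.

```python
import math


def closest(point, centroids):
    """Return the index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def lloyd(points, centroids, max_iterations=100):
    """Alternate assignment and mean-update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        # Assignment step: group every point under its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[closest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its points.
        # An empty cluster keeps its previous centroid (an illustrative choice).
        updated = [
            tuple(sum(xs) / len(members) for xs in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(lloyd(points, [(1.0, 1.0), (9.0, 1.0)]))
```

The result depends on the initial centroids passed in, which is why the initialization schemes are handled separately.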

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sachingoel0101/flink kmeans

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3192.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3192
    
----
commit 598f1ea9b4a0e1daf1f151c8b69c88bf83224f71
Author: Peter Schrott <pe...@gmail.com>
Date:   2015-07-29T22:44:54Z

    [FLINK-1731][ml]Added KMeans algorithm to ML library

commit d70c46e71e152b374c9b3f23c9d0bd006bf503ff
Author: Florian Goessler <ma...@floriangoessler.de>
Date:   2015-07-29T22:50:22Z

    [FLINK-1731][ml]Added unit tests for KMeans algorithm

----


> Add kMeans clustering algorithm to machine learning library
> -----------------------------------------------------------
>
>                 Key: FLINK-1731
>                 URL: https://issues.apache.org/jira/browse/FLINK-1731
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Peter Schrott
>              Labels: ML
>
> The Flink repository already contains a kMeans implementation, but it has not yet been ported to the machine learning library. I assume that only the data types used have to be adapted, and then it can be moved more or less directly to flink-ml.
> The kMeans++ [1] and kMeans|| [2] algorithms constitute a better implementation because they improve the initial seeding phase to achieve near-optimal clustering. It might be worthwhile to implement kMeans||.
> Resources:
> [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
> [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf
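
As a rough illustration of the seeding idea behind kMeans++ [1], the sketch below picks the first centroid uniformly at random and each further centroid with probability proportional to its squared distance from the nearest centroid chosen so far. It is a plain Python sketch, not Flink code; the name `kmeans_pp_seed` and the `rng` parameter are hypothetical, and kMeans|| [2] is not covered here.

```python
import math
import random


def kmeans_pp_seed(points, k, rng=None):
    """Choose k initial centroids from points using squared-distance weighting."""
    if rng is None:
        rng = random.Random()
    # First centroid: uniform random choice.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Weight of a point: squared distance to its nearest chosen centroid.
        weights = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a chosen centroid; fall back to uniform.
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


seeds = kmeans_pp_seed([(0.0, 0.0), (0.0, 0.0), (5.0, 5.0)], 2, random.Random(7))
print(seeds)
```

The returned seeds could then be passed as the starting centroids of a Lloyd iteration.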



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)