Posted to user@flink.apache.org by Pa Rö <pa...@googlemail.com> on 2015/04/24 15:08:06 UTC

flink ml - k-means

Hi Flink community,

I am currently writing my master's thesis in the field of machine learning.
My main task is to evaluate different k-means variants for large data sets
(big data). I would like to test Flink ML against Apache Mahout and Apache
Hadoop MapReduce in terms of scalability and performance (time and space).
What is the current state of clustering, especially K-Means? Will there be
any release information on this in the near future?

Best regards,
Paul

Re: flink ml - k-means

Posted by Pa Rö <pa...@googlemail.com>.

Okay :)

I am now using the example code from here:
https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/clustering/KMeans.java

2015-05-11 21:56 GMT+02:00 Stephan Ewen <se...@apache.org>:

> Paul!
>
> Can you use the KMeans example? The code is for three-dimensional points,
> but you should be able to generalize it easily.
> That would be the fastest way to go, without waiting for any release
> dates.
>
> Stephan
>
>
> On Mon, May 11, 2015 at 2:46 PM, Pa Rö <pa...@googlemail.com>
> wrote:
>
>> Hi,
>>
>> I now want to implement k-means with Flink.
>> Do you perhaps know a release date for Flink ML k-means?
>>
>> best regards
>> paul
>>
>> 2015-04-27 9:36 GMT+02:00 Pa Rö <pa...@googlemail.com>:
>>
>>> Hi Alexander and Till,
>>>
>>> Thanks for the information; I look forward to the release.
>>> I'm curious how well Flink ML performs against Mahout and Spark ML.
>>>
>>> Best regards
>>> Paul
>>>
>>> 2015-04-27 9:23 GMT+02:00 Till Rohrmann <tr...@apache.org>:
>>>
>>>> Hi Paul,
>>>>
>>>> If you can't wait, a vanilla implementation is already included in
>>>> the Flink examples. You should find it under flink/flink-examples.
>>>>
>>>> But we will try to add more clustering algorithms in the near future.
>>>>
>>>> Cheers,
>>>> Till
>>>> On Apr 26, 2015 11:14 PM, "Alexander Alexandrov" <
>>>> alexander.s.alexandrov@gmail.com> wrote:
>>>>
>>>>> Yes, I expect to have one in the next few weeks (the code is actually
>>>>> there, but we need to port it to the Flink ML API). I suggest following
>>>>> the JIRA issue over the next few weeks to see when this is done:
>>>>>
>>>>> https://issues.apache.org/jira/browse/FLINK-1731
>>>>>
>>>>> Regards,
>>>>> Alexander
>>>>>
>>>>> PS. Bear in mind that we will start with a vanilla implementation of
>>>>> K-Means. For a thorough evaluation you might want to also check variants
>>>>> like K-Means++.
>>>>>
>>>>>
>>>>> 2015-04-24 15:08 GMT+02:00 Pa Rö <pa...@googlemail.com>:
>>>>>
>>>>>> Hi Flink community,
>>>>>>
>>>>>> I am currently writing my master's thesis in the field of machine
>>>>>> learning. My main task is to evaluate different k-means variants for
>>>>>> large data sets (big data). I would like to test Flink ML against
>>>>>> Apache Mahout and Apache Hadoop MapReduce in terms of scalability and
>>>>>> performance (time and space). What is the current state of clustering,
>>>>>> especially K-Means? Will there be any release information on this in
>>>>>> the near future?
>>>>>>
>>>>>> Best regards,
>>>>>> Paul
>>>>>>
>>>>>
>>>>>
>>>
>>
>

Re: flink ml - k-means

Posted by Stephan Ewen <se...@apache.org>.

Paul!

Can you use the KMeans example? The code is for three-dimensional points,
but you should be able to generalize it easily.
That would be the fastest way to go, without waiting for any release
dates.
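
To illustrate the generalization step, here is a minimal plain-Java sketch of
k-means over points of arbitrary dimension. It is not the code of the Flink
KMeans example and uses no Flink API; the class name NDKMeans and all method
names are hypothetical, and points are assumed to be double[] arrays of equal
length.

```java
import java.util.Arrays;

// Hypothetical illustration: plain Java, no Flink dependency.
public class NDKMeans {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    /** Index of the centroid closest to the given point. */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double dist = squaredDistance(point, centroids[c]);
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** One iteration: assign every point, then average each cluster. */
    static double[][] step(double[][] points, double[][] centroids) {
        int k = centroids.length;
        int dim = centroids[0].length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (double[] p : points) {
            int c = nearest(p, centroids);
            counts[c]++;
            for (int d = 0; d < dim; d++) {
                sums[c][d] += p[d];
            }
        }
        double[][] next = new double[k][];
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                // An empty cluster keeps its previous centroid.
                next[c] = centroids[c].clone();
            } else {
                next[c] = new double[dim];
                for (int d = 0; d < dim; d++) {
                    next[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return next;
    }

    /** Runs a fixed number of iterations and returns the final centroids. */
    static double[][] run(double[][] points, double[][] initial, int iterations) {
        double[][] centroids = initial;
        for (int i = 0; i < iterations; i++) {
            centroids = step(points, centroids);
        }
        return centroids;
    }

    public static void main(String[] args) {
        // Four-dimensional sample data; any dimension works the same way.
        double[][] points = {
            {0, 0, 0, 0}, {0, 2, 0, 2}, {10, 10, 10, 10}, {10, 12, 10, 12}
        };
        double[][] initial = {{1, 1, 1, 1}, {9, 9, 9, 9}};
        System.out.println(Arrays.deepToString(run(points, initial, 5)));
    }
}
```

Only squaredDistance and the averaging loop depend on the dimension, so in a
Flink job the same two pieces of logic are what would need to change in the
example's point type.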

Stephan


Re: flink ml - k-means

Posted by Pa Rö <pa...@googlemail.com>.

Hi,

I now want to implement k-means with Flink.
Do you perhaps know a release date for Flink ML k-means?

best regards
paul

Re: flink ml - k-means

Posted by Pa Rö <pa...@googlemail.com>.

Hi Alexander and Till,

Thanks for the information; I look forward to the release.
I'm curious how well Flink ML performs against Mahout and Spark ML.

Best regards
Paul

Re: flink ml - k-means

Posted by Till Rohrmann <tr...@apache.org>.

Hi Paul,

If you can't wait, a vanilla implementation is already included in the
Flink examples. You should find it under flink/flink-examples.

But we will try to add more clustering algorithms in the near future.

Cheers,
Till

Re: flink ml - k-means

Posted by Alexander Alexandrov <al...@gmail.com>.

Yes, I expect to have one in the next few weeks (the code is actually
there, but we need to port it to the Flink ML API). I suggest following the
JIRA issue over the next few weeks to see when this is done:

https://issues.apache.org/jira/browse/FLINK-1731

Regards,
Alexander

PS. Bear in mind that we will start with a vanilla implementation of
K-Means. For a thorough evaluation you might want to also check variants
like K-Means++.
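
For reference, the seeding step that distinguishes K-Means++ from the vanilla
algorithm can be sketched as follows. This is a plain-Java illustration, not
the planned Flink ML implementation; the class and method names are
hypothetical. Each new centroid is sampled from the input points with
probability proportional to its squared distance from the nearest centroid
already chosen.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration of K-Means++ seeding: plain Java, no Flink dependency.
public class KMeansPlusPlusSeeding {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    /** Picks k initial centroids from the points using squared-distance weighting. */
    static double[][] seed(double[][] points, int k, Random rnd) {
        int n = points.length;
        double[][] centroids = new double[k][];
        // The first centroid is drawn uniformly at random.
        centroids[0] = points[rnd.nextInt(n)].clone();

        // minDist[i]: squared distance from point i to its nearest chosen centroid.
        double[] minDist = new double[n];
        Arrays.fill(minDist, Double.POSITIVE_INFINITY);

        for (int c = 1; c < k; c++) {
            // Update the distances with the most recently added centroid.
            double total = 0.0;
            for (int i = 0; i < n; i++) {
                double dist = squaredDistance(points[i], centroids[c - 1]);
                minDist[i] = Math.min(minDist[i], dist);
                total += minDist[i];
            }
            // Sample an index with probability minDist[i] / total.
            double target = rnd.nextDouble() * total;
            double cumulative = 0.0;
            int chosen = n - 1; // fallback, e.g. when all distances are zero
            for (int i = 0; i < n; i++) {
                cumulative += minDist[i];
                if (cumulative > target) {
                    chosen = i;
                    break;
                }
            }
            centroids[c] = points[chosen].clone();
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}, {20, 0}};
        double[][] seeds = seed(points, 3, new Random(42));
        System.out.println(Arrays.deepToString(seeds));
    }
}
```

The returned centroids can then be used as the starting centroids of the
usual k-means iteration in place of a purely random choice.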

