Posted to dev@mahout.apache.org by Gmail <ma...@gmail.com> on 2014/03/04 11:43:08 UTC

kMeans Implementation

Hello,
I was studying the Mahout libraries and found something strange in your
kMeans implementation.

Looking inside it, I noticed that kMeans only uses map functions and
omits the reducers. Why did you make this choice?
It does not use the MapReduce programming model, even though Mahout's
core is said to be Hadoop.
Is this choice driven by performance issues?

Best regards
Manuel Sequino

Re: kMeans Implementation

Posted by Suneel Marthi <su...@yahoo.com>.

He's talking about simple k-means, which is a mapper-only job. Sean has already addressed his question.

> On Mar 4, 2014, at 5:49 AM, Sebastian Schelter <ss...@apache.org> wrote:
> 
> We have several implementations of k-Means, which one do you refer to?
> 
> --sebastian

Re: kMeans Implementation

Posted by Sebastian Schelter <ss...@apache.org>.

We have several implementations of k-Means, which one do you refer to?

--sebastian

Re: kMeans Implementation

Posted by Sam Bessalah <sa...@gmail.com>.

I don't see why that is a problem.


Re: kMeans Implementation

Posted by Gmail <ma...@gmail.com>.

I used the KMeansDriver class in the clustering.kmeans package.
Yes, I know that the use of MapReduce is mandatory, but I think there is
an easier implementation, and above all a more MapReduce-oriented one.

Anyway, I thought it was a choice driven by performance.

Thank you.


On 03/04/2014 11:48 AM, Sean Owen wrote:
> Although I don't know exactly what you're referring to, in general,
> nothing about Map/Reduce means you always use a reducer. There are
> plenty of tasks that are much more appropriate as a map-only or
> reduce-only job. So this assertion doesn't fly to start with. But if
> you see two jobs that might be merged into one, that could be a useful
> suggestion.

Re: kMeans Implementation

Posted by Sean Owen <sr...@gmail.com>.

Although I don't know exactly what you're referring to, in general,
nothing about Map/Reduce means you always use a reducer. There are
plenty of tasks that are much more appropriate as a map-only or
reduce-only job. So this assertion doesn't fly to start with. But if
you see two jobs that might be merged into one, that could be a useful
suggestion.
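
To make the map-only point concrete, here is a minimal sketch in plain Java. It is not Mahout or Hadoop code; the class and method names (MapOnlyAssignment, nearestCentroid, assignAll) are made up for illustration. It assumes the centroids are already known and only assigns each point to its nearest one. Each record is handled on its own, so there is nothing left for a reducer to aggregate.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; this is not Mahout or Hadoop code.
class MapOnlyAssignment {

    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // The "map" function: looks at a single point and returns the index
    // of its nearest centroid. No other record is consulted.
    static int nearestCentroid(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c])
                    < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // The whole "job": apply the map function to every record.
    // There is no grouping or aggregation step, hence no reducer.
    static List<Integer> assignAll(List<double[]> points, double[][] centroids) {
        return points.stream()
                .map(p -> nearestCentroid(p, centroids))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        double[][] centroids = { {0.0, 0.0}, {10.0, 10.0} };
        List<double[]> points = Arrays.asList(
                new double[] {1.0, 2.0},
                new double[] {9.0, 8.0},
                new double[] {0.5, 0.5});
        System.out.println(assignAll(points, centroids)); // prints [0, 1, 0]
    }
}
```

In a real Hadoop job the same effect is usually obtained by configuring the job with zero reduce tasks, so that the mapper output is written out directly. Recomputing the centroids, by contrast, has to combine many points per cluster, which is the kind of step where a reducer is a natural fit.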
