Posted to user@mahout.apache.org by hiroshi leon <hi...@hotmail.com> on 2014/03/15 22:36:58 UTC

RE: Mahout parallel K-Means - algorithms analysis

Hello Ted,

Thank you so much for your reply. The program I was checking is the KMeansDriver class, starting with the run function,
then the buildClusters function in the same class, followed by the ClusterIterator class with
the iterateMR function.

I would like to know where I can check the code that is implemented for the mapper and the
reducer. Is it in CIMapper.class and CIReducer.class?

Is there a research paper or pseudo-code that Mahout's parallel K-means is based on?

Thank you so much and have a nice day.

Best regards


> From: ted.dunning@gmail.com
> Date: Sat, 15 Mar 2014 13:56:56 -0700
> Subject: Re: Mahout parallel K-Means - algorithms analysis
> To: user@mahout.apache.org
> 
> We would love to help.
> 
> Can you say which program and which classes you are looking at?
> 
> 
> On Sat, Mar 15, 2014 at 12:58 PM, hiroshi leon <hi...@hotmail.com> wrote:
> 
> > To whom it may concern,
> >
> > Hello, I have been studying the k-means algorithm in Mahout 0.9
> > using MapReduce, and I would like to know where I can check the code for
> > what happens inside the map function and in the reducer.
> >
> > I was debugging using NetBeans and I was not able to find exactly what is
> > implemented in the Map and Reduce functions...
> >
> > The reason I am doing this is that I would like to know exactly what
> > is implemented in Mahout 0.9, in order to see
> > which parts of the K-Means MapReduce algorithm were optimized.
> >
> > Do you know which research paper the Mahout K-means was based on, or where
> > I can read the pseudo-code?
> >
> >
> >
> > Thank you so much!
> >
> >
> >
> > Best regards!
> >
> > Hiroshi

RE: Mahout parallel K-Means - algorithms analysis

Posted by hiroshi leon <hi...@hotmail.com>.
Thanks Suneel,

Can someone please explain to me a little bit about the ClusteringPolicy and the ClusterClassifier,
and what the benefits are of using them with parallel K-Means?

Thank you so much,

Best regards.


Re: Mahout parallel K-Means - algorithms analysis

Posted by Suneel Marthi <su...@yahoo.com>.
Canopy and KMeans run independently and do not call each other.

For KMeans, the K value has to be specified when invoking KMeans.

Typically you run Canopy first and then invoke KMeans with the appropriate K value as inferred from Canopy.
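
A minimal sketch of the idea of inferring K from a canopy pass, in plain Java. It is an illustration only, not Mahout's Canopy code, and it uses no Mahout or Hadoop classes. The class name CanopySeedSketch, the method canopyCenters, and the single tight threshold t2 are assumptions made for this example. A full canopy pass also uses a looser threshold T1 to decide membership, which is not needed just to count canopies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Mahout.
final class CanopySeedSketch {

    // Euclidean distance between two vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Picks canopy centers greedily: take the next remaining point as a center,
    // then drop every remaining point closer than t2 to that center.
    static List<double[]> canopyCenters(double[][] points, double t2) {
        List<double[]> remaining = new ArrayList<>(Arrays.asList(points));
        List<double[]> centers = new ArrayList<>();
        while (!remaining.isEmpty()) {
            double[] center = remaining.remove(0);
            centers.add(center);
            remaining.removeIf(p -> distance(center, p) < t2);
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1, 3}, {9, 9}, {11, 9}};
        List<double[]> centers = canopyCenters(points, 3.0);
        // The number of canopies is a candidate K; the centers are candidate seeds.
        System.out.println("K inferred from canopies: " + centers.size());
    }
}
```

The number of centers found depends heavily on the threshold, so in this sketch the choice of t2 plays the role that choosing K would otherwise play.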

RE: Mahout parallel K-Means - algorithms analysis

Posted by hiroshi leon <hi...@hotmail.com>.
Thank you Wei and Suneel, 

By the way, does anybody know whether Mahout's parallel K-means uses
Canopy clustering at the beginning to generate the initial K in the K-Means driver class?

Best regards,

Hiroshi


Re: Mahout parallel K-Means - algorithms analysis

Posted by Weishung Chung <we...@gmail.com>.
You could take a look
at org.apache.mahout.clustering.classify.ClusterClassificationMapper

Enjoy,
Wei Shung



Re: Mahout parallel K-Means - algorithms analysis

Posted by Suneel Marthi <su...@yahoo.com>.
The clustering code is in CIMapper and CIReducer. Following the clustering, there is a cluster classification step, which is mapper-only.

I'm not sure about a reference paper; this stuff's been around for a long time, but the documentation for k-means on mahout.apache.org should explain the approach.

Sent from my iPhone
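
A minimal sketch of how one k-means iteration is commonly split into a map step and a reduce step, in plain Java. It is an illustration of the general formulation only, not the code in CIMapper or CIReducer, and it uses no Mahout or Hadoop classes. The names KMeansIterationSketch, nearest, and iterate are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Mahout.
final class KMeansIterationSketch {

    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // "Map" step: index of the centroid closest to the point.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // One iteration: group points by nearest centroid (map and shuffle),
    // then average each group into a new centroid (reduce).
    static double[][] iterate(double[][] points, double[][] centroids) {
        Map<Integer, List<double[]>> groups = new TreeMap<>();
        for (double[] p : points) {
            groups.computeIfAbsent(nearest(p, centroids), k -> new ArrayList<>()).add(p);
        }
        double[][] updated = new double[centroids.length][];
        for (int c = 0; c < centroids.length; c++) {
            List<double[]> members = groups.get(c);
            if (members == null) {
                updated[c] = centroids[c].clone(); // an empty cluster keeps its old centroid
                continue;
            }
            double[] mean = new double[centroids[c].length];
            for (double[] p : members) {
                for (int i = 0; i < mean.length; i++) {
                    mean[i] += p[i] / members.size();
                }
            }
            updated[c] = mean;
        }
        return updated;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1, 3}, {9, 9}, {11, 9}};
        double[][] centroids = {{0, 0}, {10, 10}};
        System.out.println(Arrays.deepToString(iterate(points, centroids)));
    }
}
```

A driver would call iterate repeatedly, feeding each result back in as the next centroids, until the centroids stop moving by more than some tolerance or an iteration limit is reached.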
