Posted to user@mahout.apache.org by Weishung Chung <we...@gmail.com> on 2014/03/17 21:05:01 UTC

Re: Mahout parallel K-Means - algorithms analysis

You could take a look at
org.apache.mahout.clustering.classify.ClusterClassificationMapper

Enjoy,
Wei Shung


On Sat, Mar 15, 2014 at 2:51 PM, Suneel Marthi <su...@yahoo.com> wrote:

> The clustering code is in CIMapper and CIReducer. Following the clustering,
> there is cluster classification, which is mapper-only.
>
> Not sure about the reference paper; this stuff has been around for a long
> time, but the documentation for k-means on mahout.apache.org should explain
> the approach.
>
> Sent from my iPhone
>
> > On Mar 15, 2014, at 5:36 PM, hiroshi leon <hi...@hotmail.com> wrote:
> >
> > Hello Ted,
> >
> > Thank you so much for your reply. The program that I was checking is the
> > KMeansDriver class with the run function, the buildClusters function in
> > the same class, and then the ClusterIterator class with the iterateMR
> > function.
> >
> > I would like to know where I can check the code that is implemented for
> > the mapper and the reducer. Is it in CIMapper.class and CIReducer.class?
> >
> > Is there a research paper or pseudo-code on which Mahout parallel K-means
> > was based?
> >
> > Thank you so much and have a nice day.
> >
> > Best regards
> >
> >
> >> From: ted.dunning@gmail.com
> >> Date: Sat, 15 Mar 2014 13:56:56 -0700
> >> Subject: Re: Mahout parallel K-Means - algorithms analysis
> >> To: user@mahout.apache.org
> >>
> >> We would love to help.
> >>
> >> Can you say which program and which classes you are looking at?
> >>
> >>
> >> On Sat, Mar 15, 2014 at 12:58 PM, hiroshi leon <hiroshi_8712@hotmail.com> wrote:
> >>
> >>> To whom it may concern,
> >>>
> >>> Hello, I have been checking the k-means algorithm of Mahout 0.9
> >>> using MapReduce, and I would like to know where I can check the code
> >>> for what happens inside the map function and in the reducer.
> >>>
> >>> I was debugging using NetBeans and I was not able to find what exactly
> >>> is implemented in the Map and Reduce functions...
> >>>
> >>> The reason I am doing this is that I would like to know what exactly
> >>> is implemented in Mahout 0.9, in order to see which parts were
> >>> optimized in the K-Means MapReduce algorithm.
> >>>
> >>> Do you know which research paper the Mahout K-means was based on, or
> >>> where I can read the pseudo-code?
> >>>
> >>>
> >>>
> >>> Thank you so much!
> >>>
> >>>
> >>>
> >>> Best regards!
> >>>
> >>> Hiroshi
> >
>
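
A rough illustration of the split described in the quoted message above: a mapper/reducer pass for clustering, followed by a mapper-only classification pass. This is a minimal Python sketch under stated assumptions, not Mahout's CIMapper or CIReducer code. The function names (map_split, reduce_cluster, iterate, classify), the Euclidean distance, and the toy data are all made up for illustration.

```python
from collections import defaultdict
from math import dist


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def map_split(points, centroids):
    """Mapper: assign each point in one input split to its nearest centroid
    and emit one partial (count, coordinate sums) record per cluster."""
    partial = defaultdict(lambda: [0, [0.0] * len(centroids[0])])
    for p in points:
        acc = partial[nearest(p, centroids)]
        acc[0] += 1
        acc[1] = [s + x for s, x in zip(acc[1], p)]
    return [(k, (n, sums)) for k, (n, sums) in partial.items()]


def reduce_cluster(partials):
    """Reducer: merge the partial records for one cluster into a new centroid."""
    total = sum(n for n, _ in partials)
    sums = [sum(col) for col in zip(*(s for _, s in partials))]
    return [s / total for s in sums]


def iterate(splits, centroids):
    """One iteration: run the mapper on every split, group records by
    cluster id (the shuffle), then run the reducer once per cluster."""
    grouped = defaultdict(list)
    for split in splits:
        for k, record in map_split(split, centroids):
            grouped[k].append(record)
    # A cluster that received no points keeps its previous centroid.
    return [reduce_cluster(grouped[k]) if k in grouped else list(c)
            for k, c in enumerate(centroids)]


def classify(points, centroids):
    """Mapper-only step after clustering: label each point with its cluster."""
    return [(nearest(p, centroids), p) for p in points]


# Two input splits of toy 2-D points and two starting centroids.
splits = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
centroids = iterate(splits, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(classify([(1.0, 1.0), (9.0, 11.0)], centroids))
```

Each call to iterate stands for one MapReduce job; a driver would repeat it until the centroids stop moving or an iteration limit is reached.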

RE: Mahout parallel K-Means - algorithms analysis

Posted by hiroshi leon <hi...@hotmail.com>.
Thanks Suneel,

Can someone please explain a little bit about the ClusteringPolicy and the ClusterClassifier,
and what the benefits are of using them with parallel K-Means?

Thank you so much,

Best regards.

> Date: Tue, 18 Mar 2014 04:35:14 -0700
> From: suneel_marthi@yahoo.com
> Subject: Re: Mahout parallel K-Means - algorithms analysis
> To: user@mahout.apache.org
> 
> Canopy and KMeans run independently and do not call each other.
> 
> For KMeans, the K value has to be specified when invoking KMeans.
> 
> Typically you run Canopy first and then invoke KMeans with the appropriate K value as inferred from Canopy.

Re: Mahout parallel K-Means - algorithms analysis

Posted by Suneel Marthi <su...@yahoo.com>.
Canopy and KMeans run independently and do not call each other.

For KMeans, the K value has to be specified when invoking KMeans.

Typically you run Canopy first and then invoke KMeans with the appropriate K value as inferred from Canopy.
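
To make the "Canopy first, then KMeans" workflow concrete, here is a minimal Python sketch of a classic two-threshold canopy pass used to infer K. It is an assumption-laden illustration, not Mahout's Canopy implementation: the names canopies, mean, t1 and t2, the Euclidean distance, and the toy points are all hypothetical.

```python
from math import dist


def canopies(points, t1, t2):
    """Classic canopy pass with a loose threshold t1 and a tight threshold t2."""
    if not t1 > t2 > 0:
        raise ValueError("expected t1 > t2 > 0")
    candidates = list(points)
    result = []
    while candidates:
        seed = candidates[0]
        # Every point within t1 of the seed joins this canopy (canopies may overlap).
        members = [p for p in points if dist(p, seed) < t1]
        result.append(members)
        # Points within t2 of the seed can no longer seed a canopy of their own.
        candidates = [p for p in candidates if dist(p, seed) >= t2]
    return result


def mean(members):
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(members) for col in zip(*members)]


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (8.0, 8.0), (8.0, 9.0)]
found = canopies(points, t1=3.0, t2=2.0)
k = len(found)                                 # K inferred from the canopy count
initial_centroids = [mean(m) for m in found]   # possible starting centroids
print(k, initial_centroids)
```

The number of canopies gives a candidate K, and the canopy means are one reasonable choice of starting centroids for a separate K-means run. The result depends heavily on the two thresholds, so they usually need tuning for the data at hand.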







On Tuesday, March 18, 2014 4:33 AM, hiroshi leon <hi...@hotmail.com> wrote:
 
Thank you Wei and Suneel, 

By the way, does anybody know whether Mahout's parallel K-means uses
Canopy clustering at the beginning to generate the initial K in the K-Means driver class?

Best regards,

Hiroshi


RE: Mahout parallel K-Means - algorithms analysis

Posted by hiroshi leon <hi...@hotmail.com>.
Thank you Wei and Suneel, 

By the way, does anybody know whether Mahout's parallel K-means uses
Canopy clustering at the beginning to generate the initial K in the K-Means driver class?

Best regards,

Hiroshi
