Posted to dev@mahout.apache.org by "Yexi (JIRA)" <ji...@apache.org> on 2013/04/25 17:34:16 UTC

[jira] [Comment Edited] (MAHOUT-1177) GSOC 2013: Reform and simplify the clustering APIs

    [ https://issues.apache.org/jira/browse/MAHOUT-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641893#comment-13641893 ] 

Yexi edited comment on MAHOUT-1177 at 4/25/13 3:34 PM:
-------------------------------------------------------

Hi, 

I am a graduate student majoring in data mining, and I am very interested in this project.
I have some experience with distributed data mining using Hadoop, so I believe I can handle this project.

In order to work on this project, is it necessary for me to join the GSOC program?
GSOC requires international students studying in the US to apply for CPT, and I have almost used up my CPT on previous internships, so I will not be able to apply for CPT for GSOC.

Regards,
Yexi
                
      was (Author: yxjiang):
                  
> GSOC 2013: Reform and simplify the clustering APIs
> --------------------------------------------------
>
>                 Key: MAHOUT-1177
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1177
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Dan Filimon
>              Labels: gsoc2013, mentor
>
> Clustering is one of the most used features in Mahout and has many applications [http://en.wikipedia.org/wiki/Cluster_analysis#Applications].
> We have lots of clustering algorithms:
> - basic k-means
> - canopy clustering
> - Dirichlet clustering
> - fuzzy k-means
> - spectral k-means
> - streaming k-means [coming soon]
> We want to make them easier to use by updating the APIs and making sure they all work in the same way, with consistent inputs, outputs, diagnostics and documentation.
> This is a great way to gain an in-depth understanding of clustering algorithms and to familiarize yourself with Hadoop, Mahout clustering and good software engineering principles.

--
This message is automatically generated by JIRA.