Posted to dev@commons.apache.org by Alina Ciobanu <al...@yahoo.com.INVALID> on 2015/02/01 14:06:20 UTC

[Math] Contributions to the clustering module (maybe GSoC)

Hello everyone,
My name is Alina Ciobanu. I'm a first-year Ph.D. student in computer science (NLP) at the Faculty of Mathematics and Computer Science, University of Bucharest, Romania. I am interested in contributing to the Apache Commons Math library. My idea is to work on the clustering module, to implement spectral clustering, maybe also the mean shift algorithm, and some clustering validation methods. Would you please tell me if you think that such a contribution would be useful to the Commons Math users? If so, I will provide more details about what I have in mind. Any suggestions are welcome.
I am also thinking about applying to Google Summer of Code this year. I haven't decided yet because I am not sure, at this moment, if my schedule for this summer would allow it. Thus, this question is only in perspective: would anyone from the Commons Math community be interested in mentoring a GSoC project (on the clustering module, as described above, or on something related)?
Best regards,
Alina Ciobanu

Re: [Math] Contributions to the clustering module (maybe GSoC)

Posted by Alina Ciobanu <al...@yahoo.com.INVALID>.
Thank you! I forked the repository on GitHub and I will start working on the clustering evaluation methods first.

Best regards,
Alina
      From: Thomas Neidhart <th...@gmail.com>
 To: Commons Developers List <de...@commons.apache.org> 
 Sent: Tuesday, February 10, 2015 11:37 PM
 Subject: Re: [Math] Contributions to the clustering module (maybe GSoC)
   
On 02/07/2015 09:53 PM, Alina Ciobanu wrote:
> Hello,
> 
> I finally figured out my schedule for this summer, and the conclusion is that I would be able to dedicate about 20 hours per week to the GSoC project. As far as I understand, this is about half of what is expected from a GSoC student, so unfortunately I think I should not apply this year. I want to contribute to the Commons Math library nonetheless.

No problem.

Do not hesitate to ask questions on the mailing list if you need some help.

You can find a code formatter for the commons-math style here:
http://people.apache.org/~luc/Apache-commons.xml

This style can be imported in Eclipse (Java Code Style | Formatter).



Thomas


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org



   

Re: [Math] Contributions to the clustering module (maybe GSoC)

Posted by Thomas Neidhart <th...@gmail.com>.
On 02/07/2015 09:53 PM, Alina Ciobanu wrote:
> Hello,
> 
> I finally figured out my schedule for this summer, and the conclusion is that I would be able to dedicate about 20 hours per week to the GSoC project. As far as I understand, this is about half of what is expected from a GSoC student, so unfortunately I think I should not apply this year. I want to contribute to the Commons Math library nonetheless.

No problem.

Do not hesitate to ask questions on the mailing list if you need some help.

You can find a code formatter for the commons-math style here:
http://people.apache.org/~luc/Apache-commons.xml

This style can be imported in Eclipse (Java Code Style | Formatter).

Thomas




Re: [Math] Contributions to the clustering module (maybe GSoC)

Posted by Phil Steitz <ph...@gmail.com>.
On 2/7/15 1:53 PM, Alina Ciobanu wrote:
> Hello,
>
> I finally figured out my schedule for this summer, and the conclusion is that I would be able to dedicate about 20 hours per week to the GSoC project. As far as I understand, this is about half of what is expected from a GSoC student, so unfortunately I think I should not apply this year. I want to contribute to the Commons Math library nonetheless.

Patches / review / ideas are always welcome!

Phil
>
> Best regards,
> Alina
>       From: Thomas Neidhart <th...@gmail.com>
>  To: Commons Developers List <de...@commons.apache.org> 
>  Sent: Tuesday, February 3, 2015 1:17 AM
>  Subject: Re: [Math] Contributions to the clustering module (maybe GSoC)
>    
> On 02/02/2015 10:36 PM, Alina Ciobanu wrote:
>> Hello Thomas,
>>
>>
>> Thank you for the answer. I hope to be able to clarify my schedule for the summer in about a week, and then I will decide whether to apply to GSoC this year. I will let you know as soon as I can. Until then, I will briefly describe my first ideas below:
>>
>>
>> 1. Spectral clustering [1] - It basically maps the data into a lower-dimensional space (relying on the eigenvectors of the similarity matrix) and performs (k-means) clustering there. This method can handle a wide variety of problems, regardless of the shape of the clusters. It could be implemented efficiently using the Commons Math linear algebra module.
>>
>>
>> 2. Mean shift algorithm [2] - I haven't grasped all the details of the algorithm yet, but I find it very interesting. As far as I understand, it has been used primarily in pattern recognition and computer vision. I discovered it while searching for an algorithm that does not require the number of clusters as an input parameter. From this point of view, I think it would be a good addition to Commons Math alongside DBSCAN.
>>
>>
>> 3. Clustering evaluation methods
>> 3.1. The Silhouette Coefficient [3] - accounts for the intra-cluster and inter-cluster distances to assign a score in [-1, 1] to a clustering.
>> 3.2. External clustering evaluation [4] - when a gold standard is available for the clustered data, it can be used to assess the performance of a clustering algorithm.
>>
>>
>> Suggestions are more than welcome. If you have requests from users for specific clustering algorithms, please let me know.
> Your proposals sound good. As a pointer to already existing feature
> requests, you can take a look at:
>
>  * OPTICS algorithm - https://issues.apache.org/jira/browse/MATH-1190
>  * HAC algorithm - https://issues.apache.org/jira/browse/MATH-959
>
> Cluster evaluation would also be very interesting; I already wanted to
> do something in this direction but could not find the time.
>
> By the way, by coincidence we received a reminder about this year's GSoC
> just today: the deadline to submit a project proposal with project ideas
> is 13-02-2015.
>
>
>
> Thomas
>
>
>
>
>
>    




Re: [Math] Contributions to the clustering module (maybe GSoC)

Posted by Alina Ciobanu <al...@yahoo.com.INVALID>.
Hello,

I finally figured out my schedule for this summer, and the conclusion is that I would be able to dedicate about 20 hours per week to the GSoC project. As far as I understand, this is about half of what is expected from a GSoC student, so unfortunately I think I should not apply this year. I want to contribute to the Commons Math library nonetheless.

Best regards,
Alina
      From: Thomas Neidhart <th...@gmail.com>
 To: Commons Developers List <de...@commons.apache.org> 
 Sent: Tuesday, February 3, 2015 1:17 AM
 Subject: Re: [Math] Contributions to the clustering module (maybe GSoC)
   
On 02/02/2015 10:36 PM, Alina Ciobanu wrote:
> Hello Thomas,
> 
> 
> Thank you for the answer. I hope to be able to clarify my schedule for the summer in about a week, and then I will decide whether to apply to GSoC this year. I will let you know as soon as I can. Until then, I will briefly describe my first ideas below:
> 
> 
> 1. Spectral clustering [1] - It basically maps the data into a lower-dimensional space (relying on the eigenvectors of the similarity matrix) and performs (k-means) clustering there. This method can handle a wide variety of problems, regardless of the shape of the clusters. It could be implemented efficiently using the Commons Math linear algebra module.
> 
> 
> 2. Mean shift algorithm [2] - I haven't grasped all the details of the algorithm yet, but I find it very interesting. As far as I understand, it has been used primarily in pattern recognition and computer vision. I discovered it while searching for an algorithm that does not require the number of clusters as an input parameter. From this point of view, I think it would be a good addition to Commons Math alongside DBSCAN.
> 
> 
> 3. Clustering evaluation methods
> 3.1. The Silhouette Coefficient [3] - accounts for the intra-cluster and inter-cluster distances to assign a score in [-1, 1] to a clustering.
> 3.2. External clustering evaluation [4] - when a gold standard is available for the clustered data, it can be used to assess the performance of a clustering algorithm.
> 
> 
> Suggestions are more than welcome. If you have requests from users for specific clustering algorithms, please let me know.

Your proposals sound good. As a pointer to already existing feature
requests, you can take a look at:

 * OPTICS algorithm - https://issues.apache.org/jira/browse/MATH-1190
 * HAC algorithm - https://issues.apache.org/jira/browse/MATH-959

Cluster evaluation would also be very interesting; I already wanted to
do something in this direction but could not find the time.

By the way, by coincidence we received a reminder about this year's GSoC
just today: the deadline to submit a project proposal with project ideas
is 13-02-2015.



Thomas





   

Re: [Math] Contributions to the clustering module (maybe GSoC)

Posted by Thomas Neidhart <th...@gmail.com>.
On 02/02/2015 10:36 PM, Alina Ciobanu wrote:
> Hello Thomas,
> 
> 
> Thank you for the answer. I hope to be able to clarify my schedule for the summer in about a week, and then I will decide whether to apply to GSoC this year. I will let you know as soon as I can. Until then, I will briefly describe my first ideas below:
> 
> 
> 1. Spectral clustering [1] - It basically maps the data into a lower-dimensional space (relying on the eigenvectors of the similarity matrix) and performs (k-means) clustering there. This method can handle a wide variety of problems, regardless of the shape of the clusters. It could be implemented efficiently using the Commons Math linear algebra module.
> 
> 
> 2. Mean shift algorithm [2] - I haven't grasped all the details of the algorithm yet, but I find it very interesting. As far as I understand, it has been used primarily in pattern recognition and computer vision. I discovered it while searching for an algorithm that does not require the number of clusters as an input parameter. From this point of view, I think it would be a good addition to Commons Math alongside DBSCAN.
> 
> 
> 3. Clustering evaluation methods
> 3.1. The Silhouette Coefficient [3] - accounts for the intra-cluster and inter-cluster distances to assign a score in [-1, 1] to a clustering.
> 3.2. External clustering evaluation [4] - when a gold standard is available for the clustered data, it can be used to assess the performance of a clustering algorithm.
> 
> 
> Suggestions are more than welcome. If you have requests from users for specific clustering algorithms, please let me know.

Your proposals sound good. As a pointer to already existing feature
requests, you can take a look at:

 * OPTICS algorithm - https://issues.apache.org/jira/browse/MATH-1190
 * HAC algorithm - https://issues.apache.org/jira/browse/MATH-959

Cluster evaluation would also be very interesting; I already wanted to
do something in this direction but could not find the time.

By the way, by coincidence we received a reminder about this year's GSoC
just today: the deadline to submit a project proposal with project ideas
is 13-02-2015.

Thomas




Re: [Math] Contributions to the clustering module (maybe GSoC)

Posted by Alina Ciobanu <al...@yahoo.com.INVALID>.
Hello Thomas,


Thank you for the answer. I hope to be able to clarify my schedule for the summer in about a week, and then I will decide whether to apply to GSoC this year. I will let you know as soon as I can. Until then, I will briefly describe my first ideas below:


1. Spectral clustering [1] - It basically maps the data into a lower-dimensional space (relying on the eigenvectors of the similarity matrix) and performs (k-means) clustering there. This method can handle a wide variety of problems, regardless of the shape of the clusters. It could be implemented efficiently using the Commons Math linear algebra module.


2. Mean shift algorithm [2] - I haven't grasped all the details of the algorithm yet, but I find it very interesting. As far as I understand, it has been used primarily in pattern recognition and computer vision. I discovered it while searching for an algorithm that does not require the number of clusters as an input parameter. From this point of view, I think it would be a good addition to Commons Math alongside DBSCAN.


3. Clustering evaluation methods
3.1. The Silhouette Coefficient [3] - accounts for the intra-cluster and inter-cluster distances to assign a score in [-1, 1] to a clustering.
3.2. External clustering evaluation [4] - when a gold standard is available for the clustered data, it can be used to assess the performance of a clustering algorithm.
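
To make the two evaluation ideas in item 3 concrete, here is a minimal plain-Java sketch. It is only an illustration under stated assumptions: the class and method names (EvaluationSketch, silhouette, purity) are hypothetical and are not part of the Commons Math API, Euclidean distance is assumed, points in singleton clusters get a silhouette of 0 by convention, and purity is picked as just one example of an external measure.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: hypothetical names, not part of the Commons Math API. */
final class EvaluationSketch {

    private EvaluationSketch() { }

    /** Euclidean distance between two points of equal dimension (an assumed metric). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Mean silhouette over all points; labels[i] is the cluster id of points[i]. */
    static double silhouette(double[][] points, int[] labels) {
        Map<Integer, Integer> sizes = new HashMap<>();
        for (int label : labels) {
            sizes.merge(label, 1, Integer::sum);
        }
        if (sizes.size() < 2) {
            throw new IllegalArgumentException("need at least two clusters");
        }
        double total = 0.0;
        for (int i = 0; i < points.length; i++) {
            int own = labels[i];
            int ownSize = sizes.get(own);
            if (ownSize == 1) {
                continue; // singleton cluster: silhouette taken as 0
            }
            // Sum of distances from point i to the members of each cluster.
            Map<Integer, Double> sums = new HashMap<>();
            for (int j = 0; j < points.length; j++) {
                if (j != i) {
                    sums.merge(labels[j], distance(points[i], points[j]), Double::sum);
                }
            }
            double a = sums.get(own) / (ownSize - 1); // mean intra-cluster distance
            double b = Double.POSITIVE_INFINITY;      // mean distance to nearest other cluster
            for (Map.Entry<Integer, Integer> e : sizes.entrySet()) {
                int label = e.getKey();
                if (label != own) {
                    b = Math.min(b, sums.get(label) / e.getValue());
                }
            }
            double max = Math.max(a, b);
            total += (max == 0.0) ? 0.0 : (b - a) / max;
        }
        return total / points.length;
    }

    /** Purity: fraction of points that carry the majority gold label of their cluster. */
    static double purity(int[] predicted, int[] gold) {
        Map<Integer, Map<Integer, Integer>> counts = new HashMap<>();
        for (int i = 0; i < predicted.length; i++) {
            counts.computeIfAbsent(predicted[i], k -> new HashMap<>())
                  .merge(gold[i], 1, Integer::sum);
        }
        int matched = 0;
        for (Map<Integer, Integer> perClass : counts.values()) {
            matched += Collections.max(perClass.values());
        }
        return (double) matched / predicted.length;
    }
}
```

Calling EvaluationSketch.silhouette(points, labels) on a well-separated clustering should give a value close to 1, while a labelling that mixes the groups gives a negative value. EvaluationSketch.purity(predicted, gold) returns a fraction in (0, 1], with 1 meaning every cluster contains a single gold class.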


Suggestions are more than welcome. If you have requests from users for specific clustering algorithms, please let me know.
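
As a rough illustration of why mean shift (item 2) does not need the number of clusters as input, here is a small plain-Java sketch. It rests on several assumptions that are not taken from [2]: a flat (uniform) kernel, Euclidean distance, a fixed iteration cap and tolerance, and merging of modes that lie within half a bandwidth of each other. The names (MeanShiftSketch, seekMode, cluster) are hypothetical and not part of Commons Math.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative flat-kernel mean shift; hypothetical names, not Commons Math API. */
final class MeanShiftSketch {

    private MeanShiftSketch() { }

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Repeatedly moves to the mean of the points inside the window until it stops moving. */
    static double[] seekMode(double[][] points, double[] start, double bandwidth) {
        double[] current = start.clone();
        for (int iter = 0; iter < 1000; iter++) { // iteration cap is an arbitrary choice
            double[] mean = new double[current.length];
            int count = 0;
            for (double[] p : points) {
                if (distance(p, current) <= bandwidth) {
                    for (int d = 0; d < mean.length; d++) {
                        mean[d] += p[d];
                    }
                    count++;
                }
            }
            if (count == 0) {
                break; // empty window: keep the current position
            }
            for (int d = 0; d < mean.length; d++) {
                mean[d] /= count;
            }
            double shift = distance(mean, current);
            current = mean;
            if (shift < 1e-9) {
                break; // converged
            }
        }
        return current;
    }

    /** Labels each point by the mode it converges to; the cluster count is not an input. */
    static int[] cluster(double[][] points, double bandwidth) {
        List<double[]> modes = new ArrayList<>();
        int[] labels = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            double[] mode = seekMode(points, points[i], bandwidth);
            int label = -1;
            for (int m = 0; m < modes.size(); m++) {
                // Assumed merge rule: modes closer than bandwidth / 2 are the same cluster.
                if (distance(mode, modes.get(m)) <= bandwidth / 2) {
                    label = m;
                    break;
                }
            }
            if (label < 0) {
                modes.add(mode);
                label = modes.size() - 1;
            }
            labels[i] = label;
        }
        return labels;
    }
}
```

The only tuning parameter in this sketch is the bandwidth: MeanShiftSketch.cluster(points, bandwidth) returns one label per point, and the number of distinct labels falls out of the data rather than being specified up front.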


Best regards,
Alina


[1] http://www.informatik.uni-hamburg.de/ML/contents/people/luxburg/publications/Luxburg07_tutorial.pdf
[2] http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1055330&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1055330
[3] http://www.sciencedirect.com/science/article/pii/0377042787901257
[4] http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html
      From: Thomas Neidhart <th...@gmail.com>
 To: Commons Developers List <de...@commons.apache.org> 
 Sent: Sunday, February 1, 2015 8:33 PM
 Subject: Re: [Math] Contributions to the clustering module (maybe GSoC)
   
On 02/01/2015 02:06 PM, Alina Ciobanu wrote:


> Hello everyone,
> My name is Alina Ciobanu. I'm a first-year Ph.D. student in computer science (NLP) at the Faculty of Mathematics and Computer Science, University of Bucharest, Romania. I am interested in contributing to the Apache Commons Math library. My idea is to work on the clustering module, to implement spectral clustering, maybe also the mean shift algorithm, and some clustering validation methods. Would you please tell me if you think that such a contribution would be useful to the Commons Math users? If so, I will provide more details about what I have in mind. Any suggestions are welcome.
> I am also thinking about applying to Google Summer of Code this year. I haven't decided yet because I am not sure, at this moment, if my schedule for this summer would allow it. Thus, this question is only in perspective: would anyone from the Commons Math community be interested in mentoring a GSoC project (on the clustering module, as described above, or on something related)?
> Best regards,
> Alina Ciobanu

Hi Alina,

Good to hear about your interest in Commons Math. New contributions are
very welcome, and we do indeed have several feature requests to add new
clustering algorithms.

I am certainly interested in mentoring you for GSoC, but there may also
be others here who can help with that.

Just let us know what you want to do early on so that we can prepare
ourselves.

Thomas




  

Re: [Math] Contributions to the clustering module (maybe GSoC)

Posted by Thomas Neidhart <th...@gmail.com>.
On 02/01/2015 02:06 PM, Alina Ciobanu wrote:
> Hello everyone,
> My name is Alina Ciobanu. I'm a first-year Ph.D. student in computer science (NLP) at the Faculty of Mathematics and Computer Science, University of Bucharest, Romania. I am interested in contributing to the Apache Commons Math library. My idea is to work on the clustering module, to implement spectral clustering, maybe also the mean shift algorithm, and some clustering validation methods. Would you please tell me if you think that such a contribution would be useful to the Commons Math users? If so, I will provide more details about what I have in mind. Any suggestions are welcome.
> I am also thinking about applying to Google Summer of Code this year. I haven't decided yet because I am not sure, at this moment, if my schedule for this summer would allow it. Thus, this question is only in perspective: would anyone from the Commons Math community be interested in mentoring a GSoC project (on the clustering module, as described above, or on something related)?
> Best regards,
> Alina Ciobanu

Hi Alina,

Good to hear about your interest in Commons Math. New contributions are
very welcome, and we do indeed have several feature requests to add new
clustering algorithms.

I am certainly interested in mentoring you for GSoC, but there may also
be others here who can help with that.

Just let us know what you want to do early on so that we can prepare
ourselves.

Thomas
