Posted to dev@mahout.apache.org by Grant Ingersoll <gs...@apache.org> on 2009/02/17 14:28:12 UTC

GSOC Time nearing

Just to let everyone know, GSOC (Google Summer of Code) time is  
nearing again.  Last year we had two really good students and lots of  
good proposals.  Would love to see that continue.  I'll post more info  
when I have it, but here's some starter info:

Google Site: http://code.google.com/soc/
Last year: http://wiki.apache.org/general/SummerOfCode2008
For ideas on what we need, see:  http://cwiki.apache.org/MAHOUT.  To  
name a few:  SVM, categorization algs, large scale graph ranking  
tools, maximum entropy implementation, collaborative filtering  
improvements (Sean?)

For existing committers, if you are interested in mentoring, let me  
know.

!!!!!!!!
For applicants, some things to keep in mind:

It's very important applicants demonstrate they are capable of working  
and discussing ideas on the mahout-dev list during the application  
phase.  It simply is not enough to throw up a proposal on the GSOC  
site, even a strong one, and expect to be selected.  The Apache Way is  
all about community.  We want to hear the ideas and we want to discuss  
them and we want you to be a part of the community.  If you want  
examples of that, see the archives from last year and our interactions  
with our two students from 2008.  Or, just look at any of the  
interactions on the lists.  Ask questions, help out, etc.  If you  
really want a leg up, demonstrate your proficiency by creating a  
small patch/demo that fixes/improves something in the current  
implementations.  See the How To Contribute section of the Wiki.

Lastly, before I get off my soap box, when applying, DO NOT claim to  
be able to implement a whole slew of algorithms in one fell swoop.  I  
don't care how good you are (or think you are), it simply isn't  
possible.  Trust me.  Even if you could (and you can't), the community  
won't be able to keep up and then you won't be happy either.  Instead,  
pick one good idea and show a project timeline and an in-depth  
knowledge of what you are proposing, including references, etc.  If  
you really think you could do more than one, instead propose items  
that are "time permitting" and that build on what you have completed.   
Demos and documentation are always good in this regard.


Cheers,
Grant

Re: GSOC Time nearing

Posted by Isabel Drost <is...@apache.org>.
On Tuesday 17 February 2009, Grant Ingersoll wrote:
> For ideas on what we need, see:  http://cwiki.apache.org/MAHOUT.  To
> name a few:  SVM, categorization algs, large scale graph ranking
> tools, maximum entropy implementation, collaborative filtering
> improvements (Sean?)

To name a few more: algorithms for learning from sequential data (e.g. 
identifying named entities in an incoming stream of text) and algorithms for 
learning rankings of items are also interesting.

If you plan to use Mahout as your platform for one of the various data mining, 
machine learning or information retrieval challenges, feel free to submit 
your plan as a GSoC proposal.


> !!!!!!!!
> For applicants, some things to keep in mind:
>
> It's very important applicants demonstrate they are capable of working
> and discussing ideas on the mahout-dev list during the application
> phase.

A definite +1 from me. Discussing your idea before submitting the proposal 
also helps you get an idea of what exactly is needed and what is important to 
keep in mind, and it leads to a better proposal. So don't be afraid to post your 
idea and refine it together with us.


> If you really think you could do more than one, instead propose items
> that are "time permitting" and that build on what you have completed.
> Demos and documentation are always good in this regard.

+1 It is not sufficient to submit a straightforward algorithm implementation. 
Keep in mind that for your work to remain maintainable, you need to provide unit 
and integration tests. In addition, you need to provide examples 
and demos so others can see how to use your work. Finally, thorough 
documentation of the algorithm itself, the implementation, its advantages and 
limits is needed to evaluate it for commercial projects.

Isabel


-- 
The abuse of greatness is when it disjoins remorse from power.		-- William 
Shakespeare, "Julius Caesar"
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xm...@spaceboyz.net>

Re: GSOC Time nearing

Posted by Grant Ingersoll <gs...@apache.org>.
http://wiki.apache.org/general/SummerOfCode2009

I put a spot in for us and added mine and Ted's names.  Please feel  
free to add your name if you are interested in mentoring.


On Feb 17, 2009, at 5:37 PM, Ted Dunning wrote:

> I would love to help again.  And this time, my student will not fail !
> (not if I can reach them with a metaphorical two by four, that is)

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search


Re: GSOC Time nearing

Posted by Ted Dunning <te...@gmail.com>.
I would love to help again.  And this time, my student will not fail!
(not if I can reach them with a metaphorical two by four, that is)




-- 
Ted Dunning, CTO
DeepDyve
4600 Bohannon Drive, Suite 220
Menlo Park, CA 94025
www.deepdyve.com
650-324-0110, ext. 738
858-414-0013 (m)