Posted to dev@mahout.apache.org by Khalil Honsali <k....@gmail.com> on 2008/03/19 11:18:26 UTC

ASF + Hadoop + Mahout x GSoC

Hello all,

I am writing to inquire about the projects posted for Google Summer of
Code (GSoC).
Please consider this step[0] of the process. If this mailing list is not
the right place for this topic, please disregard this message and kindly
suggest a more appropriate one.

First, let me introduce myself. I am a master's student at Nitech.ac.jp;
I have just finished my first year, which was dedicated to coursework, and
am about to start my research project. I have been around Hadoop since the
end of 2007. Though I would honestly describe myself as a beginner, I have
a basic understanding of the Hadoop architecture, how to use it, its API,
and map/reduce programs.
My current research interest is efficient index distribution for a digital
library system, in which I wish to let Hadoop digest citations and generate
indexes/metadata to be distributed around HDFS. Hence, I see GSoC as an
excellent opportunity to practice with Hadoop, a good excuse to benefit
from experienced mentors, and, why not, a chance to eventually produce a
publication.
I understand that, if my participation is welcome, I first have to
discuss and then submit a formal idea proposal to ASF::Hadoop before
applying to Google. On the ASF GSoC page, one interesting and challenging
project concerns Hadoop:Mahout's machine learning algorithms. I am
specifically attracted to the Neural Networks problem, as I enjoyed the one
and only course I had on the subject (we just had to make a simple BBP for
approximating an input sine wave); though I believe such a problem might
fit more naturally into Dryad's acyclic graph model than into mapreduce....

My questions are:
= what are the qualifications to join ASF's Hadoop GSoC?
= in that case, are there specific guidelines to follow?
= more importantly, what are your suggestions concerning the above matters?
= and finally, what's next from here?

Thank you very much in advance for your time.

Regards,

K. Honsali

Re: ASF + Hadoop + Mahout x GSoC

Posted by Grant Ingersoll <gs...@apache.org>.
On Mar 19, 2008, at 6:18 AM, Khalil Honsali wrote:

> My questions are:
> = what are the qualifications to join ASF's Hadoop GSoC?
> = in that case, are there specific guidelines to follow?
> = more importantly, what are your suggestions concerning the above matters?
> = and finally, what's next from here?

I think most of the info is at
http://wiki.apache.org/general/SummerOfCode2008
and the timeline is at
http://code.google.com/opensource/gsoc/2008/faqs.html#0.1_timeline

FWIW, I think implementing M/R Neural Nets would be great.
Essentially, what you need to do is make a proposal. I think you can
look at prior years' proposals to see how those look. One thing to
think about is how much time it will take to implement, so you may
want to put together a progression of tasks, w/ some maybe even being
optional depending on how the earlier ones go.

Cheers,
Grant