Posted to dev@mahout.apache.org by Robin Anil <ro...@gmail.com> on 2010/01/27 05:50:55 UTC

GSOC 2010 is here

Greetings! Fellow GSOC alums, administrators and dear mentors, the next
edition is right here. Details are given in the link below.

https://groups.google.com/group/google-summer-of-code-discuss/browse_thread/thread/d839c0b02ac15b3f

Maybe we could identify key areas in Mahout that we need to develop, apart
from the ML implementations, and list them for students to see before they
start trickling in.

Some ideas:
Benchmarking Framework with EC2 wrappers
Command-line Console + Launcher like HBase and Hadoop
Online Tool/Query UI for Algorithms in Mahout (like CF)


Possible ideas (I am not sure what I am talking about here, but there are nice
problems to solve):
Improvements in Math?
How to tackle management of datasets?
Error Recovery if a job fails?


Robin

Re: GSOC 2010 is here

Posted by Isabel Drost <is...@apache.org>.
On Wed Robin Anil <ro...@gmail.com> wrote:
> On Wed, Jan 27, 2010 at 8:10 PM, Grant Ingersoll
> <gs...@apache.org> wrote:
> 
> > Let's not forget implementations of ML algorithms!  I think that is
> > one of the star attractions for working with Mahout and also makes
> > for a nice project for the student.

And in some cases even for the researcher mentoring the student.


> Yes! :) That goes without saying

Maybe we can help students come up with great ideas for new algorithms
by pointing out areas that are still open or under-represented in
Mahout? 


Isabel

Re: GSOC 2010 is here

Posted by Robin Anil <ro...@gmail.com>.
Yes! :) That goes without saying

On Wed, Jan 27, 2010 at 8:10 PM, Grant Ingersoll <gs...@apache.org> wrote:

> Let's not forget implementations of ML algorithms!  I think that is one of
> the star attractions for working with Mahout and also makes for a nice
> project for the student.



Re: GSOC 2010 is here

Posted by Grant Ingersoll <gs...@apache.org>.
Let's not forget implementations of ML algorithms!  I think that is one of the star attractions for working with Mahout and also makes for a nice project for the student.


Re: GSOC 2010 is here

Posted by Isabel Drost <is...@apache.org>.
On Mon Robin Anil <ro...@gmail.com> wrote:
> 2. UIMA Integration with Mahout? (Maybe a good project if UIMA folks
> are taking in GSOC students)

I guess one could easily split this one in two:

a) Using UIMA (whole pipeline or just the analysers if that is possible)
for data pre-processing before Mahout algorithms are run.

b) Making it easy to integrate Mahout algorithms (classification models
etc.) as UIMA annotators.

Isabel

Re: GSOC 2010 is here

Posted by Robin Anil <ro...@gmail.com>.
Some more wild and wacky ideas. They might be out of scope for GSOC, but they
are nice-to-have features for Mahout. I would like to encourage all of you to
put down your ideas here.

1. Data visualization tool backed by HDFS/HBase for inspecting clusters,
topic models, etc.
  - It could have many map/reduce jobs which transform the clustering
output, aggregate things and produce interesting stats or visualizations of
the data
2. UIMA Integration with Mahout? (Maybe a good project if UIMA folks are
taking in GSOC students)
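
To make idea 1 a bit more concrete, here is a rough sketch of the kind of
aggregation step such a tool might run over clustering output. It is plain
Python written in a map/reduce style, not Hadoop or Mahout code; the function
names (map_phase, reduce_phase), the (cluster_id, vector) input shape and the
sample data are all assumptions made up for illustration.

```python
from collections import defaultdict


def map_phase(assignments):
    """Emit (cluster_id, (vector, 1)) for every clustered point."""
    for cluster_id, vector in assignments:
        yield cluster_id, (vector, 1)


def reduce_phase(pairs):
    """Sum vectors and counts per cluster, then derive size and centroid."""
    sums = {}
    counts = defaultdict(int)
    for cluster_id, (vector, n) in pairs:
        if cluster_id not in sums:
            sums[cluster_id] = list(vector)
        else:
            sums[cluster_id] = [a + b for a, b in zip(sums[cluster_id], vector)]
        counts[cluster_id] += n
    return {
        cid: {
            "size": counts[cid],
            "centroid": [s / counts[cid] for s in sums[cid]],
        }
        for cid in sums
    }


# Hypothetical clustering output: (cluster id, point vector) pairs.
assignments = [
    ("c1", [1.0, 2.0]),
    ("c1", [3.0, 4.0]),
    ("c2", [10.0, 0.0]),
]

stats = reduce_phase(map_phase(assignments))
for cluster_id, summary in sorted(stats.items()):
    print(cluster_id, summary)
```

In a real job the map and reduce steps would run as separate distributed
tasks, and a UI could then read the per-cluster summaries for display.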



Robin




On Mon, Feb 1, 2010 at 6:17 PM, Isabel Drost <is...@apache.org> wrote:

> Some additional notes to committers:
> [...]


Re: GSOC 2010 is here

Posted by Isabel Drost <is...@apache.org>.
On Wed Robin Anil <ro...@gmail.com> wrote:
> Greetings! Fellow GSOC alums, administrators and dear mentors, the
> next edition is right here. Details are given in the link below.
> 
> https://groups.google.com/group/google-summer-of-code-discuss/browse_thread/thread/d839c0b02ac15b3f

Some additional notes to committers: 

First of all mentoring a GSoC student is a great experience, so if
you do have some cycles left, I would highly recommend participating in
GSoC as a mentor (thanks, Grant, for convincing me last year...).

We had several successful students here at Mahout in past GSoC years.
Each year there were strong proposals for projects within Mahout. As a
result, projects usually turn out to be interesting for both mentor
and student.

One final note: If there is anyone on this list who might be interested
in helping with general ASF GSoC logistics and administration tasks,
please have a look at the newly founded community development project
(dev@community.apache.org)

 
> Maybe we could identify key areas in Mahout which we need to develop
> apart from the ML implementations and list it down for students to
> see before they start trickling in.

And motivate students to come up with their own ideas and discuss them
on-list before submitting their proposals.


> Some ideas:
> Benchmarking Framework with EC2 wrappers

+1 I would love to see that.


> Commandline Console+Launcher like Hbase and hadoop

+1


> Online Tool/Query UI for Algorithms in Mahout(like CF)
> 
> 
> Possible ideas(I have no idea what i am talking here but there are
> nice problems to solve)
> Improvements in Math?
> How to tackle management of datasets?
> Error Recovery if a job fails?

How to tackle management of learned classification models?

Better tooling for Mahout integration? (Lucene for tokenization and
analysers? Data import and export?)



Isabel