Posted to dev@mahout.apache.org by Hotmail Email Address <sx...@hotmail.com> on 2010/08/13 07:00:32 UTC

Contributions to mahout

Hi Folks,
I joined this list a week or so ago and am looking to contribute to Mahout. I studied ML in grad school and would like to work in any of the areas below:

1) assimilating a framework to introduce multiple layer or single layer neural nets to solve problems in image processing or computer vision

2) genetic algorithms related to solving computationally demanding problems

3) experimenting with Mahout on other data stores such as MongoDB, Riak, or Cassandra

4) more thorough unit tests for some of the code using things like jbehave

I am looking for recommendations from the community on how to go about this. Should I just start with the Jira tasks and assign myself some tasks pertaining to the above areas, or start with number 4?

Also, is there a project suggestions page for Mahout similar to the one in Hadoop? That would be a great way for new folks to help.

Best Regards  


Re: Contributions to mahout

Posted by Grant Ingersoll <gs...@apache.org>.
On Aug 13, 2010, at 2:01 AM, Ted Dunning wrote:

> 
>> I am looking for recommendations from the community on how to go
>> about this. Should I just start with the Jira tasks and assign myself some
>> tasks pertaining to the above areas, or start with number 4?
>> 
> 
> JIRAs tend to be filed when somebody has an itch that they are about to
> scratch.  That means that there isn't so much of a backlog of work to be
> done there ... if a JIRA sits around for a bit, it is, by definition, not
> something that somebody is pushing for very hard.
> 

https://cwiki.apache.org/confluence/display/MAHOUT/HowToContribute should be helpful as well.

Re: Contributions to mahout

Posted by Saikat Kanjilal <sx...@hotmail.com>.
Thanks for the updates, Ted. I'll take a look at some of these topics and pick an area to start with. My apologies for my name not showing up; my name is Saikat Kanjilal.



Re: Contributions to mahout

Posted by Ted Dunning <te...@gmail.com>.
On Thu, Aug 12, 2010 at 10:00 PM, Hotmail Email Address <sxk1969@hotmail.com> wrote:

>
> I joined this list a week or so ago and am looking to contribute to Mahout,
> I have studied ML in grad school


That is excellent.


>
> 1) assimilating a framework to introduce multiple layer or single layer
> neural nets to solve problems in image processing or computer vision
>

The Neuroph project are looking at ways to introduce their Neural Network
software into Mahout.  There will be significant amounts of effort required
there.

Also, the GSoC project that Zhao Zhendong worked on with SVMs will need
some documentation, testing and integration work.

For that matter, there is the question of the grand unification of all of
our clustering and classification code.  Thoughts on that score as well as
adaptation work would be of real interest.

On a related note, however, there is very little in the way of methods for
deploying a classifier (either from supervised or unsupervised learning) as
a server.  We can do that with recommendations, but it would be really cool
if a classifier could be deployed as a recommendation engine.


> 2) genetic algorithms related to solving computationally demanding problems
>

We have some code in this area, but I am not particularly convinced that the
approaches are very scalable or efficient.  Very large-scale projects tend
to focus on lean and mean algorithms and typically involve very
high-dimensional data, which makes many genetic approaches very inefficient
and simpler approaches surprisingly effective.


>
> 3) experimenting with Mahout on other data stores such as MongoDB, Riak,
> or Cassandra
>

Not sure what you have in mind here, although having a storybook available
with tales of "here's how you can read data from xyz" might be nice.
Hopefully there is little difference no matter where the data comes from.

> 4) more thorough unit tests for some of the code using things like jbehave
>

More tests are ALWAYS welcome and we have a boatload of untested code in the
math module.  What happened there is that we did a mass import and
deprecation of the COLT package.  As we find uses for that code, we
translate it to use our matrix package and add tests.  If you
look at https://issues.apache.org/jira/browse/MAHOUT-469 you can see an
example of that.


> I am looking for recommendations from the community on how to go
> about this. Should I just start with the Jira tasks and assign myself some
> tasks pertaining to the above areas, or start with number 4?
>

JIRAs tend to be filed when somebody has an itch that they are about to
scratch.  That means that there isn't so much of a backlog of work to be
done there ... if a JIRA sits around for a bit, it is, by definition, not
something that somebody is pushing for very hard.


> Also, is there a project suggestions page for Mahout similar to the one in
> Hadoop? That would be a great way for new folks to help.
>

There is such a beast, but it may not really reflect what is needed right
now.

This page might be some of what you are looking for:
https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms



>
> Best Regards
>
> Sent from my iPad


Do you have a name?  Perhaps something better than "Hotmail Email Address"?