Posted to dev@mahout.apache.org by Dan Filimon <da...@gmail.com> on 2012/10/07 12:43:26 UTC

Mahout Bachelor's Project

Hi Mahout Devs!

I'm Dan Filimon, a 4th year undergrad student at Politehnica
University Bucharest [1] and as part of graduating I need to work on a
final project.

I've recently gotten very interested in AI and Machine Learning
(enough to become convinced that I want to pursue a Master's in this
field) but have just started learning. I'd like to work my way up to
becoming a committer as I learn more about ML and add new
algorithms to Mahout.

I could get a final project assigned to me by a professor, or work
with a company, but I'd like to do Open Source work (I have done a bit
before [2]).
I'd like my final project (which includes a thesis) to be adding some
(1, 2... n?) new (but well-tested) algorithms to Mahout and building
an application based off them. Time-wise, the deadline for my project
is sometime in July 2013.
I can work on this part-time until about March and allocate more time
afterwards.

I think I can handle the theory (I really enjoy math and understand
the basics of the MapReduce framework) and working on a large code base
(I interned at Google twice, the first time even working on an
open-source project [3]), and I am comfortable in Java.

Now, excited as I may be, the thing is I'm not sure where to start. I
read around the Mahout web site, got a copy of the code, got the
Mahout in Action book, got a bunch of ML books, am taking relevant
classes in AI and ML at school this year...
I'd like someone to help me figure out the hoops, guide my work and mentor me.
I know this is asking a lot since I haven't actually _done_ anything
for this project, but please... any volunteers? :)

Thank you!

[1] http://acs.pub.ro/index.php?site=prezentation&lg=english
[2] https://github.com/dfilimon
[3] http://code.google.com/p/sfntly/

Re: Mahout Bachelor's Project

Posted by Dan Filimon <da...@gmail.com>.
On Wed, Oct 17, 2012 at 5:00 AM, Raymond Melton <rt...@gmail.com> wrote:
> I've been following this thread with a great deal of interest, and I think
> it's a neat project!  I've read Dan's corresponding blog entry, and will
> keep up with his blog as well, and am working through Ted's K-Means
> Clustering At Scale paper now.
>
> I'm particularly interested in the process of integrating the knn code from
> github into Mahout.  Not to diminish any other aspect of the project, it's
> just that I'm looking forward to learning what this involves in detail.
>
> Consequently, I'd just like to say that I really appreciate that you're
> keeping the development of this project out in the open here.
>
> Regards, Ray.

Thanks for the kind words Ray!
Will definitely keep you up to date, in the open. :)

> On 10/12/2012 01:45 PM, Ted Dunning wrote:
>>
>> Review the knn code from github
>>
>> File an individual contributors license agreement with Apache
>>
>> Change knn to fit the Mahout API
>>
>> Push back to Mahout
>>
>> Solicit current clustering users for metrics on their data (I can help
>> with
>> this)
>>
>> Write up data generation strategy with useable results
>>
>> Not sure how long these tasks are because they are a bit big for planning
>> purposes, but give a decent outline.
>>
>> On Fri, Oct 12, 2012 at 1:34 PM, Dan
>> Filimon<da...@gmail.com>wrote:
>>
>>> Now, where do I start? What would a plan for the coming months look like?
>>>
>>
>

Re: Mahout Bachelor's Project

Posted by Raymond Melton <rt...@gmail.com>.
I've been following this thread with a great deal of interest, and I 
think it's a neat project!  I've read Dan's corresponding blog entry, 
and will keep up with his blog as well, and am working through Ted's 
K-Means Clustering At Scale paper now.

I'm particularly interested in the process of integrating the knn code 
from github into Mahout.  Not to diminish any other aspect of the 
project, it's just that I'm looking forward to learning what this 
involves in detail.

Consequently, I'd just like to say that I really appreciate that you're 
keeping the development of this project out in the open here.

Regards, Ray.

On 10/12/2012 01:45 PM, Ted Dunning wrote:
> Review the knn code from github
>
> File an individual contributors license agreement with Apache
>
> Change knn to fit the Mahout API
>
> Push back to Mahout
>
> Solicit current clustering users for metrics on their data (I can help with
> this)
>
> Write up data generation strategy with useable results
>
> Not sure how long these tasks are because they are a bit big for planning
> purposes, but give a decent outline.
>
> On Fri, Oct 12, 2012 at 1:34 PM, Dan Filimon<da...@gmail.com>wrote:
>
>> Now, where do I start? What would a plan for the coming months look like?
>>
>


Re: Mahout Bachelor's Project

Posted by Ted Dunning <te...@gmail.com>.
Sounds like a great idea.

I will follow up on this off-list.

On Tue, Oct 16, 2012 at 10:50 AM, Dan Filimon
<da...@gmail.com>wrote:

> Ted, could we possibly set up some sort of weekly 1-on-1s to discuss
> goals/milestones? Or some way to ensure progress is being made? Should
> I include my official supervisor as well (yes, I found one :)?.
>

Re: Mahout Bachelor's Project

Posted by Dan Filimon <da...@gmail.com>.
On Sun, Oct 14, 2012 at 8:42 PM, Ted Dunning <te...@gmail.com> wrote:
> On Sun, Oct 14, 2012 at 2:21 AM, Dan Filimon <da...@gmail.com>wrote:
>
>>
>> As I go, I'd like to update my (somewhat unused) blog,
>> (danf.wordpress.com) to keep track of my progress and let other people
>> know how it's working out.
>>
>
> When you post to your blog, send a note to this list.

Here's my first blog post [1]. I only talked a bit about Mahout and
you, and then gave an overview of the basic k-means algorithm.

A few more questions,
Ted, could we possibly set up some sort of weekly 1-on-1s to discuss
goals/milestones? Or some way to ensure progress is being made? Should
I include my official supervisor as well? (Yes, I found one :)

[1] http://danf.wordpress.com/2012/10/16/starting-work-on-bachelors-project-k-means-basics/

Re: Mahout Bachelor's Project

Posted by Ted Dunning <te...@gmail.com>.
On Sun, Oct 14, 2012 at 2:21 AM, Dan Filimon <da...@gmail.com>wrote:

>
> As I go, I'd like to update my (somewhat unused) blog,
> (danf.wordpress.com) to keep track of my progress and let other people
> know how it's working out.
>

When you post to your blog, send a note to this list.


>
> As for us communicating, should I ask questions on this dev@ list or
> e-mail you directly?


The preference is always the dev@ list.

Re: Mahout Bachelor's Project

Posted by Dan Filimon <da...@gmail.com>.
On Fri, Oct 12, 2012 at 11:45 PM, Ted Dunning <te...@gmail.com> wrote:
> Review the knn code from github
>
> File an individual contributors license agreement with Apache
>
> Change knn to fit the Mahout API
>
> Push back to Mahout
>
> Solicit current clustering users for metrics on their data (I can help with
> this)
>
> Write up data generation strategy with useable results
>
> Not sure how long these tasks are because they are a bit big for planning
> purposes, but give a decent outline.

Okay, first, let me start looking at the problem we're solving (kNN),
what approaches there are now and what approaches you implemented
(i.e., read the papers, presentations and code).

As I go, I'd like to update my (somewhat unused) blog,
(danf.wordpress.com) to keep track of my progress and let other people
know how it's working out.

As for us communicating, should I ask questions on this dev@ list or
e-mail you directly?

>
> On Fri, Oct 12, 2012 at 1:34 PM, Dan Filimon <da...@gmail.com>wrote:
>
>> Now, where do I start? What would a plan for the coming months look like?
>>

Re: Mahout Bachelor's Project

Posted by Ted Dunning <te...@gmail.com>.
Review the knn code from github

File an individual contributors license agreement with Apache

Change knn to fit the Mahout API

Push back to Mahout

Solicit current clustering users for metrics on their data (I can help with
this)

Write up data generation strategy with useable results

Not sure how long these tasks are because they are a bit big for planning
purposes, but give a decent outline.

On Fri, Oct 12, 2012 at 1:34 PM, Dan Filimon <da...@gmail.com>wrote:

> Now, where do I start? What would a plan for the coming months look like?
>

Re: Mahout Bachelor's Project

Posted by Ted Dunning <te...@gmail.com>.
The new clustering code has several papers attached to the github repo.  I
have given several talks; the most detailed is the one I gave at Oxford a
few weeks ago.  You can get those slides from slideshare under my name.

Mahout has a clustering interface that is best learned from the code.

On Fri, Oct 12, 2012 at 1:34 PM, Dan Filimon <da...@gmail.com>wrote:

>
> Now, where do I start? What would a plan for the coming months look like?
> Should I start by first reading the theory? Learn more about Mahout?
>
>

Re: Mahout Bachelor's Project

Posted by Dan Filimon <da...@gmail.com>.



On Oct 12, 2012, at 22:15, Ted Dunning <te...@gmail.com> wrote:

> See http://github.com/tdunning/knn
> 
> The algorithms definitely need more work but what work they need is
> something that needs more testing.
> 
> To get that testing mileage, we need to make those algorithms available in
> a standard framework.
> 
> One thought that I have is that we should be able to build synthetic data
> sets that emulate the clustering and search performance of realistic data.
> If we can avoid looking at anything but a few generalization scores, then
> we have a very solid anonymization story because we won't even be
> generating the same *types* of data in the random generator.  This alone
> would be an interesting thesis topic.
> 
> Again, however, we need runtime from current clustering users to get the
> scores.

Alright, let's do this.
I think we'll get more details clarified as we go.

Now, where do I start? What would a plan for the coming months look like?
Should I start by first reading the theory? Learn more about Mahout?

> On Fri, Oct 12,  2012 at 4:41 AM, Dan Filimon <da...@gmail.com>wrote:
> 
>>> On my side:
>>> 
>>> - I will provide mentor support for this project
>>> 
>>> - I will help you write up the results by reviewing your write-ups and
>>> suggesting structure and content.
>>> 
>>> The benefits to you will be deep knowledge of advanced clustering
>>> algorithms as well as practical experience in how integration like this
>> can
>>> happen.
>> 
>> Could you explain a bit what working on the integration would entail?
>> 
>> I don't want to sound ungrateful here, I definitely want to work with
>> you, but ideally, I'd like to work *on* these advanced clustering
>> algorithms (helping improve them maybe? overambitious?), not just
>> integrate them.
>> 

Re: Mahout Bachelor's Project

Posted by Ted Dunning <te...@gmail.com>.
See http://github.com/tdunning/knn

The algorithms definitely need more work, but figuring out what work they
need requires more testing.

To get that testing mileage, we need to make those algorithms available in
a standard framework.

One thought that I have is that we should be able to build synthetic data
sets that emulate the clustering and search performance of realistic data.
If we can avoid looking at anything but a few generalization scores, then
we have a very solid anonymization story because we won't even be
generating the same *types* of data in the random generator.  This alone
would be an interesting thesis topic.

Again, however, we need runtime from current clustering users to get the
scores.

On Fri, Oct 12, 2012 at 4:41 AM, Dan Filimon <da...@gmail.com>wrote:

> > On my side:
> >
> > - I will provide mentor support for this project
> >
> > - I will help you write up the results by reviewing your write-ups and
> > suggesting structure and content.
> >
> > The benefits to you will be deep knowledge of advanced clustering
> > algorithms as well as practical experience in how integration like this
> can
> > happen.
>
> Could you explain a bit what working on the integration would entail?
>
> I don't want to sound ungrateful here, I definitely want to work with
> you, but ideally, I'd like to work *on* these advanced clustering
> algorithms (helping improve them maybe? overambitious?), not just
> integrate them.
>

Re: Mahout Bachelor's Project

Posted by Dan Filimon <da...@gmail.com>.
Thanks for the answers guys!

On Fri, Oct 12, 2012 at 2:00 AM, Ted Dunning <te...@gmail.com> wrote:
> Dan,
>
> Good idea to ping us.  I didn't even see your first request.
>
> I think that Sebastian is correct that your thesis supervisor should be
> local to your university.  He is also correct that just implementing yet
> another algorithm is of little interest.

Yes, you're both right. In fact, I intend to have someone from my
university be my official supervisor. The thing is, I don't know
anyone interested in this kind of work.
That being said, I can find someone who would be willing to supervise
and help me with the administrative side of things, but technically
I'll probably be on my own.

> On the other hand, I could definitely use some help in getting the new
> clustering stuff I have done integrated into Mahout.
>
> So I would be willing to make a trade.

Thanks Ted! Sounds great, but I have a couple of questions:

> On your side:
>
> - you need to find an official university supervisor for the thesis

Yes, I'm looking into this.

> - you will need to put in a fair bit of time on the project

Of course, this would be a final project after all. I want to have a
great thesis and I'm willing to spend time working on it.

> On my side:
>
> - I will provide mentor support for this project
>
> - I will help you write up the results by reviewing your write-ups and
> suggesting structure and content.
>
> The benefits to you will be deep knowledge of advanced clustering
> algorithms as well as practical experience in how integration like this can
> happen.

Could you explain a bit what working on the integration would entail?

I don't want to sound ungrateful here, I definitely want to work with
you, but ideally, I'd like to work *on* these advanced clustering
algorithms (helping improve them maybe? overambitious?), not just
integrate them.

Thanks a lot!

> On Thu, Oct 11, 2012 at 2:38 PM, Sebastian Schelter <ss...@apache.org> wrote:
>
>> Hi Dan,
>>
>> I think there are two reasons why you didn't get an answer yet.
>>
>> The first reason is that the project is driven by volunteers and from my
>> experience everyone here has lots of other things to do and usually only
>> little time to spare for Mahout (unfortunately). You asked for guidance
>> and mentorship of a bachelor thesis which I guess nobody can provide
>> here. And IMHO this is also not the task of open source developers, your
>> thesis should be supervised by someone from your university (for your
>> own sake).
>>
>> The second reason is that it turned out over the last months that simply
>> adding new algorithm implementations that are not production-tested did
>> not help the project. We accepted lots of such contributions and it
>> turned out that people did not maintain them or that they were of minor
>> quality. That's why we choose to be more conservative with what we
>> accept. It turned out that it's not that hard to implement algorithms on
>> MapReduce but its hard to do this in a really efficient way that will be
>> helpful for others.
>>
>> I really like your enthusiasm and willingness to contribute to the
>> project, but I'd say there are plenty more important things to do than
>> contributing a new algorithm and a bachelor thesis is probably not the
>> right setting to start the work on Mahout.
>>
>> Nevertheless you could find a topic related to Mahout (using Mahout or
>> evolving some algorithm contained in it), have it supervised by someone
>> from your university and after that maybe contribute your
>> findings/bugfixes/whatever back.
>>
>> Best,
>> Sebastian
>>
>> On 11.10.2012 22:20, Dan Filimon wrote:
>> > On Sun, Oct 7, 2012 at 1:43 PM, Dan Filimon <da...@gmail.com>
>> wrote:
>> >> Hi Mahout Devs!
>> >>
>> >> I'm Dan Filimon, a 4th year undergrad student at Politehnica
>> >> University Bucharest [1] and as part of graduating I need to work on a
>> >> final project.
>> >>
>> >> I've recently gotten very interested in AI and Machine Learning
>> >> (enough to become convinced that I want to pursue a Master's in this
>> >> field) but have just started learning. I'd like to work my way up to
>> >> becoming a committer and as I learn more about ML and add new
>> >> algorithms to Mahout.
>> >>
>> >> I could get a final project assigned to me by a professor, or work
>> >> with a company, but I'd like to do Open Source work (I have done a bit
>> >> before [2]).
>> >> I'd like my final project (which includes a thesis) to be adding some
>> >> (1, 2... n?) new (but well-tested) algorithms to Mahout and building
>> >> an application based off them. Time-wise, the deadline for my project
>> >> is sometime in July 2013.
>> >> I can work on this part-time until about March and allocate more time
>> >> afterwards.
>> >>
>> >> I think I can handle the theory (I really enjoy math and understand
>> >> the basics of MapReduce framework), the working on a large code-base
>> >> (I interned at Google twice, the first time even working on an
>> >> open-source project [2]) and am comfortable in Java.
>> >>
>> >> Now, excited as I may be, the thing is I'm not sure where to start. I
>> >> read around the Mahout web site, got a copy of the code, got the
>> >> Mahout in Action book, got a bunch of ML books, am taking relevant
>> >> classes in AI and ML at school this year...
>> >> I'd like someone to help me figure out the hoops, guide my work and
>> mentor me.
>> >> I know this is asking a lot since I haven't actually _done_ anything
>> >> for this project, but please... any volunteers? :)
>> >>
>> >> Thank you!
>> >>
>> >> [1] http://acs.pub.ro/index.php?site=prezentation&lg=english
>> >> [2] https://github.com/dfilimon
>> >> [3] http://code.google.com/p/sfntly/
>> >
>> > Ping!
>> > Also, for more info, my LinkedIn page is [1]. :)
>> >
>> > At least suggestions? Should I be taking a different approach here?
>> > Try submitting some patches before asking again? Learning more first?
>> >
>> > [1] http://www.linkedin.com/pub/dan-filimon/23/845/540
>> >
>>
>>

Re: Mahout Bachelor's Project

Posted by Ted Dunning <te...@gmail.com>.
Dan,

Good idea to ping us.  I didn't even see your first request.

I think that Sebastian is correct that your thesis supervisor should be
local to your university.  He is also correct that just implementing yet
another algorithm is of little interest.

On the other hand, I could definitely use some help in getting the new
clustering stuff I have done integrated into Mahout.

So I would be willing to make a trade.

On your side:

- you need to find an official university supervisor for the thesis

- you will need to put in a fair bit of time on the project

On my side:

- I will provide mentor support for this project

- I will help you write up the results by reviewing your write-ups and
suggesting structure and content.

The benefits to you will be deep knowledge of advanced clustering
algorithms as well as practical experience in how integration like this can
happen.

On Thu, Oct 11, 2012 at 2:38 PM, Sebastian Schelter <ss...@apache.org> wrote:

> Hi Dan,
>
> I think there are two reasons why you didn't get an answer yet.
>
> The first reason is that the project is driven by volunteers and from my
> experience everyone here has lots of other things to do and usually only
> little time to spare for Mahout (unfortunately). You asked for guidance
> and mentorship of a bachelor thesis which I guess nobody can provide
> here. And IMHO this is also not the task of open source developers, your
> thesis should be supervised by someone from your university (for your
> own sake).
>
> The second reason is that it turned out over the last months that simply
> adding new algorithm implementations that are not production-tested did
> not help the project. We accepted lots of such contributions and it
> turned out that people did not maintain them or that they were of minor
> quality. That's why we choose to be more conservative with what we
> accept. It turned out that it's not that hard to implement algorithms on
> MapReduce but its hard to do this in a really efficient way that will be
> helpful for others.
>
> I really like your enthusiasm and willingness to contribute to the
> project, but I'd say there are plenty more important things to do than
> contributing a new algorithm and a bachelor thesis is probably not the
> right setting to start the work on Mahout.
>
> Nevertheless you could find a topic related to Mahout (using Mahout or
> evolving some algorithm contained in it), have it supervised by someone
> from your university and after that maybe contribute your
> findings/bugfixes/whatever back.
>
> Best,
> Sebastian
>
> On 11.10.2012 22:20, Dan Filimon wrote:
> > On Sun, Oct 7, 2012 at 1:43 PM, Dan Filimon <da...@gmail.com>
> wrote:
> >> Hi Mahout Devs!
> >>
> >> I'm Dan Filimon, a 4th year undergrad student at Politehnica
> >> University Bucharest [1] and as part of graduating I need to work on a
> >> final project.
> >>
> >> I've recently gotten very interested in AI and Machine Learning
> >> (enough to become convinced that I want to pursue a Master's in this
> >> field) but have just started learning. I'd like to work my way up to
> >> becoming a committer and as I learn more about ML and add new
> >> algorithms to Mahout.
> >>
> >> I could get a final project assigned to me by a professor, or work
> >> with a company, but I'd like to do Open Source work (I have done a bit
> >> before [2]).
> >> I'd like my final project (which includes a thesis) to be adding some
> >> (1, 2... n?) new (but well-tested) algorithms to Mahout and building
> >> an application based off them. Time-wise, the deadline for my project
> >> is sometime in July 2013.
> >> I can work on this part-time until about March and allocate more time
> >> afterwards.
> >>
> >> I think I can handle the theory (I really enjoy math and understand
> >> the basics of MapReduce framework), the working on a large code-base
> >> (I interned at Google twice, the first time even working on an
> >> open-source project [2]) and am comfortable in Java.
> >>
> >> Now, excited as I may be, the thing is I'm not sure where to start. I
> >> read around the Mahout web site, got a copy of the code, got the
> >> Mahout in Action book, got a bunch of ML books, am taking relevant
> >> classes in AI and ML at school this year...
> >> I'd like someone to help me figure out the hoops, guide my work and
> mentor me.
> >> I know this is asking a lot since I haven't actually _done_ anything
> >> for this project, but please... any volunteers? :)
> >>
> >> Thank you!
> >>
> >> [1] http://acs.pub.ro/index.php?site=prezentation&lg=english
> >> [2] https://github.com/dfilimon
> >> [3] http://code.google.com/p/sfntly/
> >
> > Ping!
> > Also, for more info, my LinkedIn page is [1]. :)
> >
> > At least suggestions? Should I be taking a different approach here?
> > Try submitting some patches before asking again? Learning more first?
> >
> > [1] http://www.linkedin.com/pub/dan-filimon/23/845/540
> >
>
>

Re: Mahout Bachelor's Project

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Dan,

I think there are two reasons why you didn't get an answer yet.

The first reason is that the project is driven by volunteers and from my
experience everyone here has lots of other things to do and usually only
little time to spare for Mahout (unfortunately). You asked for guidance
and mentorship of a bachelor thesis which I guess nobody can provide
here. And IMHO this is also not the task of open source developers; your
thesis should be supervised by someone from your university (for your
own sake).

The second reason is that it turned out over the last few months that simply
adding new algorithm implementations that are not production-tested did
not help the project. We accepted lots of such contributions and it
turned out that people did not maintain them or that they were of low
quality. That's why we choose to be more conservative with what we
accept. It turned out that it's not that hard to implement algorithms on
MapReduce, but it's hard to do this in a really efficient way that will be
helpful for others.

I really like your enthusiasm and willingness to contribute to the
project, but I'd say there are plenty more important things to do than
contributing a new algorithm and a bachelor thesis is probably not the
right setting to start the work on Mahout.

Nevertheless you could find a topic related to Mahout (using Mahout or
evolving some algorithm contained in it), have it supervised by someone
from your university and after that maybe contribute your
findings/bugfixes/whatever back.

Best,
Sebastian

On 11.10.2012 22:20, Dan Filimon wrote:
> On Sun, Oct 7, 2012 at 1:43 PM, Dan Filimon <da...@gmail.com> wrote:
>> Hi Mahout Devs!
>>
>> I'm Dan Filimon, a 4th year undergrad student at Politehnica
>> University Bucharest [1] and as part of graduating I need to work on a
>> final project.
>>
>> I've recently gotten very interested in AI and Machine Learning
>> (enough to become convinced that I want to pursue a Master's in this
>> field) but have just started learning. I'd like to work my way up to
>> becoming a committer and as I learn more about ML and add new
>> algorithms to Mahout.
>>
>> I could get a final project assigned to me by a professor, or work
>> with a company, but I'd like to do Open Source work (I have done a bit
>> before [2]).
>> I'd like my final project (which includes a thesis) to be adding some
>> (1, 2... n?) new (but well-tested) algorithms to Mahout and building
>> an application based off them. Time-wise, the deadline for my project
>> is sometime in July 2013.
>> I can work on this part-time until about March and allocate more time
>> afterwards.
>>
>> I think I can handle the theory (I really enjoy math and understand
>> the basics of MapReduce framework), the working on a large code-base
>> (I interned at Google twice, the first time even working on an
>> open-source project [2]) and am comfortable in Java.
>>
>> Now, excited as I may be, the thing is I'm not sure where to start. I
>> read around the Mahout web site, got a copy of the code, got the
>> Mahout in Action book, got a bunch of ML books, am taking relevant
>> classes in AI and ML at school this year...
>> I'd like someone to help me figure out the hoops, guide my work and mentor me.
>> I know this is asking a lot since I haven't actually _done_ anything
>> for this project, but please... any volunteers? :)
>>
>> Thank you!
>>
>> [1] http://acs.pub.ro/index.php?site=prezentation&lg=english
>> [2] https://github.com/dfilimon
>> [3] http://code.google.com/p/sfntly/
> 
> Ping!
> Also, for more info, my LinkedIn page is [1]. :)
> 
> At least suggestions? Should I be taking a different approach here?
> Try submitting some patches before asking again? Learning more first?
> 
> [1] http://www.linkedin.com/pub/dan-filimon/23/845/540
> 


Re: Mahout Bachelor's Project

Posted by Dan Filimon <da...@gmail.com>.
On Sun, Oct 7, 2012 at 1:43 PM, Dan Filimon <da...@gmail.com> wrote:
> Hi Mahout Devs!
>
> I'm Dan Filimon, a 4th year undergrad student at Politehnica
> University Bucharest [1] and as part of graduating I need to work on a
> final project.
>
> I've recently gotten very interested in AI and Machine Learning
> (enough to become convinced that I want to pursue a Master's in this
> field) but have just started learning. I'd like to work my way up to
> becoming a committer and as I learn more about ML and add new
> algorithms to Mahout.
>
> I could get a final project assigned to me by a professor, or work
> with a company, but I'd like to do Open Source work (I have done a bit
> before [2]).
> I'd like my final project (which includes a thesis) to be adding some
> (1, 2... n?) new (but well-tested) algorithms to Mahout and building
> an application based off them. Time-wise, the deadline for my project
> is sometime in July 2013.
> I can work on this part-time until about March and allocate more time
> afterwards.
>
> I think I can handle the theory (I really enjoy math and understand
> the basics of MapReduce framework), the working on a large code-base
> (I interned at Google twice, the first time even working on an
> open-source project [2]) and am comfortable in Java.
>
> Now, excited as I may be, the thing is I'm not sure where to start. I
> read around the Mahout web site, got a copy of the code, got the
> Mahout in Action book, got a bunch of ML books, am taking relevant
> classes in AI and ML at school this year...
> I'd like someone to help me figure out the hoops, guide my work and mentor me.
> I know this is asking a lot since I haven't actually _done_ anything
> for this project, but please... any volunteers? :)
>
> Thank you!
>
> [1] http://acs.pub.ro/index.php?site=prezentation&lg=english
> [2] https://github.com/dfilimon
> [3] http://code.google.com/p/sfntly/

Ping!
Also, for more info, my LinkedIn page is [1]. :)

At least suggestions? Should I be taking a different approach here?
Try submitting some patches before asking again? Learning more first?

[1] http://www.linkedin.com/pub/dan-filimon/23/845/540