Posted to common-user@hadoop.apache.org by Grant Ingersoll <gs...@apache.org> on 2008/01/25 13:25:24 UTC

Mahout Machine Learning Project Launches

(Apologies for cross-posting)

The Lucene PMC is pleased to announce the creation of the Mahout  
Machine Learning project, located at http://lucene.apache.org/mahout.   
Mahout's goal is to create a suite of practical, scalable machine  
learning libraries.  Our initial plan is to utilize Hadoop (http://hadoop.apache.org)
to implement a variety of algorithms including naive bayes, neural  
networks, support vector machines and k-Means, among others.  While  
our initial focus is on these algorithms, we welcome other machine  
learning ideas as well.

Naturally, we are looking for volunteers to help grow the community  
and make the project successful.  So, if machine learning is your  
thing, come on over and lend a hand!

Cheers,
Grant Ingersoll

http://lucene.apache.org/mahout

Re: Mahout Machine Learning Project Launches

Posted by Lukas Vlcek <lu...@gmail.com>.
Hi,

I believe it is because this project just got started, so there is no code
yet.

Lukas

On Jan 26, 2008 11:00 AM, sishen <ye...@gmail.com> wrote:

> Interesting project :)
>
> But why can't I check out a copy of the source?  The URL is
> "http://svn.apache.org/repos/asf/lucene/mahout/trunk", right? I just
> got an empty README.txt.
>
> Thanks for your help.
>
> Best regards,
>
> sishen



-- 
http://blog.lukas-vlcek.com/

Re: Mahout Machine Learning Project Launches

Posted by sishen <ye...@gmail.com>.
Interesting project :)

But why can't I check out a copy of the source?  The URL is
"http://svn.apache.org/repos/asf/lucene/mahout/trunk", right? I just
got an empty README.txt.

Thanks for your help.

Best regards,

sishen

Re: Mahout Machine Learning Project Launches

Posted by Bradford Stephens <br...@gmail.com>.
Quite an interesting initiative -- I'll keep my eye on it!

Re: Mahout Machine Learning Project Launches

Posted by Ted Dunning <td...@veoh.com>.
I don't think anybody has figured out how to patent the Lanczos algorithm
itself!


On 2/6/08 10:03 AM, "Peter W." <pe...@marketingbrokers.com> wrote:

> Hello,
> 
> This Mahout project seems very interesting.
> 
> Any problem that can be decomposed with
> MapReduce and then described as a linear
> equation would be an excellent candidate.
> 
> Most Nutch developers probably don't need HMM
> but instead the power method to iterate over
> Markov chains or Perron-Frobenius.
> 
> However, some of that work as it pertains to
> the web has been patented so it would be more
> productive for the Hadoop community to focus
> on other areas such as adjacency matrices,
> SALSA or bipartite graphs using Hbase.
> 
> Bye,
> 
> Peter W.


Re: Mahout Machine Learning Project Launches

Posted by "Peter W." <pe...@marketingbrokers.com>.
Hello,

This Mahout project seems very interesting.

Any problem that can be decomposed with
MapReduce and then described as a linear
equation would be an excellent candidate.

Most Nutch developers probably don't need HMM
but instead the power method to iterate over
Markov chains or Perron-Frobenius.
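
As a rough illustration of the power method mentioned here, the following
Python sketch repeatedly pushes a probability distribution through a Markov
chain until it stops changing. It is not Nutch, Hadoop, or Mahout code: the
function name power_iteration and the dictionary layout are assumptions made
for this example, and the "map" and "reduce" phases run in a single process.
For an irreducible, aperiodic chain the result approaches the unique
stationary distribution.

```python
from collections import defaultdict


def power_iteration(transitions, iterations=100, tol=1e-12):
    """Approximate the stationary distribution of a Markov chain.

    transitions: {state: {next_state: probability}}, each row summing to 1.
    (Hypothetical layout chosen for this sketch.)
    """
    states = list(transitions)
    # Start from the uniform distribution over states.
    dist = {s: 1.0 / len(states) for s in states}
    for _ in range(iterations):
        # "Map": every state emits its probability mass along its out-edges.
        emitted = [
            (dst, dist[src] * p)
            for src, row in transitions.items()
            for dst, p in row.items()
        ]
        # "Reduce": sum the mass arriving at each destination state.
        arrived = defaultdict(float)
        for dst, mass in emitted:
            arrived[dst] += mass
        # Total absolute change, used as the stopping criterion.
        change = sum(abs(arrived[s] - dist[s]) for s in states)
        dist = {s: arrived[s] for s in states}
        if change < tol:
            break
    return dist
```

Calling power_iteration({"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.5, "b": 0.5}})
returns a dictionary of state probabilities. Each pass is one sum-by-key
over emitted pairs, which is the shape a map-reduce job expects.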

However, some of that work as it pertains to
the web has been patented so it would be more
productive for the Hadoop community to focus
on other areas such as adjacency matrices,
SALSA or bipartite graphs using Hbase.

Bye,

Peter W.


On Feb 2, 2008, at 3:43 AM, edward yoon wrote:

> I thought Hidden Markov Models (HMM) were absolutely impossible on the
> MR model.
> If anyone has some information, please let me know.
>
> Thanks.

Re: Mahout Machine Learning Project Launches

Posted by Ted Dunning <td...@veoh.com>.
I don't think that they would be all that difficult as long as you have a
large enough problem.

EM methods for discrete problems like HMMs, as well as the closely related
variational Bayesian methods, depend mostly on counting instances.  Indeed,
Gibbs sampling for hidden-variable techniques depends on the same sort of
thing.  A good example is the Buntine and Jakulin paper on DCA.

Map-reduce is famously good at this sort of counting problem.  In general
for methods analogous to EM, you will have a map-reduce step for the
estimation phase and one for the maximization phase.  Both steps are very
much like word counting except that it just takes a bit of math to figure
out which words you think you are counting.

Just like with word counting, if you are doing a tiny example, MR will be
much slower.  If you are working on a very large problem, though, it can be
much faster.
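
To make the counting view concrete, here is a minimal Python sketch (not
Mahout or Hadoop code). It assumes a toy model that does not come from this
thread: each record is the (heads, tails) tally from one of two biased coins
whose identity is hidden. The names map_expected_counts, reduce_sum, maximize,
and em are hypothetical. As a simplification, the expectation phase is the
only map-reduce pass here, and the maximization phase is a small
normalization of the summed counts, all run in a single process.

```python
from collections import defaultdict


def map_expected_counts(record, params):
    """Expectation-phase mapper: emit fractional counts for one record.

    record: (heads, tails) observed from one coin of unknown identity.
    params: {"weight": [...], "p_heads": [...]}, one entry per coin.
    """
    heads, tails = record
    # Unnormalized posterior that each coin produced this record.
    likes = [
        w * (p ** heads) * ((1.0 - p) ** tails)
        for w, p in zip(params["weight"], params["p_heads"])
    ]
    total = sum(likes)
    for k, like in enumerate(likes):
        r = like / total  # responsibility of coin k for this record
        yield ("records", k), r
        yield ("heads", k), r * heads
        yield ("flips", k), r * (heads + tails)


def reduce_sum(pairs):
    """Reducer: sum values per key, exactly as in word counting."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return totals


def maximize(totals, n_coins):
    """Maximization phase: turn summed counts into new parameters."""
    n_records = sum(totals[("records", k)] for k in range(n_coins))
    return {
        "weight": [totals[("records", k)] / n_records for k in range(n_coins)],
        "p_heads": [
            totals[("heads", k)] / totals[("flips", k)] for k in range(n_coins)
        ],
    }


def em(records, params, iterations=50):
    """Run a fixed number of EM iterations over the records."""
    for _ in range(iterations):
        pairs = [
            pair for rec in records for pair in map_expected_counts(rec, params)
        ]
        params = maximize(reduce_sum(pairs), len(params["weight"]))
    return params
```

A call such as em([(9, 1), (8, 2), (2, 8), (1, 9)], {"weight": [0.5, 0.5],
"p_heads": [0.6, 0.4]}) returns updated weights and head probabilities. The
only difference from word counting is that the mapper first does a little
math to decide how much of each count belongs to each hidden coin.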


On 2/2/08 3:43 AM, "edward yoon" <ed...@udanax.org> wrote:

> I thought Hidden Markov Models (HMM) were absolutely impossible on the MR model.
> If anyone has some information, please let me know.
> 
> Thanks.


Re: Mahout Machine Learning Project Launches

Posted by edward yoon <ed...@udanax.org>.
I thought Hidden Markov Models (HMM) were absolutely impossible on the MR model.
If anyone has some information, please let me know.

Thanks.

On 2/2/08, edward yoon <ed...@udanax.org> wrote:
> I read an interesting piece of information in that NISP paper, and i
> was implemented but
>
> Now, there are too many mailing lists for me to read.
> Lucene, Core, Hbase, Pig, Solr, Mahout ..... :(
>
> Too distributed.


-- 
B. Regards,
Edward yoon @ NHN, corp.

Re: Mahout Machine Learning Project Launches

Posted by edward yoon <ed...@udanax.org>.
I read an interesting piece of information in that NISP paper, and i
was implemented but

Now, there are too many mailing lists for me to read.
Lucene, Core, Hbase, Pig, Solr, Mahout ..... :(

Too distributed.

On 2/2/08, gopi <go...@gmail.com> wrote:
> I'm definitely excited about Machine Learning Algorithms being implemented
> into this project!
> I'm currently a student studying Machine Learning, and would love to help
> out in every possible manner.
>
> Thanks
> Chaitanya Sharma


-- 
B. Regards,
Edward yoon @ NHN, corp.

Re: Mahout Machine Learning Project Launches

Posted by gopi <go...@gmail.com>.
I'm definitely excited about Machine Learning Algorithms being implemented
in this project!
I'm currently a student studying Machine Learning, and would love to help
out in every possible manner.

Thanks
Chaitanya Sharma
