
Posted to dev@mahout.apache.org by Jeff Eastman <je...@windwardsolutions.com> on 2008/03/13 21:30:41 UTC

Mahout Elevator Pitch (Draft)

Here's a short elevator pitch I pulled together from the available project
information to give at the upcoming Hadoop Summit on March 25th. Comments
and suggestions are, of course, welcome.

 

Jeff

 

Mahout

 

Several years ago search engines set out to index the World Wide Web and
make its content searchable. As it became easier for people to add new
information to the Web, the amount of data to index has grown tremendously.
Separating relevant information from spam, learning from users' behavior and
grouping information in meaningful ways have become more and more important
for those interested in utilizing the Web.

 

In recent years a rather large community of researchers has addressed the
problem of extracting useful intelligence from the Web. Whether it is
classifying documents into categories, clustering them to form groups that
make sense to users or ranking them by relevancy given some query, these
methods fall under the broad category of machine learning algorithms.
Unfortunately, most of the available algorithms are proprietary, carry
restrictive licenses or do not scale to massive amounts of
information.

 

Mahout is a new subproject of the Lucene TLP that aims to create
commercially friendly, scalable machine learning algorithms under the
Apache license on top of Hadoop and HBase. The initial focus is to build
out the ten machine learning libraries detailed in "Map-Reduce for Machine
Learning on Multicore" by Chu, Kim, Lin, Yu, Bradski, Ng & Olukotun of the
Stanford CS Department. Though the project is only in its second month, we
have an active and growing community with initial submissions in the areas
of clustering, classification and matrix operations.

 

We chose this name for the project out of admiration and respect for the
work of the Hadoop project, whose icon is an elephant. According to
Wikipedia, "A mahout is a person who drives an elephant. [... The] Sanskrit
language distinguishes three types: Reghawan, who use love to control their
elephants, Yukthiman, who use ingenuity to outsmart them and Balwan, who
control elephants with cruelty". We intend to practice only in the first two
categories and welcome individuals with similar values who would like to
contribute to the project.

 

Project Committers:

*	Dawid Weiss
*	Erik Hatcher
*	Grant Ingersoll
*	Isabel Drost
*	Karl Wettin
*	Otis Gospodnetic
*	Niranjan Balasubramanian
*	Ozgur Yilmazel

Re: Mahout Elevator Pitch (Draft)

Posted by Grant Ingersoll <gs...@apache.org>.

Sounds good to me...


--------------------------
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ