
Posted to general@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2009/11/18 14:34:48 UTC

[ANN] Apache Mahout 0.2 Released

Apache Mahout 0.2 has been released and is now available for public
download at http://www.apache.org/dyn/closer.cgi/lucene/mahout

Apache Mahout is a subproject of Apache Lucene with the goal of
delivering scalable machine learning algorithm implementations under
the Apache License (http://www.apache.org/licenses/LICENSE-2.0).
Scalable in terms of computation, to the size of data you manage today.
Scalable in terms of community, to support anyone interested in using
machine learning. Scalable in terms of business, by providing the
library under a commercially friendly, free software license.

Built on top of the powerful map/reduce paradigm of the Apache Hadoop
project, Mahout's goal is to solve popular machine learning problems
such as clustering, collaborative filtering and classification on
extremely large data sets distributed across thousands of computers.

Up-to-date Maven artifacts can be found in the Apache repository at
https://repository.apache.org/content/repositories/releases/org/apache/mahout/

The complete changelist can be found here:
http://issues.apache.org/jira/browse/MAHOUT/fixforversion/12313278

New Mahout 0.2 features include:

- Major performance enhancements in Collaborative Filtering,
Classification and Clustering
- New: Latent Dirichlet Allocation (LDA) implementation for topic
modelling
- New: Frequent Itemset Mining for mining top-k patterns from a list
of transactions
- New: Decision Forests implementation for Decision Tree classification
(In Memory & Partial Data)
- New: HBase storage support for Naive Bayes model building and
classification
- New: Generation of vectors from text documents for use with Mahout
algorithms
- Performance improvements in various Vector implementations
- Tons of bug fixes and code cleanup

Getting started: New to Mahout?

1) Download Mahout at http://www.apache.org/dyn/closer.cgi/lucene/mahout
2) Check out the Quick start:
http://cwiki.apache.org/MAHOUT/quickstart.html
3) Read the Mahout Wiki: http://cwiki.apache.org/MAHOUT
4) Join the community by subscribing to mahout-user@lucene.apache.org
5) Give back: http://www.apache.org/foundation/getinvolved.html (optional, but much appreciated!)
6) Consider adding yourself to the Powered By wiki page:
http://cwiki.apache.org/MAHOUT/poweredby.html

For more information on Apache Mahout, see
http://lucene.apache.org/mahout

Re: RE: [ANN] Apache Mahout 0.2 Released

Posted by Sean Owen <sr...@gmail.com>.

I believe the code is written against the old Hadoop 0.19.x APIs, so it
should work on both 0.19 and 0.20, since 0.20 merely deprecates those
APIs rather than removing them.

The exception is the collaborative filtering stuff, which is in a bit
of limbo: it uses the new 0.20 APIs, but doesn't work because of some
apparent problems in those new APIs. If that's what you're after,
let's talk.

On Wed, Nov 18, 2009 at 8:58 PM, Patterson, Josh <jp...@tva.gov> wrote:
> I've got a smaller pseudo-distributed Hadoop 0.20 cluster that I use to play with HBase; we also have a fully distributed Hadoop 0.19 dev cluster that we use for prototyping MapReduce jobs. Other than the MapReduce API changes, is there any reason why it wouldn't run on Hadoop 0.19?
>
> I'm leaning towards the 0.20 cluster, just wanted to check.

RE: RE: [ANN] Apache Mahout 0.2 Released

Posted by "Patterson, Josh" <jp...@tva.gov>.

I've got a smaller pseudo-distributed Hadoop 0.20 cluster that I use to play with HBase; we also have a fully distributed Hadoop 0.19 dev cluster that we use for prototyping MapReduce jobs. Other than the MapReduce API changes, is there any reason why it wouldn't run on Hadoop 0.19?

I'm leaning towards the 0.20 cluster, just wanted to check.

Josh Patterson
TVA

-----Original Message-----
From: Sean Owen [mailto:srowen@gmail.com] 
Sent: Wednesday, November 18, 2009 3:46 PM
To: mahout-user@lucene.apache.org
Subject: Re: RE: [ANN] Apache Mahout 0.2 Released

0.20 is what we have been using.

Re: RE: [ANN] Apache Mahout 0.2 Released

Posted by Sean Owen <sr...@gmail.com>.

0.20 is what we have been using.

On Nov 18, 2009 8:23 PM, "Patterson, Josh" <jp...@tva.gov> wrote:

Which versions of Hadoop will Mahout 0.2 run on?

RE: [ANN] Apache Mahout 0.2 Released

Posted by "Patterson, Josh" <jp...@tva.gov>.

Which versions of Hadoop will Mahout 0.2 run on?

Thanks!

Josh Patterson
TVA

-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@apache.org] 
Sent: Wednesday, November 18, 2009 8:35 AM
To: mahout-user@lucene.apache.org; Mahout Dev List;
general@lucene.apache.org; announce@apache.org
Subject: [ANN] Apache Mahout 0.2 Released
