Posted to dev@mahout.apache.org by Grant Ingersoll <gs...@apache.org> on 2013/07/25 16:35:45 UTC
Apache Mahout 0.8 Released
The Apache Mahout PMC is pleased to announce the release of Mahout 0.8.
Mahout's goal is to build scalable machine learning libraries focused
primarily in the areas of collaborative filtering (recommenders),
clustering and classification (known collectively as the "3Cs"), as well as the
necessary infrastructure to support those implementations including, but
not limited to, math packages for statistics, linear algebra and others
as well as Java primitive collections, local and distributed vector and
matrix classes and a variety of integrative code to work with popular
packages like Apache Hadoop, Apache Lucene, Apache HBase, Apache
Cassandra and much more. The 0.8 release is mainly a cleanup release in
preparation for an upcoming 1.0 release, but there are several
significant new features, which are highlighted below.
To get started with Apache Mahout 0.8, download the release artifacts and signatures at http://www.apache.org/dyn/closer.cgi/mahout or visit the central Maven repository.
In addition to the release highlights and artifacts, please pay attention to the section labelled FUTURE PLANS below for more information about upcoming releases of Mahout.
As with any release, we wish to thank all of the users and contributors
to Mahout. Please see the CHANGELOG [1] and JIRA Release Notes [2] for
individual credits, as there are too many to list here.
GETTING STARTED
In the release package, the examples directory contains several working examples of the core
functionality available in Mahout. These can be run via scripts in the
examples/bin directory and will prompt you for more information to help you try things out. Most examples do not need a Hadoop cluster in
order to run.
RELEASE HIGHLIGHTS
The highlights of the Apache Mahout 0.8 release include, but are not
limited to, the list below. For further information, see the included
CHANGELOG file.
- Numerous performance improvements to Vector and Matrix
implementations, APIs and their iterators (see also MAHOUT-1192,
MAHOUT-1202)
- Numerous performance improvements to the recommender implementations
(see also MAHOUT-1272, MAHOUT-1035, MAHOUT-1042, MAHOUT-1151,
MAHOUT-1166, MAHOUT-1167, MAHOUT-1169, MAHOUT-1205, MAHOUT-1264)
- MAHOUT-1088: Support for biased item-based recommender
- MAHOUT-1089: SGD matrix factorization for rating prediction with user and item biases
- MAHOUT-1106: Support for SVD++
- MAHOUT-944: Support for converting one or more Lucene storage indexes
to SequenceFiles as well as an upgrade of the supported Lucene version
to Lucene 4.3.1.
- MAHOUT-1154 and friends: New streaming k-means implementation that offers on-line (and fast) clustering
- MAHOUT-833: Conversion to SequenceFiles via 'seqdirectory' can now be run as a MapReduce job.
- MAHOUT-1052: Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values).
- MAHOUT-884: Matrix Concat utility, presently only concatenates two matrices.
- MAHOUT-1244: Upgraded to use Lucene 4.3
- MAHOUT-1187: Upgraded to CommonsLang3
- MAHOUT-916: Speed up the Mahout build by making tests run in parallel.
- The usual bug fixes. See JIRA [2] for more
information on the 0.8 release.
A total of 218 separate JIRA issues are addressed in this release.
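For readers unfamiliar with the idea behind MAHOUT-1089 (SGD matrix factorization with user and item biases), the following is a minimal, self-contained sketch of the general technique. It is not Mahout's implementation and uses no Mahout classes; the class name BiasedSgdSketch, its method names, and the learning-rate and regularization parameters are all assumptions made for illustration. A rating is predicted as the global mean plus a user bias, an item bias, and the dot product of the user and item factor vectors, and each observed rating nudges those values along the error gradient.

```java
import java.util.Random;

/** Illustrative sketch only; hypothetical class, not part of Mahout. */
public final class BiasedSgdSketch {
    private final double mu;               // global mean rating
    private final double[] userBias;       // one bias per user
    private final double[] itemBias;       // one bias per item
    private final double[][] userFactors;  // users x rank
    private final double[][] itemFactors;  // items x rank

    public BiasedSgdSketch(int users, int items, int rank, double mu, long seed) {
        this.mu = mu;
        this.userBias = new double[users];
        this.itemBias = new double[items];
        this.userFactors = new double[users][rank];
        this.itemFactors = new double[items][rank];
        Random rnd = new Random(seed);
        // Small random starting values so the factors can diverge from each other.
        for (double[] row : userFactors) {
            for (int k = 0; k < rank; k++) row[k] = 0.1 * rnd.nextGaussian();
        }
        for (double[] row : itemFactors) {
            for (int k = 0; k < rank; k++) row[k] = 0.1 * rnd.nextGaussian();
        }
    }

    /** Prediction = global mean + user bias + item bias + factor dot product. */
    public double predict(int u, int i) {
        double dot = 0.0;
        for (int k = 0; k < userFactors[u].length; k++) {
            dot += userFactors[u][k] * itemFactors[i][k];
        }
        return mu + userBias[u] + itemBias[i] + dot;
    }

    /** One SGD step on a single observed rating, with L2 regularization. */
    public void update(int u, int i, double rating, double lr, double reg) {
        double err = rating - predict(u, i);
        userBias[u] += lr * (err - reg * userBias[u]);
        itemBias[i] += lr * (err - reg * itemBias[i]);
        for (int k = 0; k < userFactors[u].length; k++) {
            double pu = userFactors[u][k];
            double qi = itemFactors[i][k];
            userFactors[u][k] += lr * (err * qi - reg * pu);
            itemFactors[i][k] += lr * (err * pu - reg * qi);
        }
    }

    /** Root mean squared error over the given ratings. */
    public static double rmse(BiasedSgdSketch m, int[] users, int[] items, double[] ratings) {
        double sum = 0.0;
        for (int n = 0; n < ratings.length; n++) {
            double e = ratings[n] - m.predict(users[n], items[n]);
            sum += e * e;
        }
        return Math.sqrt(sum / ratings.length);
    }

    public static void main(String[] args) {
        // Tiny made-up data set: (user, item, rating) triples.
        int[] users = {0, 0, 1, 1, 2, 2};
        int[] items = {0, 1, 1, 2, 0, 2};
        double[] ratings = {5, 3, 4, 1, 2, 5};
        BiasedSgdSketch m = new BiasedSgdSketch(3, 3, 2, 20.0 / 6.0, 42L);
        System.out.println("RMSE before training: " + rmse(m, users, items, ratings));
        for (int epoch = 0; epoch < 200; epoch++) {
            for (int n = 0; n < ratings.length; n++) {
                m.update(users[n], items[n], ratings[n], 0.05, 0.02);
            }
        }
        System.out.println("RMSE after training:  " + rmse(m, users, items, ratings));
    }
}
```

The data in main is invented purely to exercise the update rule; for real use, see the recommender classes shipped with the release and the linked JIRA issue.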
CONTRIBUTING
Mahout is always looking for contributions focused on the 3Cs. If you are interested in contributing, please see our contribution page, https://cwiki.apache.org/MAHOUT/how-to-contribute.html, on the Mahout wiki or contact us via email at dev@mahout.apache.org.
FUTURE PLANS
0.9
As the project moves towards a 1.0 release, the community is working to
clean up and/or remove parts of the code base that are under-supported
or that underperform, as well as to better focus the energy and
contributions on key algorithms that are proven to scale in production
and have seen widespread adoption. To this end, in the next release,
the project is planning on removing support for the following algorithms
unless there is sustained support and improvement of them before the
next release.
The algorithms to be removed are:
- From Clustering:
    Dirichlet
    MeanShift
    MinHash
    Eigencuts
- From Classification (both are sequential implementations)
    Winnow
    Perceptron
- Frequent Pattern Mining
- Collaborative Filtering
    All recommenders in org.apache.mahout.cf.taste.impl.recommender.knn
    SlopeOne implementations in org.apache.mahout.cf.taste.hadoop.slopeone and org.apache.mahout.cf.taste.impl.recommender.slopeone
    Distributed pseudo recommender in org.apache.mahout.cf.taste.hadoop.pseudo
    TreeClusteringRecommender in org.apache.mahout.cf.taste.impl.recommender
- Mahout Math
    Lanczos in favour of SSVD
    Hadoop entropy stuff in org.apache.mahout.math.stats.entropy
If you are interested in supporting one or more of these algorithms, please make it known on dev@mahout.apache.org and via JIRA issues that fix and/or improve them. Please also provide
supporting evidence as to their effectiveness for you in production.
1.0 PLANS
Our plans as a community are to focus 0.9 on cleanup of bugs and the
removal of the code mentioned above and then to follow with a 1.0
release soon thereafter, at which point the community is committing to
the support of the algorithms packaged in 1.0 for at least two minor
versions after their release. In the case of removal after 1.0, we will deprecate
the functionality in the 1.(x+1) minor release and remove it in the
1.(x+2) release. For instance, if feature X is to be removed after the
1.2 release, it will be deprecated in 1.3 and removed in 1.4.
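The removal policy above is simple arithmetic on the minor version number, and can be sketched as follows. This is only an illustration of the rule as stated in this announcement; the class DeprecationSchedule and its method are hypothetical names and not part of Mahout.

```java
/** Hypothetical helper illustrating the post-1.0 removal policy; not part of Mahout. */
public final class DeprecationSchedule {
    public final String deprecatedIn;
    public final String removedIn;

    private DeprecationSchedule(String deprecatedIn, String removedIn) {
        this.deprecatedIn = deprecatedIn;
        this.removedIn = removedIn;
    }

    /**
     * @param minor the x in the 1.x release after which removal is decided
     * @return deprecation in 1.(x+1) and removal in 1.(x+2)
     */
    public static DeprecationSchedule forDecisionAfter(int minor) {
        if (minor < 0) {
            throw new IllegalArgumentException("minor version must be >= 0");
        }
        return new DeprecationSchedule("1." + (minor + 1), "1." + (minor + 2));
    }

    public static void main(String[] args) {
        // The example from the announcement: removal decided after 1.2.
        DeprecationSchedule s = forDecisionAfter(2);
        System.out.println("deprecated in " + s.deprecatedIn + ", removed in " + s.removedIn);
        // prints: deprecated in 1.3, removed in 1.4
    }
}
```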
[1] http://svn.apache.org/viewvc/mahout/trunk/CHANGELOG?revision=1501110&view=markup
[2] https://issues.apache.org/jira/issues/?jql=project%20%3D%20MAHOUT%20AND%20fixVersion%20%3D%20%220.8%22
Re: Apache Mahout 0.8 Released
Posted by Suneel Marthi <su...@yahoo.com>.
Could someone with Committer access please mark all JIRAs that were fixed in 0.8 as 'Closed'?
I am not sure how to do it as one bulk operation without spamming the rest of the world with a flood of emails.
Thanks.
________________________________
From: Grant Ingersoll <gs...@apache.org>
To: "user@mahout.apache.org" <us...@mahout.apache.org>; "dev@mahout.apache.org" <de...@mahout.apache.org>; announce@apache.org
Sent: Thursday, July 25, 2013 10:35 AM
Subject: Apache Mahout 0.8 Released