Posted to announce@apache.org by Suneel Marthi <sm...@apache.org> on 2016/03/12 20:20:56 UTC

[ANNOUNCE] Apache Mahout 0.11.2 Release

Apache Mahout 0.11.2 Release Notes

The Apache Mahout PMC is pleased to announce the release of Mahout 0.11.2.
Mahout's goal is to create an environment for quickly creating machine
learning applications that scale and run on the highest performance
parallel computation engines available. Mahout comprises an interactive
environment and library that supports generalized scalable linear algebra
and includes many modern machine learning algorithms.

We call the Mahout Math environment “Samsara” for its symbolism of universal
renewal. It reflects a fundamental rethinking of how scalable machine
learning algorithms are built and customized. Mahout-Samsara is here to
help people create their own math while providing some off-the-shelf
algorithm implementations. At its base are general linear algebra and
statistical operations along with the data structures to support them. It’s
written in Scala with Mahout-specific extensions, and runs on Spark and
H2O.

To get started with Apache Mahout 0.11.2, download the release artifacts
and signatures from http://www.apache.org/dist/mahout/0.11.2/.

Many thanks to the contributors and committers who were part of this
release. Please see below for the Release Highlights.

RELEASE HIGHLIGHTS

This is a minor release over Mahout 0.11.0 meant to introduce major
performance enhancements with sparse matrix and vector computations, and
major performance optimizations to the Samsara DSL. Mahout 0.11.2 includes
all new features and bug fixes released in Mahout versions 0.11.0 and
0.11.1.

Mahout 0.11.2 new features compared to Mahout 0.11.1:

1. Spark 1.5.2 support.

2. Performance improvements of over 30% on Sparse Vector and Matrix
computations leveraging the ‘fastutil’ library - contribution from
Sebastiano Vigna. This speeds up all in-core sparse vector and matrix
computations.

KNOWN ISSUES

The dataset URLs in the Wikipedia Naive Bayes classification example script
(/examples/bin/classify-wikipedia.sh) have changed. The new URL for the
smallest set is:

http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles1.xml-p000000010p000030302.bz2

and for the medium set:

http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles10.xml-p002336425p003046511.bz2

To run the Wikipedia classification example, replace the old URLs in
classify-wikipedia.sh with the new ones.
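
The substitution can also be scripted. The following is a minimal sketch in
Python, not part of the Mahout distribution: the helper names swap_urls and
patch_script are hypothetical, and because these notes do not list the old
URLs, the caller has to copy them out of their own copy of the script and
pass them in.

```python
from pathlib import Path
from typing import Mapping

# New dataset URLs, copied verbatim from the release notes.
NEW_SMALL_URL = (
    "http://dumps.wikimedia.org/enwiki/latest/"
    "enwiki-latest-pages-articles1.xml-p000000010p000030302.bz2"
)
NEW_MEDIUM_URL = (
    "http://dumps.wikimedia.org/enwiki/latest/"
    "enwiki-latest-pages-articles10.xml-p002336425p003046511.bz2"
)


def swap_urls(script_text: str, replacements: Mapping[str, str]) -> str:
    """Return script_text with each old URL replaced by its new URL.

    `replacements` maps old URL -> new URL. The old URLs are an input
    supplied by the caller; nothing here guesses what they were.
    """
    for old_url, new_url in replacements.items():
        script_text = script_text.replace(old_url, new_url)
    return script_text


def patch_script(path: Path, replacements: Mapping[str, str]) -> bool:
    """Rewrite the script at `path` in place.

    Returns True if the file was changed, False if no old URL was found.
    """
    original = path.read_text()
    patched = swap_urls(original, replacements)
    if patched == original:
        return False
    path.write_text(patched)
    return True
```

A call would look like patch_script(Path("examples/bin/classify-wikipedia.sh"),
{old_small_url: NEW_SMALL_URL, old_medium_url: NEW_MEDIUM_URL}), where
old_small_url and old_medium_url are the values currently in the script. A
return value of False means nothing matched, which usually indicates the old
URLs were copied incorrectly.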

Fixed Jiras:

MAHOUT-1640: Better collections would significantly improve
vector-operation speed

MAHOUT-1800: Pare down Classtag overuse

MAHOUT-1801: FastUtil to improve speed of Sparse Matrix Operations

MAHOUT-1802: Capture attached checkpoints (if cached)

Future Roadmap:

1. Mahout 0.12.0 will be released soon and will add Apache Flink as a
supported backend execution engine.

2. Explore leveraging ViennaCL (
http://viennacl.sourceforge.net/doc/manual-license.html) as a math backend
to support dense, sparse, and CUDA computations on bare metal.