Posted to dev@mahout.apache.org by Suneel Marthi <sm...@apache.org> on 2016/04/11 23:42:17 UTC

[ANNOUNCE] Apache Mahout 0.12.0 Release

The Apache Mahout PMC is pleased to announce the release of Mahout 0.12.0.

Mahout's goal is to create an environment for quickly creating machine
learning applications that scale and run on the highest performance
parallel computation engines available. Mahout comprises an interactive
environment and library that supports generalized scalable linear algebra
and includes many modern machine learning algorithms.

We call the Mahout Math environment “Samsara” after its symbol of universal
renewal. It reflects a fundamental rethinking of how scalable machine
learning algorithms are built and customized. Mahout-Samsara is here to
help people create their own math while providing some off-the-shelf
algorithm implementations. At its base are general linear algebra and
statistical operations along with the data structures to support them. It’s
written in Scala with Mahout-specific extensions, and runs on Spark, Flink
and H2O.

The Mahout 0.12.0 release marks a major milestone toward the “Samsara”
environment’s goal of providing an engine-neutral math platform: it now
supports Flink. While still experimental, the Mahout Flink bindings now
offer all of the R-like semantics for linear algebra operations, matrix
decompositions, and algorithms of the “Samsara” platform for execution on a
Flink back-end.

This gives Flink users out-of-the-box access to the following features
(and more):


   1. The Mahout Distributed Row Matrix (DRM) API.
   2. Distributed and local Vector and Matrix algebra routines.
   3. Distributed and local Stochastic Principal Component Analysis.
   4. Distributed and local Stochastic Singular Value Decomposition.
   5. Distributed and local Thin QR Decomposition.
   6. Collaborative Filtering.
   7. Naive Bayes Classification.
   8. Matrix operations (only listing a few here):
      1. Mahout-native blockified distributed Matrix map and allreduce
         routines.
      2. Distributed data point (row) sampling.
      3. Matrix/Matrix Squared Distance.
      4. Element-wise log.
      5. Element-wise roots.
      6. Element-wise Matrix/Matrix addition, subtraction, division and
         multiplication.
      7. Functional Matrix value assignment.
   9. A familiar Scala-based R-like DSL.


As well as tools to develop other mathematical and machine learning
algorithms.
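
As a rough illustration of the DRM API and the R-like DSL listed above, the
sketch below parallelizes a small in-core matrix, computes A'A, and runs a
distributed thin QR. It assumes the Mahout math-scala module and an engine
binding are on the classpath. The names `SamsaraSketch`, `gramian`, `thinQR`
and `sample` are made up for this example. The code is written against the
engine-neutral `DistributedContext`, so the caller supplies the implicit
context; on Flink that would presumably be a `FlinkDistributedContext`
wrapping a Flink `ExecutionEnvironment`, which is an assumption to check
against the Flink bindings documentation.

```scala
import org.apache.mahout.math.Matrix
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.math.decompositions._

// Hypothetical example object; not part of the Mahout distribution.
object SamsaraSketch {

  // A small in-core matrix (3 rows, 2 columns, full column rank).
  val sample: Matrix = dense((1, 2), (3, 4), (5, 6))

  // Computes A'A on whichever engine supplies the implicit context.
  def gramian(inCoreA: Matrix)(implicit ctx: DistributedContext): Matrix = {
    // Distribute the in-core matrix as a DRM split into two partitions.
    val drmA = drmParallelize(m = inCoreA, numPartitions = 2)
    // R-like syntax: transpose, then distributed matrix multiply.
    val drmAtA = drmA.t %*% drmA
    // Bring the (small) result back to the driver as an in-core matrix.
    drmAtA.collect
  }

  // Distributed thin QR: Q stays distributed, R is returned in-core.
  def thinQR(inCoreA: Matrix)(implicit ctx: DistributedContext): (DrmLike[Int], Matrix) = {
    val drmA = drmParallelize(m = inCoreA, numPartitions = 2)
    dqrThin(drmA)
  }
}
```

Because nothing in the object refers to a specific engine, the same calls
should run unchanged on Spark, H2O or Flink once the matching implicit
context is in scope, which is the point of the engine-neutral design.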

To get started with Apache Mahout 0.12.0, download the release artifacts
and signatures from http://www.apache.org/dist/mahout/0.12.0/.

Many thanks to the contributors and committers who were part of this
release. Thanks in particular to Till Rohrmann, Alexey Grigorev, Robert
Metzger, Stephan Ewen, and Kostas Tzoumas, members of Data Artisans and the
Flink community who helped in this effort significantly. Please see below
for the Release Highlights.

RELEASE HIGHLIGHTS

This is a major release over Mahout 0.11.2 meant to introduce Apache Flink
(http://flink.apache.org) as a backend execution engine to the Samsara
Linear Algebra framework.

For more information about “Samsara” on Flink, see
http://mahout.apache.org/users/flinkbindings/flink-internals.html and
http://mahout.apache.org/users/flinkbindings/playing-with-samsara-flink.html

Mahout 0.12.0 is based on Apache Flink 1.0.1
(http://flink.apache.org/news/2016/04/06/release-1.0.1.html).

Mahout 0.12.0 now supports Flink 1.0.1 and Spark 1.5.2 on Hadoop 2.4.1.

KNOWN ISSUES

   1. Mahout’s DRM checkpointing is not fully supported in this release and
      the DrmLike.checkpoint(CacheHint.CacheHint) contract is broken.
      Currently checkpoints are cached to a temporary file system as
      designated by the `taskmanager.tmp.dirs` property in the
      `$MAHOUT_HOME/conf/flink-config.yaml` file. This issue affects the
      performance of Mahout on Flink.
   2. Serialization issues have arisen with certain operations. As the Flink
      bindings are still experimental, we’ve allowed these issues to pass the
      release and will address them in a follow-up 0.12.1 maintenance
      release. These issues affect the performance of Mahout on Flink.
   3. Highly iterative Mahout algorithms are currently significantly slowed
      by issue (1).
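
To make issues (1) and (3) concrete, here is a minimal sketch of the kind of
iterative code that is affected. It assumes the Mahout math-scala module is
on the classpath; `CheckpointSketch` and `iterate` are hypothetical names
used only for illustration. Each pass pins an intermediate result with
`checkpoint`, and the cache hint is only a request: per issue (1), on Flink
in this release the checkpoint goes to the temporary file system rather than
to memory, so the cost is paid on every iteration.

```scala
import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._

// Hypothetical example object; not part of the Mahout distribution.
object CheckpointSketch {

  // Repeatedly multiplies X by A, checkpointing after each step.
  // drmA must be square and conformant with drmX0 for %*% to apply.
  def iterate(drmA: DrmLike[Int], drmX0: DrmLike[Int], steps: Int): DrmLike[Int] = {
    var drmX = drmX0
    for (_ <- 1 to steps) {
      // Ask the engine to keep this intermediate result for the next pass.
      // MEMORY_ONLY is a hint; the Flink bindings in 0.12.0 do not honor it.
      drmX = (drmA %*% drmX).checkpoint(CacheHint.MEMORY_ONLY)
    }
    drmX
  }
}
```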



Fixed Jiras:

This release addresses 35 issues [1] of which 14 are bug fixes [2].

Future Roadmap:

1. Mahout 0.12.1 will support a Flink shell.

2. Several optimizations will be made to the Mahout Flink-Bindings in
Mahout 0.12.1, specifically to overcome the performance issues noted in the
Known Issues section above.

3. We will be exploring native Mahout caching for Flink.

4. We will explore leveraging ViennaCL
(http://viennacl.sourceforge.net/doc/manual-license.html) as a math backend
to support dense, sparse and CUDA computations on bare metal.



[1]
https://issues.apache.org/jira/browse/MAHOUT-1828?jql=project%20%3D%20MAHOUT%20AND%20fixVersion%20%3D%200.12.0%20AND%20status%20%3D%20Resolved

[2]
https://issues.apache.org/jira/browse/MAHOUT-1828?jql=project%20%3D%20MAHOUT%20AND%20fixVersion%20%3D%200.12.0%20AND%20Type%20%3D%20Bug%20AND%20status%20%3D%20Resolved

[3]
http://mahout.apache.org/users/flinkbindings/flink-internals.html

[4]
http://mahout.apache.org/users/flinkbindings/playing-with-samsara-flink.html


Regards,
On behalf of the Apache Mahout PMC