Posted to dev@bigtop.apache.org by Suneel Marthi <sm...@apache.org> on 2015/08/07 02:43:53 UTC

[ANNOUNCE] Apache Mahout 0.10.2 Release

The Apache Mahout PMC is pleased to announce the release of Mahout 0.10.2.
Mahout's goal is to create an environment for quickly creating machine
learning applications that scale and run on the highest performance
parallel computation engines available. Mahout comprises an interactive
environment and library that supports generalized scalable linear algebra
and includes many modern machine learning algorithms.

We call the Mahout Math environment “Samsara,” after its symbol of universal
renewal. It reflects a fundamental rethinking of how scalable machine
learning algorithms are built and customized. Mahout-Samsara is here to
help people create their own math while providing some off-the-shelf
algorithm implementations. At its base are general linear algebra and
statistical operations along with the data structures to support them. It’s
written in Scala with Mahout-specific extensions, and runs most fully on
Spark.
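
For a flavor of the DSL, here is a minimal, illustrative sketch of what a
Samsara session looks like. The matrix values and variable names are made
up, the imports are the usual math-scala ones (already in scope in the
Mahout Spark shell), and the distributed part assumes an implicit
DistributedContext such as the one the shell provides:

    import org.apache.mahout.math._
    import org.apache.mahout.math.scalabindings._
    import org.apache.mahout.math.scalabindings.RLikeOps._
    import org.apache.mahout.math.drm._
    import org.apache.mahout.math.drm.RLikeDrmOps._

    // A small dense in-core matrix and some basic R-like algebra.
    val mxA = dense((1, 2, 3), (3, 4, 5))
    val mxAtA = mxA.t %*% mxA              // in-core transpose and product

    // The same computation, distributed: wrap the data in a DRM and let
    // the optimizer choose the physical A'A operator, then bring the
    // (small) result back in-core.
    val drmA = drmParallelize(mxA, numPartitions = 2)
    val mxAtAd = (drmA.t %*% drmA).collect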

To get started with Apache Mahout 0.10.2, download the release artifacts
and signatures from http://www.apache.org/dist/mahout/0.10.2/.

Many thanks to the contributors and committers who were part of this
release. Please see below for the Release Highlights.


RELEASE HIGHLIGHTS

This is an incremental minor release over Mahout 0.10.1 meant to introduce
several new features (all of which are also available in the 0.11 lineage)
and fix a few bugs.


Mahout 0.10.2


   1. In-core transpose view rewrites. Modifiable transpose views, e.g.
      for (col <- a.t) col := 5 (see the sketches after this list).

   2. Performance and parallelization improvements for the AB', A'B, and A'A
      Spark physical operators.

   3. Optional structural "flavor" abstraction for in-core matrices. In-core
      matrices can now be tagged as e.g. sparse or dense.

   4. %*% optimization based on matrix flavors.

   5. In-core ::= sparse assignment functions.

   6. Assign := optimization (proper traversal based on matrix flavors,
      similar to %*%).

   7. In-place elementwise functional assignment, e.g. mxA := exp _,
      mxA ::= exp _.

   8. Distributed and in-core versions of simple elementwise analogues of
      scala.math._. For example, for log(x) the convention is dlog(drm),
      mlog(mx), vlog(vec). Unfortunately we cannot overload these functions
      over what is done in scala.math, i.e. Scala would not allow log(mx) or
      log(drm) and log(Double) at the same time, mainly because they are
      defined in different packages.

   9. Distributed and in-core first- and second-moment routines. R analogs:
      mean(), colMeans(), rowMeans(), variance(), sd(). By convention,
      distributed versions are prefixed with the letter "d": colMeanVars(),
      colMeanStdevs(), dcolMeanVars(), dcolMeanStdevs().

  10. Distance and squared-distance matrix routines. R analog: dist(). Both
      squared and non-squared Euclidean distance matrices are provided. By
      convention, distributed versions are prefixed with the letter "d":
      dist(x), sqDist(x), dsqDist(x). There is also a variation for the
      pairwise distance matrix of two different inputs x and y: sqDist(x, y),
      dsqDist(x, y).

  11. DRM row sampling API.

  12. Distributed performance bug fixes. These relate mostly to (a) matrix
      multiplication deficiencies and (b) handling of parallelism.

  13. Distributed-engine-neutral allreduceBlock() operator API for Spark
      and H2O.

  14. Distributed optimizer operators for elementwise functions. Rewrites
      recognize e.g. 1 + drmX * dexp(drmX) as a single fused elementwise
      physical operator: elementwiseFunc(f1(f2(drmX))), where f1 = 1 + x
      and f2 = exp(x).

  15. More cbind and rbind flavors (e.g. 1 cbind mxX, 1 cbind drmX, or the
      other way around) for Spark and H2O.

  16. Added +=: and *=: operators on vectors.

  17. Closeable API for broadcast tensors.

  18. Support for conversion of any type-keyed DRM into an ordinally-keyed
      DRM.

  19. Scala logging style.

  20. rowSumsMap() summary for non-int-keyed DRMs.

  21. Elementwise power operator ^.

  22. R-like vector concatenation operator.

  23. In-core functional assignments, e.g. mxA := { (x) => x * x }.

  24. Straightened out the behavior of Matrix.iterator() and
      iterateNonEmpty().

  25. New mutable transposition view for in-core matrices. The in-core
      transpose view was rewritten with two goals in mind: (1) enable
      mutability, e.g. for (col <- mxA.t) col := k; (2) translate matrix
      structural flavor correctly for the optimizers, i.e. the transpose of
      a new SparseRowMatrix carries on as a column-major structure.

  26. Native support for Kryo serialization of tensor types.

  27. Deprecation of MultiLayerPerceptron, ConcatenateVectorsJob, and all
      related classes.

  28. Deprecation of SparseColumnMatrix.
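
To make a few of the in-core items above concrete (items 1, 7, 9, 21, 23,
25), here is a small sketch in the same DSL. It assumes the imports from the
earlier sketch; the variable names and the exact argument shape of
colMeanVars() are illustrative assumptions based on the naming in the list,
not authoritative signatures:

    import scala.math.exp

    // Items 1 and 25: mutable transpose views -- assigning through mxA.t
    // writes into the backing matrix mxA.
    val mxA = dense((1, 2, 3), (4, 5, 6))
    for (col <- mxA.t) col := 5

    // Items 7 and 23: in-place elementwise functional assignment.
    mxA := { (x) => x * x }
    mxA ::= exp _

    // Item 21: elementwise power operator.
    val mxB = mxA ^ 2

    // Item 9: column mean/variance summary (assumed here to take the
    // matrix as its argument).
    val colStats = colMeanVars(mxA)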
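
A companion sketch for the distributed side (items 8, 9, 10, 14, 15)
follows. It likewise assumes an implicit DistributedContext and treats the
argument shapes of the d-prefixed helpers as assumptions derived from the
naming above:

    // Distribute a small in-core matrix for the examples below.
    val drmX = drmParallelize(dense((1, 2, 3), (4, 5, 6)), numPartitions = 2)

    // Item 8: d-prefixed elementwise analogues of scala.math._ for DRMs.
    val drmLogX = dlog(drmX)

    // Item 14: expressions such as this one are now recognized by the
    // optimizer and fused into a single elementwise physical operator.
    val drmY = 1 + drmX * dexp(drmX)

    // Item 9: distributed column mean/variance summary.
    val drmColStats = dcolMeanVars(drmX)

    // Item 10: distributed squared Euclidean distance matrix over the rows
    // of drmX, plus the pairwise two-input variant.
    val drmD   = dsqDist(drmX)
    val drmDxy = dsqDist(drmX, drmY)

    // Item 15: scalar cbind flavor -- prepend a column of ones.
    val drmX1 = 1 cbind drmX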





STATS

A total of 31 separate JIRA issues are addressed in this release [2],
including 2 bug fixes.

Mahout 0.11.0-snapshot (targeted for August 5, 2015)

   1. Support for Spark 1.3 sequence file write.

   2. Mahout Spark shell support for Spark 1.3.

   3. Ongoing work on integration of Apache Flink as a backend.


GETTING STARTED

Download the release artifacts and signatures at
http://www.apache.org/dist/mahout/0.10.2/
The examples directory contains several working examples of the core
functionality available in Mahout. These can be run via scripts in the
examples/bin directory. Most examples do not need a Hadoop cluster in order
to run.

FUTURE PLANS

We intend this to be the final release of the mahout-0.10.x branch, which
will remain dependent on Spark 1.2.x. Support for Spark 1.3 is in the master
branch, which reflects Mahout-0.11.0-SNAPSHOT; to see progress on that
branch, look here: https://github.com/apache/mahout/commits/master. As of
this writing we are also working, in a parallel effort, on putting out a
release of Mahout 0.11.0 that can be built for Spark 1.3.

Integration with Apache Flink is in the works, in collaboration with TU
Berlin and Data Artisans, to add Flink as a third execution engine for
Mahout, in addition to the existing Apache Spark and H2O engines.

KNOWN ISSUES

In the non-source zip or tar archives, the example data for
mahout/examples/bin/run-item-sim is missing. To run it, get the CSV files
from GitHub:
https://github.com/apache/mahout/tree/mahout-0.10.x/examples/src/main/resources
[4].