Posted to commits@mahout.apache.org by sm...@apache.org on 2016/04/08 23:55:11 UTC

svn commit: r1738312 - in /mahout/site/mahout_cms/trunk: content/users/flinkbindings/playing-with-samsara-flink.mdtext templates/standard.html

Author: smarthi
Date: Fri Apr  8 21:55:11 2016
New Revision: 1738312

URL: http://svn.apache.org/viewvc?rev=1738312&view=rev
Log:
MAHOUT-1765:Mahout DSL for Flink: Add some documentation about Flink backend

Added:
    mahout/site/mahout_cms/trunk/content/users/flinkbindings/playing-with-samsara-flink.mdtext
Modified:
    mahout/site/mahout_cms/trunk/templates/standard.html

Added: mahout/site/mahout_cms/trunk/content/users/flinkbindings/playing-with-samsara-flink.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/flinkbindings/playing-with-samsara-flink.mdtext?rev=1738312&view=auto
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/flinkbindings/playing-with-samsara-flink.mdtext (added)
+++ mahout/site/mahout_cms/trunk/content/users/flinkbindings/playing-with-samsara-flink.mdtext Fri Apr  8 21:55:11 2016
@@ -0,0 +1,102 @@
+## Getting Started 
+
+To get started, add the following dependency to the pom:
+
+    <dependency>
+      <groupId>org.apache.mahout</groupId>
+      <artifactId>mahout-flink_2.10</artifactId>
+      <version>0.12.0</version>
+    </dependency>
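If you build with sbt rather than Maven, the equivalent declaration should look roughly like the line below. This is a sketch that assumes the same coordinates as the pom snippet. The artifact name already carries the Scala 2.10 suffix, so it uses a single `%` rather than `%%`.

```scala
// build.sbt: same Maven coordinates as the pom snippet above.
// The _2.10 suffix is part of the artifact name, so the project itself should build with Scala 2.10.
libraryDependencies += "org.apache.mahout" % "mahout-flink_2.10" % "0.12.0"
```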
+
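+Here is a minimal example that reads a tab-separated file into a distributed row matrix (DRM) on the Flink backend and computes `drm.t %*% drm`: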
+
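+	import org.apache.flink.api.scala._
+	import org.apache.mahout.math.drm._
+	import org.apache.mahout.math.drm.RLikeDrmOps._
+	import org.apache.mahout.flinkbindings._
+
+	object ReadCsvExample {
+
+	  def main(args: Array[String]): Unit = {
+	    val filePath = "path/to/the/input/file"
+
+	    // Wrap the Flink execution environment in a Mahout distributed context.
+	    // It is declared implicit so that DRM operations can pick it up.
+	    val env = ExecutionEnvironment.getExecutionEnvironment
+	    implicit val ctx = new FlinkDistributedContext(env)
+
+	    // Read a tab-separated file into a DRM; lines starting with "#" are treated as comments.
+	    val drm = readCsv(filePath, delim = "\t", comment = "#")
+
+	    // Build the A' A expression; collect triggers execution and returns an in-core matrix.
+	    val C = drm.t %*% drm
+	    println(C.collect)
+	  }
+
+	}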
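To sanity-check the output of `drm.t %*% drm` on a small input, it can help to compute the same product in-core. The sketch below is plain Scala with no Mahout or Flink dependency; the names `GramCheck` and `gram` are hypothetical. It only illustrates what entry (i, j) of `A' A` is: the dot product of columns i and j of `A`.

```scala
object GramCheck {

  // Computes A' A for a dense row-major matrix: C(i)(j) = sum over rows r of A(r)(i) * A(r)(j).
  def gram(a: Array[Array[Double]]): Array[Array[Double]] = {
    val n = if (a.isEmpty) 0 else a(0).length
    val c = Array.ofDim[Double](n, n)
    // Accumulate the outer product of each row with itself; summing per-row outer
    // products is one way a row-partitioned backend can compute A' A.
    for (row <- a; i <- 0 until n; j <- 0 until n)
      c(i)(j) += row(i) * row(j)
    c
  }

  def main(args: Array[String]): Unit = {
    val a = Array(Array(1.0, 2.0), Array(3.0, 4.0), Array(5.0, 6.0))
    // prints:
    // 35.0 44.0
    // 44.0 56.0
    println(gram(a).map(_.mkString(" ")).mkString("\n"))
  }
}
```

Running the Flink example above on a file containing the same three rows should produce the same 2x2 matrix.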
+
+## Current Status
+
+The umbrella JIRA for the Flink backend is [MAHOUT-1570](https://issues.apache.org/jira/browse/MAHOUT-1570), which has been fully implemented.
+
+### Implemented
+
+* [MAHOUT-1701](https://issues.apache.org/jira/browse/MAHOUT-1701) Mahout DSL for Flink: implement `AtB`, `ABt` and `AtA` operators
+* [MAHOUT-1702](https://issues.apache.org/jira/browse/MAHOUT-1702) implement element-wise operators (like `A + 2` or `A + B`) 
+* [MAHOUT-1703](https://issues.apache.org/jira/browse/MAHOUT-1703) implement `cbind` and `rbind`
+* [MAHOUT-1709](https://issues.apache.org/jira/browse/MAHOUT-1709) implement slicing (like `A(1 to 10, ::)`)
+* [MAHOUT-1710](https://issues.apache.org/jira/browse/MAHOUT-1710) implement right in-core matrix multiplication (`A %*% B` when `B` is in-core) 
+* [MAHOUT-1711](https://issues.apache.org/jira/browse/MAHOUT-1711) implement broadcasting
+* [MAHOUT-1712](https://issues.apache.org/jira/browse/MAHOUT-1712) implement operators `At`, `Ax`, `Atx`: `Ax` and `At` are implemented here (`Atx` is covered by MAHOUT-1749)
+* [MAHOUT-1734](https://issues.apache.org/jira/browse/MAHOUT-1734) implement I/O (should be able to read results produced by the Flink bindings)
+* [MAHOUT-1747](https://issues.apache.org/jira/browse/MAHOUT-1747) add support for different types of indexes (`String`, `Long`, etc.): `Int`, `Long` and `String` are now supported
+* [MAHOUT-1748](https://issues.apache.org/jira/browse/MAHOUT-1748) switch to Flink Scala API 
+* [MAHOUT-1749](https://issues.apache.org/jira/browse/MAHOUT-1749) Implement `Atx`
+* [MAHOUT-1750](https://issues.apache.org/jira/browse/MAHOUT-1750) Implement `ABt`
+* [MAHOUT-1751](https://issues.apache.org/jira/browse/MAHOUT-1751) Implement `AtA` 
+* [MAHOUT-1755](https://issues.apache.org/jira/browse/MAHOUT-1755) Flush intermediate results to FS: unlike Spark, Flink does not keep intermediate results in memory
+* [MAHOUT-1764](https://issues.apache.org/jira/browse/MAHOUT-1764) Add standard backend tests for Flink
+* [MAHOUT-1765](https://issues.apache.org/jira/browse/MAHOUT-1765) Add documentation about Flink backend
+* [MAHOUT-1804](https://issues.apache.org/jira/browse/MAHOUT-1804) Implement drmParallelizeWithRowLabels(..) in Flink
+* [MAHOUT-1805](https://issues.apache.org/jira/browse/MAHOUT-1805) Implement allReduceBlock(..) in Flink bindings
+* [MAHOUT-1809](https://issues.apache.org/jira/browse/MAHOUT-1809) Failing tests in flink-bindings: dals and dspca
+* [MAHOUT-1810](https://issues.apache.org/jira/browse/MAHOUT-1810) Failing test in flink-bindings: A + B Identically partitioned (mapBlock Checkpointing issue)
+* [MAHOUT-1812](https://issues.apache.org/jira/browse/MAHOUT-1812) Implement drmParallelizeWithEmptyLong(..) in flink bindings
+* [MAHOUT-1814](https://issues.apache.org/jira/browse/MAHOUT-1814) Implement drm2intKeyed in flink bindings
+* [MAHOUT-1815](https://issues.apache.org/jira/browse/MAHOUT-1815) dsqDist(X,Y) and dsqDist(X) failing in flink tests
+* [MAHOUT-1816](https://issues.apache.org/jira/browse/MAHOUT-1816) Implement newRowCardinality in CheckpointedFlinkDrm
+* [MAHOUT-1817](https://issues.apache.org/jira/browse/MAHOUT-1817) Implement caching in Flink Bindings
+* [MAHOUT-1818](https://issues.apache.org/jira/browse/MAHOUT-1818) dals test failing in Flink Bindings
+* [MAHOUT-1819](https://issues.apache.org/jira/browse/MAHOUT-1819) Set the default Parallelism for Flink execution in FlinkDistributedContext
+* [MAHOUT-1820](https://issues.apache.org/jira/browse/MAHOUT-1820) Add a method to generate Tuple<PartitionId, Partition elements count> to support Flink backend
+* [MAHOUT-1821](https://issues.apache.org/jira/browse/MAHOUT-1821) Use a mahout-flink-conf.yaml configuration file for Mahout specific Flink configuration
+* [MAHOUT-1822](https://issues.apache.org/jira/browse/MAHOUT-1822) Update NOTICE.txt, License.txt to add Apache Flink
+* [MAHOUT-1823](https://issues.apache.org/jira/browse/MAHOUT-1823) Modify MahoutFlinkTestSuite to implement FlinkTestBase
+* [MAHOUT-1824](https://issues.apache.org/jira/browse/MAHOUT-1824) Optimize FlinkOpAtA to use upper triangular matrices
+* [MAHOUT-1825](https://issues.apache.org/jira/browse/MAHOUT-1825) Add List of Flink algorithms to Mahout wiki page
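MAHOUT-1824 in the list above optimizes `FlinkOpAtA` to use upper triangular matrices. The idea rests on `A' A` being symmetric: only the entries with i <= j need to be accumulated, and the lower triangle can be mirrored, which roughly halves the multiply-adds. The plain-Scala sketch below illustrates that idea on an in-core matrix; it is not the `FlinkOpAtA` code, and the names `UpperTriangularAtA` and `ata` are hypothetical.

```scala
object UpperTriangularAtA {

  // Computes A' A, exploiting symmetry: only the upper triangle (i <= j) is accumulated.
  def ata(a: Array[Array[Double]]): Array[Array[Double]] = {
    val n = if (a.isEmpty) 0 else a(0).length
    val c = Array.ofDim[Double](n, n)
    for (row <- a; i <- 0 until n; j <- i until n)
      c(i)(j) += row(i) * row(j)
    // Mirror the upper triangle into the lower triangle.
    for (i <- 0 until n; j <- 0 until i)
      c(i)(j) = c(j)(i)
    c
  }

  def main(args: Array[String]): Unit = {
    val a = Array(Array(1.0, 0.0, 2.0), Array(0.0, 3.0, 1.0))
    // prints:
    // 1.0 0.0 2.0
    // 0.0 9.0 3.0
    // 2.0 3.0 5.0
    println(ata(a).map(_.mkString(" ")).mkString("\n"))
  }
}
```

For an n-column input, the first loop touches n(n+1)/2 entries per row instead of n^2.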
+
+### Tests 
+
+There is a set of standard tests that all engines should pass (see [MAHOUT-1764](https://issues.apache.org/jira/browse/MAHOUT-1764)).  
+
+* `DistributedDecompositionsSuite`: the `dspca` test fails and `dals` does not complete (it crashes with an `OutOfMemoryError`)
+* `DrmLikeOpsSuite`: two `dsqDist` tests fail; the rest pass
+* `DrmLikeSuite`: the DFS I/O test does not pass (although I/O should work when not using a temp dir)
+* `RLikeDrmOpsSuite`: one of the `A + B` tests fails for no obvious reason
+
+
+There are also Flink-backend-specific tests, e.g.:
+
+* `DrmLikeOpsSuite` for operations like `norm`, `rowSums`, `rowMeans`
+* `RLikeOpsSuite` for basic linear algebra like `A.t %*% A`, `A.t %*% x`, etc.
+* `LATestSuite` for tests of specific operators like `AtB`, `Ax`, etc.
+* `UseCasesSuite` for more complex examples, like power iteration, ridge regression, etc.
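`UseCasesSuite` mentions power iteration. For reference, the in-core version of that algorithm is sketched below in plain Scala with no Mahout dependency; `PowerIteration` and its parameters are hypothetical names, and this is an illustration rather than the code in `UseCasesSuite`. Each step is a matrix-vector product followed by normalization, i.e. the kind of `Ax` workload the backend has to support.

```scala
object PowerIteration {

  // Dense matrix-vector product y = A x.
  def times(a: Array[Array[Double]], x: Array[Double]): Array[Double] =
    a.map(row => row.zip(x).map { case (u, v) => u * v }.sum)

  // Euclidean norm of a vector.
  def norm(x: Array[Double]): Double = math.sqrt(x.map(v => v * v).sum)

  // Repeatedly applies A and normalizes. For a symmetric matrix with a unique dominant
  // eigenvalue and a start vector not orthogonal to its eigenvector, x converges to that
  // eigenvector (up to sign). Returns the Rayleigh quotient x' A x and the unit vector x.
  def dominant(a: Array[Array[Double]], iterations: Int): (Double, Array[Double]) = {
    var x = Array.fill(a.length)(1.0)
    x = x.map(_ / norm(x))
    for (_ <- 1 to iterations) {
      val y = times(a, x)
      x = y.map(_ / norm(y))
    }
    val ax = times(a, x)
    val lambda = x.zip(ax).map { case (u, v) => u * v }.sum
    (lambda, x)
  }

  def main(args: Array[String]): Unit = {
    // Eigenvalues of this matrix are (5 +/- sqrt(5)) / 2.
    val a = Array(Array(2.0, 1.0), Array(1.0, 3.0))
    val (lambda, v) = dominant(a, 50)
    println(f"lambda = $lambda%.6f, v = ${v.mkString(", ")}")
  }
}
```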
+
+## Environment 
+
+For development, the minimal supported configuration is:
+
+* [JDK 1.7](http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html) (we had problems with 1.8)
+* Scala 2.10
+
+When working with Mahout, please import the following projects:
+
+* `mahout-math`
+* `mahout-math-scala`
+* `mahout-flink_2.10`
+* optionally, `mahout-spark_2.10` if you want to use the Spark implementation as a reference

Modified: mahout/site/mahout_cms/trunk/templates/standard.html
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/templates/standard.html?rev=1738312&r1=1738311&r2=1738312&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/templates/standard.html (original)
+++ mahout/site/mahout_cms/trunk/templates/standard.html Fri Apr  8 21:55:11 2016
@@ -143,6 +143,7 @@
                 <ul class="dropdown-menu">
                   <li><a href="/users/sparkbindings/home.html">Scala &amp; Spark Bindings Overview</a></li>
                   <li><a href="/users/sparkbindings/faq.html">FAQ</a></li>
+		  <li><a href="/users/flinkbindings/playing-with-samsara-flink.html">Flink Bindings Overview</a></li>
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a href="/users/environment/h2o-internals.html">H2O</a></li>