Posted to commits@systemml.apache.org by de...@apache.org on 2017/02/17 00:16:52 UTC

[1/3] incubator-systemml git commit: [SYSTEMML-1193] Update perftest runNaiveBayes.sh and doc for required probabilities parameter in naive-bayes-predict.dml

Repository: incubator-systemml
Updated Branches:
  refs/heads/gh-pages 452a41a02 -> bb97a4bc6


[SYSTEMML-1193] Update perftest runNaiveBayes.sh and doc for required probabilities parameter in naive-bayes-predict.dml

Closes #353.


Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/0f92f401
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/0f92f401
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/0f92f401

Branch: refs/heads/gh-pages
Commit: 0f92f40182f32e7cf533e5cea28de7b6759666f3
Parents: 452a41a
Author: Glenn Weidner <gw...@us.ibm.com>
Authored: Mon Feb 13 14:56:36 2017 -0800
Committer: Glenn Weidner <gw...@us.ibm.com>
Committed: Mon Feb 13 14:56:36 2017 -0800

----------------------------------------------------------------------
 algorithms-classification.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/0f92f401/algorithms-classification.md
----------------------------------------------------------------------
diff --git a/algorithms-classification.md b/algorithms-classification.md
index 8d19d04..0ee43bf 100644
--- a/algorithms-classification.md
+++ b/algorithms-classification.md
@@ -1236,8 +1236,7 @@ val prediction = model.transform(X_test_df)
 SystemML Language Reference for details.
 
 **probabilities**: Location (on HDFS) to store class membership
-    probabilities for a held-out test set. Note that this is an
-    optional argument.
+    probabilities for a held-out test set.
 
 **accuracy**: Location (on HDFS) to store the training accuracy during
     learning and testing accuracy from a held-out test set

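The diff above makes the `probabilities` output location of `naive-bayes-predict.dml` a required argument rather than an optional one. As an illustration of what that output contains, here is a small plain-Scala sketch (not the SystemML implementation) of computing class membership probabilities for one test row from a trained multinomial naive Bayes model; all names and example values are hypothetical.

```scala
// Illustrative sketch only: given per-class priors and per-feature
// conditional probabilities from a trained multinomial naive Bayes model,
// compute class membership probabilities for a single test row -- the kind
// of values naive-bayes-predict.dml writes to the required `probabilities`
// location.
object NaiveBayesProbs {
  def classProbabilities(x: Array[Double],            // one test row (feature counts)
                         priors: Array[Double],       // p(class)
                         cond: Array[Array[Double]]   // cond(c)(j) = p(feature j | class c)
                        ): Array[Double] = {
    // Log-space score per class: log p(c) + sum_j x_j * log p(j | c)
    val logScores = priors.indices.map { c =>
      math.log(priors(c)) + x.indices.map(j => x(j) * math.log(cond(c)(j))).sum
    }
    // Normalize with log-sum-exp so the scores become probabilities
    val m = logScores.max
    val exps = logScores.map(s => math.exp(s - m))
    val z = exps.sum
    exps.map(_ / z).toArray
  }

  def main(args: Array[String]): Unit = {
    val priors = Array(0.5, 0.5)
    val cond = Array(Array(0.9, 0.1), Array(0.2, 0.8))
    val probs = classProbabilities(Array(3.0, 1.0), priors, cond)
    println(probs.mkString(", "))  // one probability per class, summing to 1
  }
}
```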

[3/3] incubator-systemml git commit: [SYSTEMML-1279] Decrease numCols to prevent spark codegen issue

[SYSTEMML-1279] Decrease numCols to prevent spark codegen issue

Closes #395.


Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/bb97a4bc
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/bb97a4bc
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/bb97a4bc

Branch: refs/heads/gh-pages
Commit: bb97a4bc6213cf68eeea91097a71d1fd149c49ec
Parents: ba2819b
Author: Felix Schueler <fe...@ibm.com>
Authored: Thu Feb 16 16:13:14 2017 -0800
Committer: Deron Eriksson <de...@us.ibm.com>
Committed: Thu Feb 16 16:13:14 2017 -0800

----------------------------------------------------------------------
 spark-mlcontext-programming-guide.md | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/bb97a4bc/spark-mlcontext-programming-guide.md
----------------------------------------------------------------------
diff --git a/spark-mlcontext-programming-guide.md b/spark-mlcontext-programming-guide.md
index e5df11f..c15c27f 100644
--- a/spark-mlcontext-programming-guide.md
+++ b/spark-mlcontext-programming-guide.md
@@ -124,7 +124,7 @@ None
 
 ## DataFrame Example
 
-For demonstration purposes, we'll use Spark to create a `DataFrame` called `df` of random `double`s from 0 to 1 consisting of 10,000 rows and 1,000 columns.
+For demonstration purposes, we'll use Spark to create a `DataFrame` called `df` of random `double`s from 0 to 1 consisting of 10,000 rows and 100 columns.
 
 <div class="codetabs">
 
@@ -134,7 +134,7 @@ import org.apache.spark.sql._
 import org.apache.spark.sql.types.{StructType,StructField,DoubleType}
 import scala.util.Random
 val numRows = 10000
-val numCols = 1000
+val numCols = 100
 val data = sc.parallelize(0 to numRows-1).map { _ => Row.fromSeq(Seq.fill(numCols)(Random.nextDouble)) }
 val schema = StructType((0 to numCols-1).map { i => StructField("C" + i, DoubleType, true) } )
 val df = spark.createDataFrame(data, schema)
@@ -155,8 +155,8 @@ import scala.util.Random
 scala> val numRows = 10000
 numRows: Int = 10000
 
-scala> val numCols = 1000
-numCols: Int = 1000
+scala> val numCols = 100
+numCols: Int = 100
 
 scala> val data = sc.parallelize(0 to numRows-1).map { _ => Row.fromSeq(Seq.fill(numCols)(Random.nextDouble)) }
 data: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[1] at map at <console>:42
@@ -175,7 +175,7 @@ df: org.apache.spark.sql.DataFrame = [C0: double, C1: double, C2: double, C3: do
 We'll create a DML script to find the minimum, maximum, and mean values in a matrix. This
 script has one input variable, matrix `Xin`, and three output variables, `minOut`, `maxOut`, and `meanOut`.
 
-For performance, we'll specify metadata indicating that the matrix has 10,000 rows and 1,000 columns.
+For performance, we'll specify metadata indicating that the matrix has 10,000 rows and 100 columns.
 
 We'll create a DML script using the ScriptFactory `dml` method with the `minMaxMean` script String. The
 input variable is specified to be our `DataFrame` `df` with `MatrixMetadata` `mm`. The output
@@ -218,7 +218,7 @@ meanOut = mean(Xin)
 "
 
 scala> val mm = new MatrixMetadata(numRows, numCols)
-mm: org.apache.sysml.api.mlcontext.MatrixMetadata = rows: 10000, columns: 1000, non-zeros: None, rows per block: None, columns per block: None
+mm: org.apache.sysml.api.mlcontext.MatrixMetadata = rows: 10000, columns: 100, non-zeros: None, rows per block: None, columns per block: None
 
 scala> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", "maxOut", "meanOut")
 minMaxMeanScript: org.apache.sysml.api.mlcontext.Script =
@@ -929,7 +929,7 @@ Symbol Table:
   [1] (Double) meanOut: 0.5000954668004209
   [2] (Double) maxOut: 0.9999999956646207
   [3] (Double) minOut: 1.4149740823476975E-7
-  [4] (Matrix) Xin: Matrix: scratch_space/temp_1166464711339222, [10000 x 1000, nnz=10000000, blocks (1000 x 1000)], binaryblock, not-dirty
+  [4] (Matrix) Xin: Matrix: scratch_space/temp_1166464711339222, [10000 x 100, nnz=1000000, blocks (1000 x 1000)], binaryblock, not-dirty
 
 Script String:
 
@@ -980,7 +980,7 @@ Symbol Table:
   [1] (Double) meanOut: 0.5000954668004209
   [2] (Double) maxOut: 0.9999999956646207
   [3] (Double) minOut: 1.4149740823476975E-7
-  [4] (Matrix) Xin: Matrix: scratch_space/temp_1166464711339222, [10000 x 1000, nnz=10000000, blocks (1000 x 1000)], binaryblock, not-dirty
+  [4] (Matrix) Xin: Matrix: scratch_space/temp_1166464711339222, [10000 x 100, nnz=1000000, blocks (1000 x 1000)], binaryblock, not-dirty
 
 scala> minMaxMeanScript.clearAll
 
@@ -1129,7 +1129,7 @@ meanOut = mean(Xin)
 "
 
 scala> val mm = new MatrixMetadata(numRows, numCols)
-mm: org.apache.sysml.api.mlcontext.MatrixMetadata = rows: 10000, columns: 1000, non-zeros: None, rows per block: None, columns per block: None
+mm: org.apache.sysml.api.mlcontext.MatrixMetadata = rows: 10000, columns: 100, non-zeros: None, rows per block: None, columns per block: None
 
 scala> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", "maxOut", "meanOut")
 minMaxMeanScript: org.apache.sysml.api.mlcontext.Script =
@@ -1147,7 +1147,7 @@ scala> val (min, max, mean) = ml.execute(minMaxMeanScript).getTuple[Double, Doub
 PROGRAM
 --MAIN PROGRAM
 ----GENERIC (lines 1-8) [recompile=false]
-------(12) TRead Xin [10000,1000,1000,1000,10000000] [0,0,76 -> 76MB] [chkpt], CP
+------(12) TRead Xin [10000,100,1000,1000,1000000] [0,0,76 -> 76MB] [chkpt], CP
 ------(13) ua(minRC) (12) [0,0,-1,-1,-1] [76,0,0 -> 76MB], CP
 ------(21) TWrite minOut (13) [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
 ------(14) ua(maxRC) (12) [0,0,-1,-1,-1] [76,0,0 -> 76MB], CP
@@ -1523,7 +1523,7 @@ There are currently two mechanisms for this in SystemML: **(1) BinaryBlockMatrix
 If you have an input DataFrame, it can be converted to a BinaryBlockMatrix, and this BinaryBlockMatrix
 can be passed as an input rather than passing in the DataFrame as an input.
 
-For example, suppose we had a 10000x1000 matrix represented as a DataFrame, as we saw in an earlier example.
+For example, suppose we had a 10000x100 matrix represented as a DataFrame, as we saw in an earlier example.
 Now suppose we create two Script objects with the DataFrame as an input, as shown below. In the Spark Shell,
 when executing this code, you can see that each of the two Script object creations requires the
 time-consuming data conversion step.
@@ -1533,7 +1533,7 @@ import org.apache.spark.sql._
 import org.apache.spark.sql.types.{StructType,StructField,DoubleType}
 import scala.util.Random
 val numRows = 10000
-val numCols = 1000
+val numCols = 100
 val data = sc.parallelize(0 to numRows-1).map { _ => Row.fromSeq(Seq.fill(numCols)(Random.nextDouble)) }
 val schema = StructType((0 to numCols-1).map { i => StructField("C" + i, DoubleType, true) } )
 val df = spark.createDataFrame(data, schema)
@@ -1554,7 +1554,7 @@ import org.apache.spark.sql._
 import org.apache.spark.sql.types.{StructType,StructField,DoubleType}
 import scala.util.Random
 val numRows = 10000
-val numCols = 1000
+val numCols = 100
 val data = sc.parallelize(0 to numRows-1).map { _ => Row.fromSeq(Seq.fill(numCols)(Random.nextDouble)) }
 val schema = StructType((0 to numCols-1).map { i => StructField("C" + i, DoubleType, true) } )
 val df = spark.createDataFrame(data, schema)

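The `minMaxMean` DML script in the diff above computes the minimum, maximum, and mean of all cells of the input matrix `Xin`. The same statistics can be sketched in plain Scala without Spark or SystemML; this only mirrors the computation, not the distributed execution, and the small dimensions stand in for the 10000 x 100 matrix used in the guide.

```scala
// Plain-Scala sketch of what the minMaxMean DML script computes over Xin:
// min, max, and mean of all cells of a rows x cols matrix of random
// doubles in [0, 1). Illustrative only -- no Spark or SystemML required.
import scala.util.Random

object MinMaxMean {
  def minMaxMean(m: Array[Array[Double]]): (Double, Double, Double) = {
    val flat = m.flatten
    (flat.min, flat.max, flat.sum / flat.length)
  }

  def main(args: Array[String]): Unit = {
    val (numRows, numCols) = (100, 10)  // small stand-in for 10000 x 100
    val m = Array.fill(numRows, numCols)(Random.nextDouble)
    val (min, max, mean) = minMaxMean(m)
    println(s"min=$min max=$max mean=$mean")
  }
}
```

With uniform random data in [0, 1), the mean lands near 0.5, which matches the `meanOut` values shown in the symbol-table output above.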

[2/3] incubator-systemml git commit: [SYSTEMML-1259] Replace append with cbind for matrices

[SYSTEMML-1259] Replace append with cbind for matrices

Replace matrix append calls with cbind calls.

Closes #391.


Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/ba2819bc
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/ba2819bc
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/ba2819bc

Branch: refs/heads/gh-pages
Commit: ba2819bce02500a374c7e7fe957bb678efebf277
Parents: 0f92f40
Author: Deron Eriksson <de...@us.ibm.com>
Authored: Tue Feb 14 16:14:16 2017 -0800
Committer: Deron Eriksson <de...@us.ibm.com>
Committed: Tue Feb 14 16:14:16 2017 -0800

----------------------------------------------------------------------
 dml-language-reference.md | 1 -
 1 file changed, 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/ba2819bc/dml-language-reference.md
----------------------------------------------------------------------
diff --git a/dml-language-reference.md b/dml-language-reference.md
index 22ec0d9..fca2b9b 100644
--- a/dml-language-reference.md
+++ b/dml-language-reference.md
@@ -639,7 +639,6 @@ The builtin function `sum` operates on a matrix (say A of dimensionality (m x n)
 
 Function | Description | Parameters | Example
 -------- | ----------- | ---------- | -------
-append() | Adds the second argument as additional columns to the first argument (note that the first argument is not over-written). Append is meant to be used in situations where one cannot use left-indexing. <br/> **NOTE: append() has been replaced by cbind(), so its use is discouraged.** | Input: (X &lt;matrix&gt;, Y &lt;matrix&gt;) <br/>Output: &lt;matrix&gt; <br/> X and Y are matrices (with possibly multiple columns), where the number of rows in X and Y must be the same. Output is a matrix with exactly the same number of rows as X and Y. Let n1 and n2 denote the number of columns of matrix X and Y, respectively. The returned matrix has n1+n2 columns, where the first n1 columns contain X and the last n2 columns contain Y. | A = matrix(1, rows=2,cols=5) <br/> B = matrix(1, rows=2,cols=3) <br/> C = append(A,B) <br/> print("Dimensions of C: " + nrow(C) + " X " + ncol(C)) <br/> The output of above example is: <br/> Dimensions of C: 2 X 8
 cbind() | Column-wise matrix concatenation. Concatenates the second matrix as additional columns to the first matrix | Input: (X &lt;matrix&gt;, Y &lt;matrix&gt;) <br/>Output: &lt;matrix&gt; <br/> X and Y are matrices, where the number of rows in X and the number of rows in Y are the same. | A = matrix(1, rows=2,cols=3) <br/> B = matrix(2, rows=2,cols=3) <br/> C = cbind(A,B) <br/> print("Dimensions of C: " + nrow(C) + " X " + ncol(C)) <br/> Output: <br/> Dimensions of C: 2 X 6
 matrix() | Matrix constructor (assigning all the cells to numeric literals). | Input: (&lt;init&gt;, rows=&lt;value&gt;, cols=&lt;value&gt;) <br/> init: numeric literal; <br/> rows/cols: number of rows/cols (expression) <br/> Output: matrix | # 10x10 matrix initialized to 0 <br/> A = matrix (0, rows=10, cols=10)
  | Matrix constructor (reshaping an existing matrix). | Input: (&lt;existing matrix&gt;, rows=&lt;value&gt;, cols=&lt;value&gt;, byrow=TRUE) <br/> Output: matrix | A = matrix (0, rows=10, cols=10) <br/> B = matrix (A, rows=100, cols=1)
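The `cbind()` semantics described in the table above can be sketched in plain Scala for illustration: column-wise concatenation of two matrices with the same number of rows. In DML itself you would simply call `cbind(A, B)`; this sketch only mirrors the behavior.

```scala
// Sketch of cbind() semantics from the table above: concatenate the
// second matrix as additional columns of the first. Both matrices must
// have the same number of rows; the result has n1 + n2 columns.
object CbindSketch {
  def cbind(x: Array[Array[Double]], y: Array[Array[Double]]): Array[Array[Double]] = {
    require(x.length == y.length, "cbind requires the same number of rows")
    x.zip(y).map { case (rx, ry) => rx ++ ry }
  }

  def main(args: Array[String]): Unit = {
    val a = Array.fill(2, 3)(1.0)  // matrix(1, rows=2, cols=3)
    val b = Array.fill(2, 3)(2.0)  // matrix(2, rows=2, cols=3)
    val c = cbind(a, b)
    println("Dimensions of C: " + c.length + " X " + c(0).length)  // 2 X 6
  }
}
```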