Posted to commits@systemml.apache.org by de...@apache.org on 2016/01/12 22:59:31 UTC

[1/2] incubator-systemml git commit: Improve structure of doc presentation

Repository: incubator-systemml
Updated Branches:
  refs/heads/master 52fae50d5 -> 65844aa6d


http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/65844aa6/docs/mlcontext-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/mlcontext-programming-guide.md b/docs/mlcontext-programming-guide.md
deleted file mode 100644
index 73488d7..0000000
--- a/docs/mlcontext-programming-guide.md
+++ /dev/null
@@ -1,996 +0,0 @@
----
-layout: global
-title: MLContext Programming Guide
-description: MLContext Programming Guide
----
-<!--
-{% comment %}
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements.  See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to you under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License.  You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-{% endcomment %}
--->
-
-* This will become a table of contents (this text will be scraped).
-{:toc}
-
-<br/>
-
-
-# Overview
-
-The `MLContext` API offers a programmatic interface for interacting with SystemML from languages
-such as Scala and Java. When interacting with `MLContext` from Spark, `DataFrame`s and `RDD`s can be passed
-to SystemML. These data representations are converted to a
-binary-block data format, allowing for SystemML's optimizations to be performed.
-
-
-# Spark Shell Example
-
-## Start Spark Shell with SystemML
-
-To use SystemML with the Spark Shell, the SystemML jar can be referenced using the Spark Shell's `--jars` option. 
-Instructions to build the SystemML jar can be found in the [SystemML GitHub README](https://github.com/apache/incubator-systemml).
-
-{% highlight bash %}
-./bin/spark-shell --executor-memory 4G --driver-memory 4G --jars SystemML.jar
-{% endhighlight %}
-
-Here is an example of Spark Shell with SystemML and YARN.
-
-{% highlight bash %}
-./bin/spark-shell --master yarn-client --num-executors 3 --driver-memory 5G --executor-memory 5G --executor-cores 4 --jars SystemML.jar
-{% endhighlight %}
-
-
-## Create MLContext
-
-An `MLContext` object can be created by passing its constructor a reference to the `SparkContext`.
-
-<div class="codetabs">
-
-<div data-lang="Spark Shell" markdown="1">
-{% highlight scala %}
-scala> import org.apache.sysml.api.MLContext
-import org.apache.sysml.api.MLContext
-
-scala> val ml = new MLContext(sc)
-ml: org.apache.sysml.api.MLContext = org.apache.sysml.api.MLContext@33e38c6b
-{% endhighlight %}
-</div>
-
-<div data-lang="Statements" markdown="1">
-{% highlight scala %}
-import org.apache.sysml.api.MLContext
-val ml = new MLContext(sc)
-{% endhighlight %}
-</div>
-
-</div>
-
-
-## Create DataFrame
-
-For demonstration purposes, we'll create a `DataFrame` consisting of 100,000 rows and 1,000 columns
-of random `double`s.
-
-<div class="codetabs">
-
-<div data-lang="Spark Shell" markdown="1">
-{% highlight scala %}
-scala> import org.apache.spark.sql._
-import org.apache.spark.sql._
-
-scala> import org.apache.spark.sql.types.{StructType,StructField,DoubleType}
-import org.apache.spark.sql.types.{StructType, StructField, DoubleType}
-
-scala> import scala.util.Random
-import scala.util.Random
-
-scala> val numRows = 100000
-numRows: Int = 100000
-
-scala> val numCols = 1000
-numCols: Int = 1000
-
-scala> val data = sc.parallelize(0 to numRows-1).map { _ => Row.fromSeq(Seq.fill(numCols)(Random.nextDouble)) }
-data: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[1] at map at <console>:33
-
-scala> val schema = StructType((0 to numCols-1).map { i => StructField("C" + i, DoubleType, true) } )
-schema: org.apache.spark.sql.types.StructType = StructType(StructField(C0,DoubleType,true), StructField(C1,DoubleType,true), StructField(C2,DoubleType,true), StructField(C3,DoubleType,true), StructField(C4,DoubleType,true), StructField(C5,DoubleType,true), StructField(C6,DoubleType,true), StructField(C7,DoubleType,true), StructField(C8,DoubleType,true), StructField(C9,DoubleType,true), StructField(C10,DoubleType,true), StructField(C11,DoubleType,true), StructField(C12,DoubleType,true), StructField(C13,DoubleType,true), StructField(C14,DoubleType,true), StructField(C15,DoubleType,true), StructField(C16,DoubleType,true), StructField(C17,DoubleType,true), StructField(C18,DoubleType,true), StructField(C19,DoubleType,true), StructField(C20,DoubleType,true), StructField(C21,DoubleType,true), ...
-
-scala> val df = sqlContext.createDataFrame(data, schema)
-df: org.apache.spark.sql.DataFrame = [C0: double, C1: double, C2: double, C3: double, C4: double, C5: double, C6: double, C7: double, C8: double, C9: double, C10: double, C11: double, C12: double, C13: double, C14: double, C15: double, C16: double, C17: double, C18: double, C19: double, C20: double, C21: double, C22: double, C23: double, C24: double, C25: double, C26: double, C27: double, C28: double, C29: double, C30: double, C31: double, C32: double, C33: double, C34: double, C35: double, C36: double, C37: double, C38: double, C39: double, C40: double, C41: double, C42: double, C43: double, C44: double, C45: double, C46: double, C47: double, C48: double, C49: double, C50: double, C51: double, C52: double, C53: double, C54: double, C55: double, C56: double, C57: double, C58: double, C5...
-
-{% endhighlight %}
-</div>
-
-<div data-lang="Statements" markdown="1">
-{% highlight scala %}
-import org.apache.spark.sql._
-import org.apache.spark.sql.types.{StructType,StructField,DoubleType}
-import scala.util.Random
-val numRows = 100000
-val numCols = 1000
-val data = sc.parallelize(0 to numRows-1).map { _ => Row.fromSeq(Seq.fill(numCols)(Random.nextDouble)) }
-val schema = StructType((0 to numCols-1).map { i => StructField("C" + i, DoubleType, true) } )
-val df = sqlContext.createDataFrame(data, schema)
-{% endhighlight %}
-</div>
-
-</div>
-
-
-## Helper Methods
-
-For convenience, we'll create some helper methods. The SystemML output data is encapsulated in
-an `MLOutput` object. The `getScalar()` method extracts a scalar value from a `DataFrame` returned by
-`MLOutput`; the value is read at column index 1 because the first column of the returned `DataFrame`
-holds a row ID. The `getScalarDouble()` method returns such a value as a `Double`, and the
-`getScalarInt()` method returns such a value as an `Int`.
-
-<div class="codetabs">
-
-<div data-lang="Spark Shell" markdown="1">
-{% highlight scala %}
-scala> import org.apache.sysml.api.MLOutput
-import org.apache.sysml.api.MLOutput
-
-scala> def getScalar(outputs: MLOutput, symbol: String): Any =
-     | outputs.getDF(sqlContext, symbol).first()(1)
-getScalar: (outputs: org.apache.sysml.api.MLOutput, symbol: String)Any
-
-scala> def getScalarDouble(outputs: MLOutput, symbol: String): Double =
-     | getScalar(outputs, symbol).asInstanceOf[Double]
-getScalarDouble: (outputs: org.apache.sysml.api.MLOutput, symbol: String)Double
-
-scala> def getScalarInt(outputs: MLOutput, symbol: String): Int =
-     | getScalarDouble(outputs, symbol).toInt
-getScalarInt: (outputs: org.apache.sysml.api.MLOutput, symbol: String)Int
-
-{% endhighlight %}
-</div>
-
-<div data-lang="Statements" markdown="1">
-{% highlight scala %}
-import org.apache.sysml.api.MLOutput
-def getScalar(outputs: MLOutput, symbol: String): Any =
-outputs.getDF(sqlContext, symbol).first()(1)
-def getScalarDouble(outputs: MLOutput, symbol: String): Double =
-getScalar(outputs, symbol).asInstanceOf[Double]
-def getScalarInt(outputs: MLOutput, symbol: String): Int =
-getScalarDouble(outputs, symbol).toInt
-
-{% endhighlight %}
-</div>
-
-</div>
-
-
-## Convert DataFrame to Binary-Block Matrix
-
-SystemML is optimized to operate on a binary-block format for matrix representation. For large
-datasets, conversion from DataFrame to binary-block can require a significant amount of time.
-Explicit DataFrame to binary-block conversion allows algorithm performance to be measured separately
-from data conversion time.
-
-The SystemML binary-block matrix representation can be thought of as a two-dimensional array of blocks, where each block
-consists of a number of rows and columns. In this example, we specify a matrix consisting
-of blocks of size 1000x1000. The experimental `dataFrameToBinaryBlock()` method of `RDDConverterUtilsExt` is used
-to convert the `DataFrame df` to a SystemML binary-block matrix, which is represented by the datatype
-`JavaPairRDD[MatrixIndexes, MatrixBlock]`.
-
-<div class="codetabs">
-
-<div data-lang="Spark Shell" markdown="1">
-{% highlight scala %}
-scala> import org.apache.sysml.runtime.instructions.spark.utils.{RDDConverterUtilsExt => RDDConverterUtils}
-import org.apache.sysml.runtime.instructions.spark.utils.{RDDConverterUtilsExt=>RDDConverterUtils}
-
-scala> import org.apache.sysml.runtime.matrix.MatrixCharacteristics;
-import org.apache.sysml.runtime.matrix.MatrixCharacteristics
-
-scala> val numRowsPerBlock = 1000
-numRowsPerBlock: Int = 1000
-
-scala> val numColsPerBlock = 1000
-numColsPerBlock: Int = 1000
-
-scala> val mc = new MatrixCharacteristics(numRows, numCols, numRowsPerBlock, numColsPerBlock)
-mc: org.apache.sysml.runtime.matrix.MatrixCharacteristics = [100000 x 1000, nnz=-1, blocks (1000 x 1000)]
-
-scala> val sysMlMatrix = RDDConverterUtils.dataFrameToBinaryBlock(sc, df, mc, false)
-sysMlMatrix: org.apache.spark.api.java.JavaPairRDD[org.apache.sysml.runtime.matrix.data.MatrixIndexes,org.apache.sysml.runtime.matrix.data.MatrixBlock] = org.apache.spark.api.java.JavaPairRDD@2bce3248
-
-{% endhighlight %}
-</div>
-
-<div data-lang="Statements" markdown="1">
-{% highlight scala %}
-import org.apache.sysml.runtime.instructions.spark.utils.{RDDConverterUtilsExt => RDDConverterUtils}
-import org.apache.sysml.runtime.matrix.MatrixCharacteristics;
-val numRowsPerBlock = 1000
-val numColsPerBlock = 1000
-val mc = new MatrixCharacteristics(numRows, numCols, numRowsPerBlock, numColsPerBlock)
-val sysMlMatrix = RDDConverterUtils.dataFrameToBinaryBlock(sc, df, mc, false)
-
-{% endhighlight %}
-</div>
-
-</div>
-
-
-## DML Script
-
-For this example, we will use the following DML script, `shape.dml`, which reads in a matrix and outputs the number of rows and the
-number of columns, each represented as a 1x1 matrix.
-
-{% highlight r %}
-X = read($Xin)
-m = matrix(nrow(X), rows=1, cols=1)
-n = matrix(ncol(X), rows=1, cols=1)
-write(m, $Mout)
-write(n, $Nout)
-{% endhighlight %}
-
-
-## Execute Script
-
-Let's execute our DML script, as shown in the example below. The call to `reset()` of `MLContext` is not necessary here, but this method should
-be called if you need to reset inputs and outputs or if you would like to call `execute()` with a different script.
-
-An example of registering the `DataFrame df` as an input to the `X` variable is shown but commented out. If a DataFrame is registered directly,
-it will implicitly be converted to SystemML's binary-block format. However, since we've already explicitly converted the DataFrame to the
-binary-block matrix `sysMlMatrix`, we register that matrix as the input to the `X` variable. We register the `m` and `n` variables
-as outputs.
-
-When SystemML is executed via `DMLScript` (such as in Standalone Mode), inputs are supplied as either command-line named arguments
-or positional arguments. These inputs are specified in DML scripts by prepending them with a `$`. Values are read from or written
-to files using `read`/`write` (DML) and `load`/`save` (PyDML) statements. When utilizing the `MLContext` API,
-inputs and outputs can be other data representations, such as `DataFrame`s. The input and output data are bound to DML variables.
-The named arguments in the `shape.dml` script do not have default values set for them, so we create a `Map` to map the required named
-arguments to blank `String`s so that the script can pass validation.
-
-The `shape.dml` script is executed by the call to `execute()`, where we supply the `Map` of required named arguments. The
-execution results are returned as the `MLOutput` object `outputs`. The number of rows is obtained by calling the `getScalarInt()`
-helper method with the `outputs` object and `"m"`. The number of columns is retrieved by calling `getScalarInt()` with
-`outputs` and `"n"`.
-
-<div class="codetabs">
-
-<div data-lang="Spark Shell" markdown="1">
-{% highlight scala %}
-scala> ml.reset()
-
-scala> //ml.registerInput("X", df) // implicit conversion of DataFrame to binary-block
-
-scala> ml.registerInput("X", sysMlMatrix, numRows, numCols)
-
-scala> ml.registerOutput("m")
-
-scala> ml.registerOutput("n")
-
-scala> val nargs = Map("Xin" -> " ", "Mout" -> " ", "Nout" -> " ")
-nargs: scala.collection.immutable.Map[String,String] = Map(Xin -> " ", Mout -> " ", Nout -> " ")
-
-scala> val outputs = ml.execute("shape.dml", nargs)
-15/10/12 16:29:15 WARN : Your hostname, derons-mbp.usca.ibm.com resolves to a loopback/non-reachable address: 127.0.0.1, but we couldn't find any external IP address!
-15/10/12 16:29:15 WARN OptimizerUtils: Auto-disable multi-threaded text read for 'text' and 'csv' due to thread contention on JRE < 1.8 (java.version=1.7.0_80).
-outputs: org.apache.sysml.api.MLOutput = org.apache.sysml.api.MLOutput@4d424743
-
-scala> val m = getScalarInt(outputs, "m")
-m: Int = 100000
-
-scala> val n = getScalarInt(outputs, "n")
-n: Int = 1000
-
-{% endhighlight %}
-</div>
-
-<div data-lang="Statements" markdown="1">
-{% highlight scala %}
-ml.reset()
-//ml.registerInput("X", df) // implicit conversion of DataFrame to binary-block
-ml.registerInput("X", sysMlMatrix, numRows, numCols)
-ml.registerOutput("m")
-ml.registerOutput("n")
-val nargs = Map("Xin" -> " ", "Mout" -> " ", "Nout" -> " ")
-val outputs = ml.execute("shape.dml", nargs)
-val m = getScalarInt(outputs, "m")
-val n = getScalarInt(outputs, "n")
-
-{% endhighlight %}
-</div>
-
-</div>
-
-
-## DML Script as String
-
-The `MLContext` API allows a DML script to be specified
-as a `String`. Here, we specify a DML script as a `String` variable called `minMaxMeanScript`.
-This DML script finds the minimum, maximum, and mean values of a matrix.
-
-<div class="codetabs">
-
-<div data-lang="Spark Shell" markdown="1">
-{% highlight scala %}
-scala> val minMaxMeanScript: String = 
-     | """
-     | Xin = read(" ")
-     | minOut = matrix(min(Xin), rows=1, cols=1)
-     | maxOut = matrix(max(Xin), rows=1, cols=1)
-     | meanOut = matrix(mean(Xin), rows=1, cols=1)
-     | write(minOut, " ")
-     | write(maxOut, " ")
-     | write(meanOut, " ")
-     | """
-minMaxMeanScript: String = 
-"
-Xin = read(" ")
-minOut = matrix(min(Xin), rows=1, cols=1)
-maxOut = matrix(max(Xin), rows=1, cols=1)
-meanOut = matrix(mean(Xin), rows=1, cols=1)
-write(minOut, " ")
-write(maxOut, " ")
-write(meanOut, " ")
-"
-
-{% endhighlight %}
-</div>
-
-<div data-lang="Statements" markdown="1">
-{% highlight scala %}
-val minMaxMeanScript: String = 
-"""
-Xin = read(" ")
-minOut = matrix(min(Xin), rows=1, cols=1)
-maxOut = matrix(max(Xin), rows=1, cols=1)
-meanOut = matrix(mean(Xin), rows=1, cols=1)
-write(minOut, " ")
-write(maxOut, " ")
-write(meanOut, " ")
-"""
-
-{% endhighlight %}
-</div>
-
-</div>
-
-## Scala Wrapper for DML
-
-We can create a Scala wrapper for our invocation of the `minMaxMeanScript` DML `String`. The `minMaxMean()` method
-takes a `JavaPairRDD[MatrixIndexes, MatrixBlock]` parameter, which is a SystemML binary-block matrix representation.
-It also takes a `rows` parameter indicating the number of rows in the matrix, a `cols` parameter indicating the number
-of columns in the matrix, and an `MLContext` parameter. The `minMaxMean()` method
-returns a tuple consisting of the minimum value in the matrix, the maximum value in the matrix, and the computed
-mean value of the matrix.
-
-<div class="codetabs">
-
-<div data-lang="Spark Shell" markdown="1">
-{% highlight scala %}
-scala> import org.apache.sysml.runtime.matrix.data.MatrixIndexes
-import org.apache.sysml.runtime.matrix.data.MatrixIndexes
-
-scala> import org.apache.sysml.runtime.matrix.data.MatrixBlock
-import org.apache.sysml.runtime.matrix.data.MatrixBlock
-
-scala> import org.apache.spark.api.java.JavaPairRDD
-import org.apache.spark.api.java.JavaPairRDD
-
-scala> def minMaxMean(mat: JavaPairRDD[MatrixIndexes, MatrixBlock], rows: Int, cols: Int, ml: MLContext): (Double, Double, Double) = {
-     | ml.reset()
-     | ml.registerInput("Xin", mat, rows, cols)
-     | ml.registerOutput("minOut")
-     | ml.registerOutput("maxOut")
-     | ml.registerOutput("meanOut")
-     | val outputs = ml.executeScript(minMaxMeanScript)
-     | val minOut = getScalarDouble(outputs, "minOut")
-     | val maxOut = getScalarDouble(outputs, "maxOut")
-     | val meanOut = getScalarDouble(outputs, "meanOut")
-     | (minOut, maxOut, meanOut)
-     | }
-minMaxMean: (mat: org.apache.spark.api.java.JavaPairRDD[org.apache.sysml.runtime.matrix.data.MatrixIndexes,org.apache.sysml.runtime.matrix.data.MatrixBlock], rows: Int, cols: Int, ml: org.apache.sysml.api.MLContext)(Double, Double, Double)
-
-{% endhighlight %}
-</div>
-
-<div data-lang="Statements" markdown="1">
-{% highlight scala %}
-import org.apache.sysml.runtime.matrix.data.MatrixIndexes
-import org.apache.sysml.runtime.matrix.data.MatrixBlock
-import org.apache.spark.api.java.JavaPairRDD
-def minMaxMean(mat: JavaPairRDD[MatrixIndexes, MatrixBlock], rows: Int, cols: Int, ml: MLContext): (Double, Double, Double) = {
-ml.reset()
-ml.registerInput("Xin", mat, rows, cols)
-ml.registerOutput("minOut")
-ml.registerOutput("maxOut")
-ml.registerOutput("meanOut")
-val outputs = ml.executeScript(minMaxMeanScript)
-val minOut = getScalarDouble(outputs, "minOut")
-val maxOut = getScalarDouble(outputs, "maxOut")
-val meanOut = getScalarDouble(outputs, "meanOut")
-(minOut, maxOut, meanOut)
-}
-
-{% endhighlight %}
-</div>
-
-</div>
-
-
-## Invoking DML via Scala Wrapper
-
-Here, we invoke `minMaxMeanScript` using our `minMaxMean()` Scala wrapper method. It returns a tuple
-consisting of the minimum value in the matrix, the maximum value in the matrix, and the mean value of the matrix.
-
-<div class="codetabs">
-
-<div data-lang="Spark Shell" markdown="1">
-{% highlight scala %}
-scala> val (min, max, mean) = minMaxMean(sysMlMatrix, numRows, numCols, ml)
-15/10/13 14:33:11 WARN OptimizerUtils: Auto-disable multi-threaded text read for 'text' and 'csv' due to thread contention on JRE < 1.8 (java.version=1.7.0_80).
-min: Double = 5.378949397005783E-9                                              
-max: Double = 0.9999999934660398
-mean: Double = 0.499988222338507
-
-{% endhighlight %}
-</div>
-
-<div data-lang="Statements" markdown="1">
-{% highlight scala %}
-val (min, max, mean) = minMaxMean(sysMlMatrix, numRows, numCols, ml)
-
-{% endhighlight %}
-</div>
-
-</div>
-
-
-* * *
-
-# Java Example
-
-Next, let's consider a Java example. The `MLContextExample` class creates an `MLContext` object from a `JavaSparkContext`.
-It then reads in a matrix CSV file as a `JavaRDD<String>` object. It registers this as input `X`. It registers
-two outputs, `m` and `n`. A `HashMap` maps the expected command-line arguments of the `shape.dml` script to spaces so that
-it passes validation. The `shape.dml` script is executed, and the number of rows and columns in the matrix are output
-to standard output.
-
-
-{% highlight java %}
-package org.apache.sysml;
-
-import java.util.HashMap;
-
-import org.apache.spark.SparkConf;
-import org.apache.spark.api.java.JavaRDD;
-import org.apache.spark.api.java.JavaSparkContext;
-import org.apache.spark.sql.DataFrame;
-import org.apache.spark.sql.SQLContext;
-
-import org.apache.sysml.api.MLContext;
-import org.apache.sysml.api.MLOutput;
-
-public class MLContextExample {
-
-	public static void main(String[] args) throws Exception {
-
-		SparkConf conf = new SparkConf().setAppName("MLContextExample").setMaster("local");
-		JavaSparkContext sc = new JavaSparkContext(conf);
-		SQLContext sqlContext = new SQLContext(sc);
-		MLContext ml = new MLContext(sc);
-
-		JavaRDD<String> csv = sc.textFile("A.csv");
-		ml.registerInput("X", csv, "csv");
-		ml.registerOutput("m");
-		ml.registerOutput("n");
-		HashMap<String, String> cmdLineArgs = new HashMap<String, String>();
-		cmdLineArgs.put("X", " ");
-		cmdLineArgs.put("m", " ");
-		cmdLineArgs.put("n", " ");
-		MLOutput output = ml.execute("shape.dml", cmdLineArgs);
-		DataFrame mDf = output.getDF(sqlContext, "m");
-		DataFrame nDf = output.getDF(sqlContext, "n");
-		System.out.println("rows:" + mDf.first().getDouble(1));
-		System.out.println("cols:" + nDf.first().getDouble(1));
-	}
-
-}
-
-
-{% endhighlight %}
-
-
-* * *
-
-# Zeppelin Notebook Example - Linear Regression Algorithm
-
-Next, we'll consider an example of a SystemML linear regression algorithm run from Spark through an Apache Zeppelin notebook.
-Instructions to clone and build Zeppelin can be found at the [GitHub Apache Zeppelin](https://github.com/apache/incubator-zeppelin)
-site. This example will also look at the Spark ML linear regression algorithm.
-
-This Zeppelin notebook example can be downloaded [here](files/mlcontext-programming-guide/zeppelin-notebook-linear-regression/2AZ2AQ12B.tar.gz).
-Once downloaded and unzipped, place the folder in the Zeppelin `notebook` directory.
-
-A `conf/zeppelin-env.sh` file is created based on `conf/zeppelin-env.sh.template`. For
-this demonstration, it features `SPARK_HOME`, `SPARK_SUBMIT_OPTIONS`, and `ZEPPELIN_SPARK_USEHIVECONTEXT`
-environment variables:
-
-	export SPARK_HOME=/Users/example/spark-1.5.1-bin-hadoop2.6
-	export SPARK_SUBMIT_OPTIONS="--jars /Users/example/systemml/system-ml/target/SystemML.jar"
-	export ZEPPELIN_SPARK_USEHIVECONTEXT=false
-
-Start Zeppelin using the `zeppelin.sh` script:
-
-	bin/zeppelin.sh
-
-After opening Zeppelin in a browser, we see the "SystemML - Linear Regression" note in the list of available
-Zeppelin notes.
-
-![Zeppelin Notebook](img/mlcontext-programming-guide/zeppelin-notebook.png "Zeppelin Notebook")
-
-If we go to the "SystemML - Linear Regression" note, we see that the note consists of several cells of code.
-
-![Zeppelin 'SystemML - Linear Regression' Note](img/mlcontext-programming-guide/zeppelin-notebook-systemml-linear-regression.png "Zeppelin 'SystemML - Linear Regression' Note")
-
-Let's briefly consider these cells.
-
-## Trigger Spark Startup
-
-This cell triggers Spark initialization by referencing the `SparkContext` object `sc`. Information regarding these startup operations can be viewed in the
-console window in which `zeppelin.sh` is running.
-
-**Cell:**
-{% highlight scala %}
-// Trigger Spark Startup
-sc
-{% endhighlight %}
-
-**Output:**
-{% highlight scala %}
-res8: org.apache.spark.SparkContext = org.apache.spark.SparkContext@6ce70bf3
-{% endhighlight %}
-
-
-## Generate Linear Regression Test Data
-
-The Spark `LinearDataGenerator` is used to generate test data for the Spark ML and SystemML linear regression algorithms.
-
-**Cell:**
-{% highlight scala %}
-// Generate data
-import org.apache.spark.mllib.util.LinearDataGenerator
-
-val numRows = 10000
-val numCols = 1000
-val rawData = LinearDataGenerator.generateLinearRDD(sc, numRows, numCols, 1).toDF()
-
-// Repartition into a more parallelism-friendly number of partitions
-val data = rawData.repartition(64).cache()
-{% endhighlight %}
-
-**Output:**
-{% highlight scala %}
-import org.apache.spark.mllib.util.LinearDataGenerator
-numRows: Int = 10000
-numCols: Int = 1000
-rawData: org.apache.spark.sql.DataFrame = [label: double, features: vector]
-data: org.apache.spark.sql.DataFrame = [label: double, features: vector]
-{% endhighlight %}
-
-
-## Train using Spark ML Linear Regression Algorithm for Comparison
-
-For purposes of comparison, we can train a model using the Spark ML linear regression
-algorithm.
-
-**Cell:**
-{% highlight scala %}
-// Spark ML
-import org.apache.spark.ml.regression.LinearRegression
-
-// Model Settings
-val maxIters = 100
-val reg = 0
-val elasticNetParam = 0  // L2 reg
-
-// Fit the model
-val lr = new LinearRegression()
-  .setMaxIter(maxIters)
-  .setRegParam(reg)
-  .setElasticNetParam(elasticNetParam)
-val start = System.currentTimeMillis()
-val model = lr.fit(data)
-val trainingTime = (System.currentTimeMillis() - start).toDouble / 1000.0
-
-// Summarize the model over the training set and gather some metrics
-val trainingSummary = model.summary
-val r2 = trainingSummary.r2
-val iters = trainingSummary.totalIterations
-val trainingTimePerIter = trainingTime / iters
-{% endhighlight %}
-
-**Output:**
-{% highlight scala %}
-import org.apache.spark.ml.regression.LinearRegression
-maxIters: Int = 100
-reg: Int = 0
-elasticNetParam: Int = 0
-lr: org.apache.spark.ml.regression.LinearRegression = linReg_a7f51d676562
-start: Long = 1444672044647
-model: org.apache.spark.ml.regression.LinearRegressionModel = linReg_a7f51d676562
-trainingTime: Double = 12.985
-trainingSummary: org.apache.spark.ml.regression.LinearRegressionTrainingSummary = org.apache.spark.ml.regression.LinearRegressionTrainingSummary@227ba28b
-r2: Double = 0.9677118209276552
-iters: Int = 17
-trainingTimePerIter: Double = 0.7638235294117647
-{% endhighlight %}
-
-
-## Spark ML Linear Regression Summary Statistics
-
-Summary statistics for the Spark ML linear regression algorithm are displayed by this cell.
-
-**Cell:**
-{% highlight scala %}
-// Print statistics
-println(s"R2: ${r2}")
-println(s"Iterations: ${iters}")
-println(s"Training time per iter: ${trainingTimePerIter} seconds")
-{% endhighlight %}
-
-**Output:**
-{% highlight scala %}
-R2: 0.9677118209276552
-Iterations: 17
-Training time per iter: 0.7638235294117647 seconds
-{% endhighlight %}
-
-
-## SystemML Linear Regression Algorithm
-
-The `linearReg` `String` variable is set to
-a linear regression algorithm written in DML, SystemML's Declarative Machine Learning language.
-
-
-
-**Cell:**
-{% highlight scala %}
-// SystemML kernels
-val linearReg =
-"""
-#
-# THIS SCRIPT SOLVES LINEAR REGRESSION USING THE CONJUGATE GRADIENT ALGORITHM
-#
-# INPUT PARAMETERS:
-# --------------------------------------------------------------------------------------------
-# NAME  TYPE   DEFAULT  MEANING
-# --------------------------------------------------------------------------------------------
-# X     String  ---     Matrix X of feature vectors
-# Y     String  ---     1-column Matrix Y of response values
-# icpt  Int      0      Intercept presence, shifting and rescaling the columns of X:
-#                       0 = no intercept, no shifting, no rescaling;
-#                       1 = add intercept, but neither shift nor rescale X;
-#                       2 = add intercept, shift & rescale X columns to mean = 0, variance = 1
-# reg   Double 0.000001 Regularization constant (lambda) for L2-regularization; set to nonzero
-#                       for highly dependent/sparse/numerous features
-# tol   Double 0.000001 Tolerance (epsilon); conjugate gradient procedure terminates early if
-#                       L2 norm of the beta-residual is less than tolerance * its initial norm
-# maxi  Int      0      Maximum number of conjugate gradient iterations, 0 = no maximum
-# --------------------------------------------------------------------------------------------
-#
-# OUTPUT:
-# B Estimated regression parameters (the betas) to store
-#
-# Note: Matrix of regression parameters (the betas) and its size depend on icpt input value:
-#         OUTPUT SIZE:   OUTPUT CONTENTS:                HOW TO PREDICT Y FROM X AND B:
-# icpt=0: ncol(X)   x 1  Betas for X only                Y ~ X %*% B[1:ncol(X), 1], or just X %*% B
-# icpt=1: ncol(X)+1 x 1  Betas for X and intercept       Y ~ X %*% B[1:ncol(X), 1] + B[ncol(X)+1, 1]
-# icpt=2: ncol(X)+1 x 2  Col.1: betas for X & intercept  Y ~ X %*% B[1:ncol(X), 1] + B[ncol(X)+1, 1]
-#                        Col.2: betas for shifted/rescaled X and intercept
-#
-
-fileX = "";
-fileY = "";
-fileB = "";
-
-intercept_status = ifdef ($icpt, 0);     # $icpt=0;
-tolerance = ifdef ($tol, 0.000001);      # $tol=0.000001;
-max_iteration = ifdef ($maxi, 0);        # $maxi=0;
-regularization = ifdef ($reg, 0.000001); # $reg=0.000001;
-
-X = read (fileX);
-y = read (fileY);
-
-n = nrow (X);
-m = ncol (X);
-ones_n = matrix (1, rows = n, cols = 1);
-zero_cell = matrix (0, rows = 1, cols = 1);
-
-# Introduce the intercept, shift and rescale the columns of X if needed
-
-m_ext = m;
-if (intercept_status == 1 | intercept_status == 2)  # add the intercept column
-{
-    X = append (X, ones_n);
-    m_ext = ncol (X);
-}
-
-scale_lambda = matrix (1, rows = m_ext, cols = 1);
-if (intercept_status == 1 | intercept_status == 2)
-{
-    scale_lambda [m_ext, 1] = 0;
-}
-
-if (intercept_status == 2)  # scale-&-shift X columns to mean 0, variance 1
-{                           # Important assumption: X [, m_ext] = ones_n
-    avg_X_cols = t(colSums(X)) / n;
-    var_X_cols = (t(colSums (X ^ 2)) - n * (avg_X_cols ^ 2)) / (n - 1);
-    is_unsafe = ppred (var_X_cols, 0.0, "<=");
-    scale_X = 1.0 / sqrt (var_X_cols * (1 - is_unsafe) + is_unsafe);
-    scale_X [m_ext, 1] = 1;
-    shift_X = - avg_X_cols * scale_X;
-    shift_X [m_ext, 1] = 0;
-} else {
-    scale_X = matrix (1, rows = m_ext, cols = 1);
-    shift_X = matrix (0, rows = m_ext, cols = 1);
-}
-
-# Henceforth, if intercept_status == 2, we use "X %*% (SHIFT/SCALE TRANSFORM)"
-# instead of "X".  However, in order to preserve the sparsity of X,
-# we apply the transform associatively to some other part of the expression
-# in which it occurs.  To avoid materializing a large matrix, we rewrite it:
-#
-# ssX_A  = (SHIFT/SCALE TRANSFORM) %*% A    --- is rewritten as:
-# ssX_A  = diag (scale_X) %*% A;
-# ssX_A [m_ext, ] = ssX_A [m_ext, ] + t(shift_X) %*% A;
-#
-# tssX_A = t(SHIFT/SCALE TRANSFORM) %*% A   --- is rewritten as:
-# tssX_A = diag (scale_X) %*% A + shift_X %*% A [m_ext, ];
-
-lambda = scale_lambda * regularization;
-beta_unscaled = matrix (0, rows = m_ext, cols = 1);
-
-if (max_iteration == 0) {
-    max_iteration = m_ext;
-}
-i = 0;
-
-# BEGIN THE CONJUGATE GRADIENT ALGORITHM
-r = - t(X) %*% y;
-
-if (intercept_status == 2) {
-    r = scale_X * r + shift_X %*% r [m_ext, ];
-}
-
-p = - r;
-norm_r2 = sum (r ^ 2);
-norm_r2_initial = norm_r2;
-norm_r2_target = norm_r2_initial * tolerance ^ 2;
-
-while (i < max_iteration & norm_r2 > norm_r2_target)
-{
-    if (intercept_status == 2) {
-        ssX_p = scale_X * p;
-        ssX_p [m_ext, ] = ssX_p [m_ext, ] + t(shift_X) %*% p;
-    } else {
-        ssX_p = p;
-    }
-
-    q = t(X) %*% (X %*% ssX_p);
-
-    if (intercept_status == 2) {
-        q = scale_X * q + shift_X %*% q [m_ext, ];
-    }
-
-    q = q + lambda * p;
-    a = norm_r2 / sum (p * q);
-    beta_unscaled = beta_unscaled + a * p;
-    r = r + a * q;
-    old_norm_r2 = norm_r2;
-    norm_r2 = sum (r ^ 2);
-    p = -r + (norm_r2 / old_norm_r2) * p;
-    i = i + 1;
-}
-# END THE CONJUGATE GRADIENT ALGORITHM
-
-if (intercept_status == 2) {
-    beta = scale_X * beta_unscaled;
-    beta [m_ext, ] = beta [m_ext, ] + t(shift_X) %*% beta_unscaled;
-} else {
-    beta = beta_unscaled;
-}
-
-# Output statistics
-avg_tot = sum (y) / n;
-ss_tot = sum (y ^ 2);
-ss_avg_tot = ss_tot - n * avg_tot ^ 2;
-var_tot = ss_avg_tot / (n - 1);
-y_residual = y - X %*% beta;
-avg_res = sum (y_residual) / n;
-ss_res = sum (y_residual ^ 2);
-ss_avg_res = ss_res - n * avg_res ^ 2;
-
-R2_temp = 1 - ss_res / ss_avg_tot
-R2 = matrix(R2_temp, rows=1, cols=1)
-write(R2, "")
-
-totalIters = matrix(i, rows=1, cols=1)
-write(totalIters, "")
-
-# Prepare the output matrix
-if (intercept_status == 2) {
-    beta_out = append (beta, beta_unscaled);
-} else {
-    beta_out = beta;
-}
-
-write (beta_out, fileB);
-"""
-{% endhighlight %}
-
-**Output:**
-
-None
-
-
-## Helper Methods
-
-This cell contains helper methods to return `Double` and `Int` values from output generated by the `MLContext` API.
-
-**Cell:**
-{% highlight scala %}
-// Helper functions
-import org.apache.sysml.api.MLOutput
-
-def getScalar(outputs: MLOutput, symbol: String): Any =
-    outputs.getDF(sqlContext, symbol).first()(1)
-    
-def getScalarDouble(outputs: MLOutput, symbol: String): Double = 
-    getScalar(outputs, symbol).asInstanceOf[Double]
-    
-def getScalarInt(outputs: MLOutput, symbol: String): Int =
-    getScalarDouble(outputs, symbol).toInt
-{% endhighlight %}
-
-**Output:**
-{% highlight scala %}
-import org.apache.sysml.api.MLOutput
-getScalar: (outputs: org.apache.sysml.api.MLOutput, symbol: String)Any
-getScalarDouble: (outputs: org.apache.sysml.api.MLOutput, symbol: String)Double
-getScalarInt: (outputs: org.apache.sysml.api.MLOutput, symbol: String)Int
-{% endhighlight %}
-
-
-## Convert DataFrame to Binary-Block Format
-
-SystemML uses a binary-block format for matrix data representation. This cell
-explicitly converts the `DataFrame` `data` object to a binary-block `features` matrix
-and single-column `label` matrix, both represented by the
-`JavaPairRDD[MatrixIndexes, MatrixBlock]` datatype. 
-
-
-**Cell:**
-{% highlight scala %}
-// Imports
-import org.apache.sysml.api.MLContext
-import org.apache.sysml.runtime.instructions.spark.utils.{RDDConverterUtilsExt => RDDConverterUtils}
-import org.apache.sysml.runtime.matrix.MatrixCharacteristics;
-
-// Create SystemML context
-val ml = new MLContext(sc)
-
-// Convert data to proper format
-val mcX = new MatrixCharacteristics(numRows, numCols, 1000, 1000)
-val mcY = new MatrixCharacteristics(numRows, 1, 1000, 1000)
-val X = RDDConverterUtils.vectorDataFrameToBinaryBlock(sc, data, mcX, false, "features")
-val y = RDDConverterUtils.dataFrameToBinaryBlock(sc, data.select("label"), mcY, false)
-// val y = data.select("label")
-
-// Cache
-val X2 = X.cache()
-val y2 = y.cache()
-val cnt1 = X2.count()
-val cnt2 = y2.count()
-{% endhighlight %}
-
-**Output:**
-{% highlight scala %}
-import org.apache.sysml.api.MLContext
-import org.apache.sysml.runtime.instructions.spark.utils.{RDDConverterUtilsExt=>RDDConverterUtils}
-import org.apache.sysml.runtime.matrix.MatrixCharacteristics
-ml: org.apache.sysml.api.MLContext = org.apache.sysml.api.MLContext@38d59245
-mcX: org.apache.sysml.runtime.matrix.MatrixCharacteristics = [10000 x 1000, nnz=-1, blocks (1000 x 1000)]
-mcY: org.apache.sysml.runtime.matrix.MatrixCharacteristics = [10000 x 1, nnz=-1, blocks (1000 x 1000)]
-X: org.apache.spark.api.java.JavaPairRDD[org.apache.sysml.runtime.matrix.data.MatrixIndexes,org.apache.sysml.runtime.matrix.data.MatrixBlock] = org.apache.spark.api.java.JavaPairRDD@b5a86e3
-y: org.apache.spark.api.java.JavaPairRDD[org.apache.sysml.runtime.matrix.data.MatrixIndexes,org.apache.sysml.runtime.matrix.data.MatrixBlock] = org.apache.spark.api.java.JavaPairRDD@56377665
-X2: org.apache.spark.api.java.JavaPairRDD[org.apache.sysml.runtime.matrix.data.MatrixIndexes,org.apache.sysml.runtime.matrix.data.MatrixBlock] = org.apache.spark.api.java.JavaPairRDD@650f29d2
-y2: org.apache.spark.api.java.JavaPairRDD[org.apache.sysml.runtime.matrix.data.MatrixIndexes,org.apache.sysml.runtime.matrix.data.MatrixBlock] = org.apache.spark.api.java.JavaPairRDD@334857a8
-cnt1: Long = 10
-cnt2: Long = 10
-{% endhighlight %}
-
-
-## Train using SystemML Linear Regression Algorithm
-
-Now, we can train our model using the SystemML linear regression algorithm. We register the features matrix `X` and the label matrix `y` as inputs. We register the `beta_out` matrix,
-`R2`, and `totalIters` as outputs.
-
-**Cell:**
-{% highlight scala %}
-// Register inputs & outputs
-ml.reset()  
-ml.registerInput("X", X, numRows, numCols)
-ml.registerInput("y", y, numRows, 1)
-// ml.registerInput("y", y)
-ml.registerOutput("beta_out")
-ml.registerOutput("R2")
-ml.registerOutput("totalIters")
-
-// Run the script
-val start = System.currentTimeMillis()
-val outputs = ml.executeScript(linearReg)
-val trainingTime = (System.currentTimeMillis() - start).toDouble / 1000.0
-
-// Get outputs
-val B = outputs.getDF(sqlContext, "beta_out").sort("ID").drop("ID")
-val r2 = getScalarDouble(outputs, "R2")
-val iters = getScalarInt(outputs, "totalIters")
-val trainingTimePerIter = trainingTime / iters
-{% endhighlight %}
-
-**Output:**
-{% highlight scala %}
-start: Long = 1444672090620
-outputs: org.apache.sysml.api.MLOutput = org.apache.sysml.api.MLOutput@5d2c22d0
-trainingTime: Double = 1.176
-B: org.apache.spark.sql.DataFrame = [C1: double]
-r2: Double = 0.9677079547216473
-iters: Int = 12
-trainingTimePerIter: Double = 0.09799999999999999
-{% endhighlight %}
-
-
-## SystemML Linear Regression Summary Statistics
-
-SystemML linear regression summary statistics are displayed by this cell.
-
-**Cell:**
-{% highlight scala %}
-// Print statistics
-println(s"R2: ${r2}")
-println(s"Iterations: ${iters}")
-println(s"Training time per iter: ${trainingTimePerIter} seconds")
-B.describe().show()
-{% endhighlight %}
-
-**Output:**
-{% highlight scala %}
-R2: 0.9677079547216473
-Iterations: 12
-Training time per iter: 0.2334166666666667 seconds
-+-------+-------------------+
-|summary|                 C1|
-+-------+-------------------+
-|  count|               1000|
-|   mean| 0.0184500840658385|
-| stddev| 0.2764750319432085|
-|    min|-0.5426068958986378|
-|    max| 0.5225309861616542|
-+-------+-------------------+
-{% endhighlight %}
-
-
-

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/65844aa6/docs/spark-mlcontext-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/spark-mlcontext-programming-guide.md b/docs/spark-mlcontext-programming-guide.md
new file mode 100644
index 0000000..cc83d24
--- /dev/null
+++ b/docs/spark-mlcontext-programming-guide.md
@@ -0,0 +1,996 @@
+---
+layout: global
+title: Spark MLContext Programming Guide
+description: Spark MLContext Programming Guide
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+<br/>
+
+
+# Overview
+
+The Spark `MLContext` API offers a programmatic interface for interacting with SystemML from Spark using languages
+such as Scala and Java. When interacting with `MLContext` from Spark, `DataFrame`s and `RDD`s can be passed
+to SystemML. These data representations are converted to a
+binary-block data format, allowing for SystemML's optimizations to be performed.
+
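+To give a sense of the overall flow before the step-by-step walkthrough below, here is a minimal
+sketch that strings together the API calls covered in this guide. It assumes a `SparkContext` `sc`,
+a `SQLContext` `sqlContext`, a `DataFrame` called `df`, and the `shape.dml` script shown later in
+this guide, located in the working directory.
+
+{% highlight scala %}
+import org.apache.sysml.api.MLContext
+
+val ml = new MLContext(sc)
+ml.registerInput("X", df)   // DataFrame is implicitly converted to binary-block format
+ml.registerOutput("m")
+ml.registerOutput("n")
+val nargs = Map("Xin" -> " ", "Mout" -> " ", "Nout" -> " ")
+val outputs = ml.execute("shape.dml", nargs)
+val m = outputs.getDF(sqlContext, "m").first()(1)   // number of rows (value is in column index 1)
+val n = outputs.getDF(sqlContext, "n").first()(1)   // number of columns
+{% endhighlight %}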
+
+# Spark Shell (Scala) Example
+
+## Start Spark Shell with SystemML
+
+To use SystemML with the Spark Shell, the SystemML jar can be referenced using the Spark Shell's `--jars` option. 
+Instructions to build the SystemML jar can be found in the [SystemML GitHub README](https://github.com/apache/incubator-systemml).
+
+{% highlight bash %}
+./bin/spark-shell --executor-memory 4G --driver-memory 4G --jars SystemML.jar
+{% endhighlight %}
+
+Here is an example of Spark Shell with SystemML and YARN.
+
+{% highlight bash %}
+./bin/spark-shell --master yarn-client --num-executors 3 --driver-memory 5G --executor-memory 5G --executor-cores 4 --jars SystemML.jar
+{% endhighlight %}
+
+
+## Create MLContext
+
+An `MLContext` object can be created by passing its constructor a reference to the `SparkContext`.
+
+<div class="codetabs">
+
+<div data-lang="Spark Shell" markdown="1">
+{% highlight scala %}
+scala> import org.apache.sysml.api.MLContext
+import org.apache.sysml.api.MLContext
+
+scala> val ml = new MLContext(sc)
+ml: org.apache.sysml.api.MLContext = org.apache.sysml.api.MLContext@33e38c6b
+{% endhighlight %}
+</div>
+
+<div data-lang="Statements" markdown="1">
+{% highlight scala %}
+import org.apache.sysml.api.MLContext
+val ml = new MLContext(sc)
+{% endhighlight %}
+</div>
+
+</div>
+
+
+## Create DataFrame
+
+For demonstration purposes, we'll create a `DataFrame` consisting of 100,000 rows and 1,000 columns
+of random `double`s.
+
+<div class="codetabs">
+
+<div data-lang="Spark Shell" markdown="1">
+{% highlight scala %}
+scala> import org.apache.spark.sql._
+import org.apache.spark.sql._
+
+scala> import org.apache.spark.sql.types.{StructType,StructField,DoubleType}
+import org.apache.spark.sql.types.{StructType, StructField, DoubleType}
+
+scala> import scala.util.Random
+import scala.util.Random
+
+scala> val numRows = 100000
+numRows: Int = 100000
+
+scala> val numCols = 1000
+numCols: Int = 1000
+
+scala> val data = sc.parallelize(0 to numRows-1).map { _ => Row.fromSeq(Seq.fill(numCols)(Random.nextDouble)) }
+data: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[1] at map at <console>:33
+
+scala> val schema = StructType((0 to numCols-1).map { i => StructField("C" + i, DoubleType, true) } )
+schema: org.apache.spark.sql.types.StructType = StructType(StructField(C0,DoubleType,true), StructField(C1,DoubleType,true), StructField(C2,DoubleType,true), StructField(C3,DoubleType,true), StructField(C4,DoubleType,true), StructField(C5,DoubleType,true), StructField(C6,DoubleType,true), StructField(C7,DoubleType,true), StructField(C8,DoubleType,true), StructField(C9,DoubleType,true), StructField(C10,DoubleType,true), StructField(C11,DoubleType,true), StructField(C12,DoubleType,true), StructField(C13,DoubleType,true), StructField(C14,DoubleType,true), StructField(C15,DoubleType,true), StructField(C16,DoubleType,true), StructField(C17,DoubleType,true), StructField(C18,DoubleType,true), StructField(C19,DoubleType,true), StructField(C20,DoubleType,true), StructField(C21,DoubleType,true), ...
+
+scala> val df = sqlContext.createDataFrame(data, schema)
+df: org.apache.spark.sql.DataFrame = [C0: double, C1: double, C2: double, C3: double, C4: double, C5: double, C6: double, C7: double, C8: double, C9: double, C10: double, C11: double, C12: double, C13: double, C14: double, C15: double, C16: double, C17: double, C18: double, C19: double, C20: double, C21: double, C22: double, C23: double, C24: double, C25: double, C26: double, C27: double, C28: double, C29: double, C30: double, C31: double, C32: double, C33: double, C34: double, C35: double, C36: double, C37: double, C38: double, C39: double, C40: double, C41: double, C42: double, C43: double, C44: double, C45: double, C46: double, C47: double, C48: double, C49: double, C50: double, C51: double, C52: double, C53: double, C54: double, C55: double, C56: double, C57: double, C58: double, C5...
+
+{% endhighlight %}
+</div>
+
+<div data-lang="Statements" markdown="1">
+{% highlight scala %}
+import org.apache.spark.sql._
+import org.apache.spark.sql.types.{StructType,StructField,DoubleType}
+import scala.util.Random
+val numRows = 100000
+val numCols = 1000
+val data = sc.parallelize(0 to numRows-1).map { _ => Row.fromSeq(Seq.fill(numCols)(Random.nextDouble)) }
+val schema = StructType((0 to numCols-1).map { i => StructField("C" + i, DoubleType, true) } )
+val df = sqlContext.createDataFrame(data, schema)
+{% endhighlight %}
+</div>
+
+</div>
+
+
+## Helper Methods
+
+For convenience, we'll create some helper methods. The SystemML output data is encapsulated in
+an `MLOutput` object. The `getScalar()` method extracts a scalar value from a `DataFrame` returned by
+`MLOutput`; the value is read at column index 1 because the first column of the returned `DataFrame`
+holds a row ID. The `getScalarDouble()` method returns such a value as a `Double`, and the
+`getScalarInt()` method returns such a value as an `Int`.
+
+<div class="codetabs">
+
+<div data-lang="Spark Shell" markdown="1">
+{% highlight scala %}
+scala> import org.apache.sysml.api.MLOutput
+import org.apache.sysml.api.MLOutput
+
+scala> def getScalar(outputs: MLOutput, symbol: String): Any =
+     | outputs.getDF(sqlContext, symbol).first()(1)
+getScalar: (outputs: org.apache.sysml.api.MLOutput, symbol: String)Any
+
+scala> def getScalarDouble(outputs: MLOutput, symbol: String): Double =
+     | getScalar(outputs, symbol).asInstanceOf[Double]
+getScalarDouble: (outputs: org.apache.sysml.api.MLOutput, symbol: String)Double
+
+scala> def getScalarInt(outputs: MLOutput, symbol: String): Int =
+     | getScalarDouble(outputs, symbol).toInt
+getScalarInt: (outputs: org.apache.sysml.api.MLOutput, symbol: String)Int
+
+{% endhighlight %}
+</div>
+
+<div data-lang="Statements" markdown="1">
+{% highlight scala %}
+import org.apache.sysml.api.MLOutput
+def getScalar(outputs: MLOutput, symbol: String): Any =
+outputs.getDF(sqlContext, symbol).first()(1)
+def getScalarDouble(outputs: MLOutput, symbol: String): Double =
+getScalar(outputs, symbol).asInstanceOf[Double]
+def getScalarInt(outputs: MLOutput, symbol: String): Int =
+getScalarDouble(outputs, symbol).toInt
+
+{% endhighlight %}
+</div>
+
+</div>
+
+
+## Convert DataFrame to Binary-Block Matrix
+
+SystemML is optimized to operate on a binary-block format for matrix representation. For large
+datasets, conversion from DataFrame to binary-block can require a significant amount of time.
+Explicit DataFrame to binary-block conversion allows algorithm performance to be measured separately
+from data conversion time.
+
+The SystemML binary-block matrix representation can be thought of as a two-dimensional array of blocks, where each block
+consists of a number of rows and columns. In this example, we specify a matrix consisting
+of blocks of size 1000x1000. The experimental `dataFrameToBinaryBlock()` method of `RDDConverterUtilsExt` is used
+to convert the `DataFrame df` to a SystemML binary-block matrix, which is represented by the datatype
+`JavaPairRDD[MatrixIndexes, MatrixBlock]`.
+
+<div class="codetabs">
+
+<div data-lang="Spark Shell" markdown="1">
+{% highlight scala %}
+scala> import org.apache.sysml.runtime.instructions.spark.utils.{RDDConverterUtilsExt => RDDConverterUtils}
+import org.apache.sysml.runtime.instructions.spark.utils.{RDDConverterUtilsExt=>RDDConverterUtils}
+
+scala> import org.apache.sysml.runtime.matrix.MatrixCharacteristics;
+import org.apache.sysml.runtime.matrix.MatrixCharacteristics
+
+scala> val numRowsPerBlock = 1000
+numRowsPerBlock: Int = 1000
+
+scala> val numColsPerBlock = 1000
+numColsPerBlock: Int = 1000
+
+scala> val mc = new MatrixCharacteristics(numRows, numCols, numRowsPerBlock, numColsPerBlock)
+mc: org.apache.sysml.runtime.matrix.MatrixCharacteristics = [100000 x 1000, nnz=-1, blocks (1000 x 1000)]
+
+scala> val sysMlMatrix = RDDConverterUtils.dataFrameToBinaryBlock(sc, df, mc, false)
+sysMlMatrix: org.apache.spark.api.java.JavaPairRDD[org.apache.sysml.runtime.matrix.data.MatrixIndexes,org.apache.sysml.runtime.matrix.data.MatrixBlock] = org.apache.spark.api.java.JavaPairRDD@2bce3248
+
+{% endhighlight %}
+</div>
+
+<div data-lang="Statements" markdown="1">
+{% highlight scala %}
+import org.apache.sysml.runtime.instructions.spark.utils.{RDDConverterUtilsExt => RDDConverterUtils}
+import org.apache.sysml.runtime.matrix.MatrixCharacteristics;
+val numRowsPerBlock = 1000
+val numColsPerBlock = 1000
+val mc = new MatrixCharacteristics(numRows, numCols, numRowsPerBlock, numColsPerBlock)
+val sysMlMatrix = RDDConverterUtils.dataFrameToBinaryBlock(sc, df, mc, false)
+
+{% endhighlight %}
+</div>
+
+</div>
+
+
+## DML Script
+
+For this example, we will use the following DML script, `shape.dml`, which reads in a matrix and outputs the number of rows and the
+number of columns, each represented as a 1x1 matrix.
+
+{% highlight r %}
+X = read($Xin)
+m = matrix(nrow(X), rows=1, cols=1)
+n = matrix(ncol(X), rows=1, cols=1)
+write(m, $Mout)
+write(n, $Nout)
+{% endhighlight %}
+
+
+## Execute Script
+
+Let's execute our DML script, as shown in the example below. The call to `reset()` of `MLContext` is not necessary here, but this method should
+be called if you need to reset inputs and outputs or if you would like to call `execute()` with a different script.
+
+An example of registering the `DataFrame df` as an input to the `X` variable is shown but commented out. If a DataFrame is registered directly,
+it will implicitly be converted to SystemML's binary-block format. However, since we've already explicitly converted the DataFrame to the
+binary-block matrix `sysMlMatrix`, we register that matrix as the input to the `X` variable. We register the `m` and `n` variables
+as outputs.
+
+When SystemML is executed via `DMLScript` (such as in Standalone Mode), inputs are supplied as either command-line named arguments
+or positional arguments. These inputs are specified in DML scripts by prepending them with a `$`. Values are read from or written
+to files using `read`/`write` (DML) and `load`/`save` (PyDML) statements. When utilizing the `MLContext` API,
+inputs and outputs can be other data representations, such as `DataFrame`s. The input and output data are bound to DML variables.
+The named arguments in the `shape.dml` script do not have default values set for them, so we create a `Map` to map the required named
+arguments to blank `String`s so that the script can pass validation.
+
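+For comparison, a command-line invocation of `shape.dml` through `DMLScript` might look roughly like
+the following sketch. The file paths are hypothetical placeholders and the exact launcher depends on
+your setup; this is illustrative only and is not required for the `MLContext` example below.
+
+{% highlight bash %}
+# Hypothetical paths; named arguments are passed with -nvargs.
+$SPARK_HOME/bin/spark-submit SystemML.jar -f shape.dml \
+  -nvargs Xin=X.mtx Mout=m.mtx Nout=n.mtx
+{% endhighlight %}
+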
+The `shape.dml` script is executed by the call to `execute()`, where we supply the `Map` of required named arguments. The
+execution results are returned as the `MLOutput` object `outputs`. The number of rows is obtained by calling the `getScalarInt()`
+helper method with the `outputs` object and `"m"`. The number of columns is retrieved by calling `getScalarInt()` with
+`outputs` and `"n"`.
+
+<div class="codetabs">
+
+<div data-lang="Spark Shell" markdown="1">
+{% highlight scala %}
+scala> ml.reset()
+
+scala> //ml.registerInput("X", df) // implicit conversion of DataFrame to binary-block
+
+scala> ml.registerInput("X", sysMlMatrix, numRows, numCols)
+
+scala> ml.registerOutput("m")
+
+scala> ml.registerOutput("n")
+
+scala> val nargs = Map("Xin" -> " ", "Mout" -> " ", "Nout" -> " ")
+nargs: scala.collection.immutable.Map[String,String] = Map(Xin -> " ", Mout -> " ", Nout -> " ")
+
+scala> val outputs = ml.execute("shape.dml", nargs)
+15/10/12 16:29:15 WARN : Your hostname, derons-mbp.usca.ibm.com resolves to a loopback/non-reachable address: 127.0.0.1, but we couldn't find any external IP address!
+15/10/12 16:29:15 WARN OptimizerUtils: Auto-disable multi-threaded text read for 'text' and 'csv' due to thread contention on JRE < 1.8 (java.version=1.7.0_80).
+outputs: org.apache.sysml.api.MLOutput = org.apache.sysml.api.MLOutput@4d424743
+
+scala> val m = getScalarInt(outputs, "m")
+m: Int = 100000
+
+scala> val n = getScalarInt(outputs, "n")
+n: Int = 1000
+
+{% endhighlight %}
+</div>
+
+<div data-lang="Statements" markdown="1">
+{% highlight scala %}
+ml.reset()
+//ml.registerInput("X", df) // implicit conversion of DataFrame to binary-block
+ml.registerInput("X", sysMlMatrix, numRows, numCols)
+ml.registerOutput("m")
+ml.registerOutput("n")
+val nargs = Map("Xin" -> " ", "Mout" -> " ", "Nout" -> " ")
+val outputs = ml.execute("shape.dml", nargs)
+val m = getScalarInt(outputs, "m")
+val n = getScalarInt(outputs, "n")
+
+{% endhighlight %}
+</div>
+
+</div>
+
+
+## DML Script as String
+
+The `MLContext` API allows a DML script to be specified
+as a `String`. Here, we specify a DML script as a `String` variable called `minMaxMeanScript`.
+This DML script finds the minimum, maximum, and mean values of a matrix.
+
+<div class="codetabs">
+
+<div data-lang="Spark Shell" markdown="1">
+{% highlight scala %}
+scala> val minMaxMeanScript: String = 
+     | """
+     | Xin = read(" ")
+     | minOut = matrix(min(Xin), rows=1, cols=1)
+     | maxOut = matrix(max(Xin), rows=1, cols=1)
+     | meanOut = matrix(mean(Xin), rows=1, cols=1)
+     | write(minOut, " ")
+     | write(maxOut, " ")
+     | write(meanOut, " ")
+     | """
+minMaxMeanScript: String = 
+"
+Xin = read(" ")
+minOut = matrix(min(Xin), rows=1, cols=1)
+maxOut = matrix(max(Xin), rows=1, cols=1)
+meanOut = matrix(mean(Xin), rows=1, cols=1)
+write(minOut, " ")
+write(maxOut, " ")
+write(meanOut, " ")
+"
+
+{% endhighlight %}
+</div>
+
+<div data-lang="Statements" markdown="1">
+{% highlight scala %}
+val minMaxMeanScript: String = 
+"""
+Xin = read(" ")
+minOut = matrix(min(Xin), rows=1, cols=1)
+maxOut = matrix(max(Xin), rows=1, cols=1)
+meanOut = matrix(mean(Xin), rows=1, cols=1)
+write(minOut, " ")
+write(maxOut, " ")
+write(meanOut, " ")
+"""
+
+{% endhighlight %}
+</div>
+
+</div>
+
+## Scala Wrapper for DML
+
+We can create a Scala wrapper for our invocation of the `minMaxMeanScript` DML `String`. The `minMaxMean()` method
+takes a `JavaPairRDD[MatrixIndexes, MatrixBlock]` parameter, which is a SystemML binary-block matrix representation.
+It also takes a `rows` parameter indicating the number of rows in the matrix, a `cols` parameter indicating the number
+of columns in the matrix, and an `MLContext` parameter. The `minMaxMean()` method
+returns a tuple consisting of the minimum value in the matrix, the maximum value in the matrix, and the computed
+mean value of the matrix.
+
+<div class="codetabs">
+
+<div data-lang="Spark Shell" markdown="1">
+{% highlight scala %}
+scala> import org.apache.sysml.runtime.matrix.data.MatrixIndexes
+import org.apache.sysml.runtime.matrix.data.MatrixIndexes
+
+scala> import org.apache.sysml.runtime.matrix.data.MatrixBlock
+import org.apache.sysml.runtime.matrix.data.MatrixBlock
+
+scala> import org.apache.spark.api.java.JavaPairRDD
+import org.apache.spark.api.java.JavaPairRDD
+
+scala> def minMaxMean(mat: JavaPairRDD[MatrixIndexes, MatrixBlock], rows: Int, cols: Int, ml: MLContext): (Double, Double, Double) = {
+     | ml.reset()
+     | ml.registerInput("Xin", mat, rows, cols)
+     | ml.registerOutput("minOut")
+     | ml.registerOutput("maxOut")
+     | ml.registerOutput("meanOut")
+     | val outputs = ml.executeScript(minMaxMeanScript)
+     | val minOut = getScalarDouble(outputs, "minOut")
+     | val maxOut = getScalarDouble(outputs, "maxOut")
+     | val meanOut = getScalarDouble(outputs, "meanOut")
+     | (minOut, maxOut, meanOut)
+     | }
+minMaxMean: (mat: org.apache.spark.api.java.JavaPairRDD[org.apache.sysml.runtime.matrix.data.MatrixIndexes,org.apache.sysml.runtime.matrix.data.MatrixBlock], rows: Int, cols: Int, ml: org.apache.sysml.api.MLContext)(Double, Double, Double)
+
+{% endhighlight %}
+</div>
+
+<div data-lang="Statements" markdown="1">
+{% highlight scala %}
+import org.apache.sysml.runtime.matrix.data.MatrixIndexes
+import org.apache.sysml.runtime.matrix.data.MatrixBlock
+import org.apache.spark.api.java.JavaPairRDD
+def minMaxMean(mat: JavaPairRDD[MatrixIndexes, MatrixBlock], rows: Int, cols: Int, ml: MLContext): (Double, Double, Double) = {
+ml.reset()
+ml.registerInput("Xin", mat, rows, cols)
+ml.registerOutput("minOut")
+ml.registerOutput("maxOut")
+ml.registerOutput("meanOut")
+val outputs = ml.executeScript(minMaxMeanScript)
+val minOut = getScalarDouble(outputs, "minOut")
+val maxOut = getScalarDouble(outputs, "maxOut")
+val meanOut = getScalarDouble(outputs, "meanOut")
+(minOut, maxOut, meanOut)
+}
+
+{% endhighlight %}
+</div>
+
+</div>
+
+
+## Invoking DML via Scala Wrapper
+
+Here, we invoke `minMaxMeanScript` using our `minMaxMean()` Scala wrapper method. It returns a tuple
+consisting of the minimum value in the matrix, the maximum value in the matrix, and the mean value of the matrix.
+
+<div class="codetabs">
+
+<div data-lang="Spark Shell" markdown="1">
+{% highlight scala %}
+scala> val (min, max, mean) = minMaxMean(sysMlMatrix, numRows, numCols, ml)
+15/10/13 14:33:11 WARN OptimizerUtils: Auto-disable multi-threaded text read for 'text' and 'csv' due to thread contention on JRE < 1.8 (java.version=1.7.0_80).
+min: Double = 5.378949397005783E-9                                              
+max: Double = 0.9999999934660398
+mean: Double = 0.499988222338507
+
+{% endhighlight %}
+</div>
+
+<div data-lang="Statements" markdown="1">
+{% highlight scala %}
+val (min, max, mean) = minMaxMean(sysMlMatrix, numRows, numCols, ml)
+
+{% endhighlight %}
+</div>
+
+</div>
+
+
+* * *
+
+# Java Example
+
+Next, let's consider a Java example. The `MLContextExample` class creates an `MLContext` object from a `JavaSparkContext`.
+Next, it reads in a matrix CSV file as a `JavaRDD<String>` object. It registers this as input `X`. It registers
+two outputs, `m` and `n`. A `HashMap` maps the command-line arguments expected by the `shape.dml` script to single spaces so that
+argument validation passes. The `shape.dml` script is then executed, and the number of rows and the number of columns in the matrix
+are written to standard output.
+
+
+{% highlight java %}
+package org.apache.sysml;
+
+import java.util.HashMap;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.SQLContext;
+
+import org.apache.sysml.api.MLContext;
+import org.apache.sysml.api.MLOutput;
+
+public class MLContextExample {
+
+	public static void main(String[] args) throws Exception {
+
+		SparkConf conf = new SparkConf().setAppName("MLContextExample").setMaster("local");
+		JavaSparkContext sc = new JavaSparkContext(conf);
+		SQLContext sqlContext = new SQLContext(sc);
+		MLContext ml = new MLContext(sc);
+
+		JavaRDD<String> csv = sc.textFile("A.csv");
+		ml.registerInput("X", csv, "csv");
+		ml.registerOutput("m");
+		ml.registerOutput("n");
+		HashMap<String, String> cmdLineArgs = new HashMap<String, String>();
+		cmdLineArgs.put("X", " ");
+		cmdLineArgs.put("m", " ");
+		cmdLineArgs.put("n", " ");
+		MLOutput output = ml.execute("shape.dml", cmdLineArgs);
+		DataFrame mDf = output.getDF(sqlContext, "m");
+		DataFrame nDf = output.getDF(sqlContext, "n");
+		System.out.println("rows:" + mDf.first().getDouble(1));
+		System.out.println("cols:" + nDf.first().getDouble(1));
+	}
+
+}
+
+
+{% endhighlight %}
+
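+The `shape.dml` script itself is not shown in this example. A minimal script along these lines
+(a hypothetical sketch, not necessarily the exact script) would read the registered input matrix
+`X` and write its dimensions to the registered outputs `m` and `n`:
+
+{% highlight r %}
+# hypothetical shape.dml: emit the dimensions of the input matrix
+X = read($X)
+m = matrix(nrow(X), rows=1, cols=1)
+n = matrix(ncol(X), rows=1, cols=1)
+write(m, $m)
+write(n, $n)
+{% endhighlight %}
+
+Because `X`, `m`, and `n` are registered with the `MLContext`, the `$X`, `$m`, and `$n` file
+arguments only need to satisfy argument validation, which is why they are mapped to spaces in
+the `HashMap` above.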
+
+* * *
+
+# Zeppelin Notebook Example - Linear Regression Algorithm
+
+Next, we'll consider an example of a SystemML linear regression algorithm run from Spark through an Apache Zeppelin notebook.
+Instructions to clone and build Zeppelin can be found at the [GitHub Apache Zeppelin](https://github.com/apache/incubator-zeppelin)
+site. This example will also look at the Spark ML linear regression algorithm.
+
+This Zeppelin notebook example can be downloaded [here](files/spark-mlcontext-programming-guide/zeppelin-notebook-linear-regression/2AZ2AQ12B.tar.gz).
+Once downloaded and extracted, place the folder in the Zeppelin `notebook` directory.
+
+A `conf/zeppelin-env.sh` file is created based on `conf/zeppelin-env.sh.template`. For
+this demonstration, it sets the `SPARK_HOME`, `SPARK_SUBMIT_OPTIONS`, and `ZEPPELIN_SPARK_USEHIVECONTEXT`
+environment variables:
+
+	export SPARK_HOME=/Users/example/spark-1.5.1-bin-hadoop2.6
+	export SPARK_SUBMIT_OPTIONS="--jars /Users/example/systemml/system-ml/target/SystemML.jar"
+	export ZEPPELIN_SPARK_USEHIVECONTEXT=false
+
+Start Zeppelin using the `zeppelin.sh` script:
+
+	bin/zeppelin.sh
+
+After opening Zeppelin in a browser, we see the "SystemML - Linear Regression" note in the list of available
+Zeppelin notes.
+
+![Zeppelin Notebook](img/spark-mlcontext-programming-guide/zeppelin-notebook.png "Zeppelin Notebook")
+
+If we go to the "SystemML - Linear Regression" note, we see that the note consists of several cells of code.
+
+![Zeppelin 'SystemML - Linear Regression' Note](img/spark-mlcontext-programming-guide/zeppelin-notebook-systemml-linear-regression.png "Zeppelin 'SystemML - Linear Regression' Note")
+
+Let's briefly consider these cells.
+
+## Trigger Spark Startup
+
+This cell triggers Spark to initialize by referencing the `SparkContext` object `sc`. Information regarding these startup operations can be viewed in the
+console window in which `zeppelin.sh` is running.
+
+**Cell:**
+{% highlight scala %}
+// Trigger Spark Startup
+sc
+{% endhighlight %}
+
+**Output:**
+{% highlight scala %}
+res8: org.apache.spark.SparkContext = org.apache.spark.SparkContext@6ce70bf3
+{% endhighlight %}
+
+
+## Generate Linear Regression Test Data
+
+The Spark `LinearDataGenerator` is used to generate test data for the Spark ML and SystemML linear regression algorithms.
+
+**Cell:**
+{% highlight scala %}
+// Generate data
+import org.apache.spark.mllib.util.LinearDataGenerator
+
+val numRows = 10000
+val numCols = 1000
+val rawData = LinearDataGenerator.generateLinearRDD(sc, numRows, numCols, 1).toDF()
+
+// Repartition into a more parallelism-friendly number of partitions
+val data = rawData.repartition(64).cache()
+{% endhighlight %}
+
+**Output:**
+{% highlight scala %}
+import org.apache.spark.mllib.util.LinearDataGenerator
+numRows: Int = 10000
+numCols: Int = 1000
+rawData: org.apache.spark.sql.DataFrame = [label: double, features: vector]
+data: org.apache.spark.sql.DataFrame = [label: double, features: vector]
+{% endhighlight %}
+
+
+## Train using Spark ML Linear Regression Algorithm for Comparison
+
+For purposes of comparison, we can train a model using the Spark ML linear regression
+algorithm.
+
+**Cell:**
+{% highlight scala %}
+// Spark ML
+import org.apache.spark.ml.regression.LinearRegression
+
+// Model Settings
+val maxIters = 100
+val reg = 0
+val elasticNetParam = 0  // L2 reg
+
+// Fit the model
+val lr = new LinearRegression()
+  .setMaxIter(maxIters)
+  .setRegParam(reg)
+  .setElasticNetParam(elasticNetParam)
+val start = System.currentTimeMillis()
+val model = lr.fit(data)
+val trainingTime = (System.currentTimeMillis() - start).toDouble / 1000.0
+
+// Summarize the model over the training set and gather some metrics
+val trainingSummary = model.summary
+val r2 = trainingSummary.r2
+val iters = trainingSummary.totalIterations
+val trainingTimePerIter = trainingTime / iters
+{% endhighlight %}
+
+**Output:**
+{% highlight scala %}
+import org.apache.spark.ml.regression.LinearRegression
+maxIters: Int = 100
+reg: Int = 0
+elasticNetParam: Int = 0
+lr: org.apache.spark.ml.regression.LinearRegression = linReg_a7f51d676562
+start: Long = 1444672044647
+model: org.apache.spark.ml.regression.LinearRegressionModel = linReg_a7f51d676562
+trainingTime: Double = 12.985
+trainingSummary: org.apache.spark.ml.regression.LinearRegressionTrainingSummary = org.apache.spark.ml.regression.LinearRegressionTrainingSummary@227ba28b
+r2: Double = 0.9677118209276552
+iters: Int = 17
+trainingTimePerIter: Double = 0.7638235294117647
+{% endhighlight %}
+
+
+## Spark ML Linear Regression Summary Statistics
+
+Summary statistics for the Spark ML linear regression algorithm are displayed by this cell.
+
+**Cell:**
+{% highlight scala %}
+// Print statistics
+println(s"R2: ${r2}")
+println(s"Iterations: ${iters}")
+println(s"Training time per iter: ${trainingTimePerIter} seconds")
+{% endhighlight %}
+
+**Output:**
+{% highlight scala %}
+R2: 0.9677118209276552
+Iterations: 17
+Training time per iter: 0.7638235294117647 seconds
+{% endhighlight %}
+
+
+## SystemML Linear Regression Algorithm
+
+The `linearReg` variable is set to a `String` containing
+a linear regression algorithm written in DML, SystemML's Declarative Machine Learning language.
+
+
+
+**Cell:**
+{% highlight scala %}
+// SystemML kernels
+val linearReg =
+"""
+#
+# THIS SCRIPT SOLVES LINEAR REGRESSION USING THE CONJUGATE GRADIENT ALGORITHM
+#
+# INPUT PARAMETERS:
+# --------------------------------------------------------------------------------------------
+# NAME  TYPE   DEFAULT  MEANING
+# --------------------------------------------------------------------------------------------
+# X     String  ---     Matrix X of feature vectors
+# Y     String  ---     1-column Matrix Y of response values
+# icpt  Int      0      Intercept presence, shifting and rescaling the columns of X:
+#                       0 = no intercept, no shifting, no rescaling;
+#                       1 = add intercept, but neither shift nor rescale X;
+#                       2 = add intercept, shift & rescale X columns to mean = 0, variance = 1
+# reg   Double 0.000001 Regularization constant (lambda) for L2-regularization; set to nonzero
+#                       for highly dependent/sparse/numerous features
+# tol   Double 0.000001 Tolerance (epsilon); conjugate gradient procedure terminates early if
+#                       L2 norm of the beta-residual is less than tolerance * its initial norm
+# maxi  Int      0      Maximum number of conjugate gradient iterations, 0 = no maximum
+# --------------------------------------------------------------------------------------------
+#
+# OUTPUT:
+# B Estimated regression parameters (the betas) to store
+#
+# Note: Matrix of regression parameters (the betas) and its size depend on icpt input value:
+#         OUTPUT SIZE:   OUTPUT CONTENTS:                HOW TO PREDICT Y FROM X AND B:
+# icpt=0: ncol(X)   x 1  Betas for X only                Y ~ X %*% B[1:ncol(X), 1], or just X %*% B
+# icpt=1: ncol(X)+1 x 1  Betas for X and intercept       Y ~ X %*% B[1:ncol(X), 1] + B[ncol(X)+1, 1]
+# icpt=2: ncol(X)+1 x 2  Col.1: betas for X & intercept  Y ~ X %*% B[1:ncol(X), 1] + B[ncol(X)+1, 1]
+#                        Col.2: betas for shifted/rescaled X and intercept
+#
+
+fileX = "";
+fileY = "";
+fileB = "";
+
+intercept_status = ifdef ($icpt, 0);     # $icpt=0;
+tolerance = ifdef ($tol, 0.000001);      # $tol=0.000001;
+max_iteration = ifdef ($maxi, 0);        # $maxi=0;
+regularization = ifdef ($reg, 0.000001); # $reg=0.000001;
+
+X = read (fileX);
+y = read (fileY);
+
+n = nrow (X);
+m = ncol (X);
+ones_n = matrix (1, rows = n, cols = 1);
+zero_cell = matrix (0, rows = 1, cols = 1);
+
+# Introduce the intercept, shift and rescale the columns of X if needed
+
+m_ext = m;
+if (intercept_status == 1 | intercept_status == 2)  # add the intercept column
+{
+    X = append (X, ones_n);
+    m_ext = ncol (X);
+}
+
+scale_lambda = matrix (1, rows = m_ext, cols = 1);
+if (intercept_status == 1 | intercept_status == 2)
+{
+    scale_lambda [m_ext, 1] = 0;
+}
+
+if (intercept_status == 2)  # scale-&-shift X columns to mean 0, variance 1
+{                           # Important assumption: X [, m_ext] = ones_n
+    avg_X_cols = t(colSums(X)) / n;
+    var_X_cols = (t(colSums (X ^ 2)) - n * (avg_X_cols ^ 2)) / (n - 1);
+    is_unsafe = ppred (var_X_cols, 0.0, "<=");
+    scale_X = 1.0 / sqrt (var_X_cols * (1 - is_unsafe) + is_unsafe);
+    scale_X [m_ext, 1] = 1;
+    shift_X = - avg_X_cols * scale_X;
+    shift_X [m_ext, 1] = 0;
+} else {
+    scale_X = matrix (1, rows = m_ext, cols = 1);
+    shift_X = matrix (0, rows = m_ext, cols = 1);
+}
+
+# Henceforth, if intercept_status == 2, we use "X %*% (SHIFT/SCALE TRANSFORM)"
+# instead of "X".  However, in order to preserve the sparsity of X,
+# we apply the transform associatively to some other part of the expression
+# in which it occurs.  To avoid materializing a large matrix, we rewrite it:
+#
+# ssX_A  = (SHIFT/SCALE TRANSFORM) %*% A    --- is rewritten as:
+# ssX_A  = diag (scale_X) %*% A;
+# ssX_A [m_ext, ] = ssX_A [m_ext, ] + t(shift_X) %*% A;
+#
+# tssX_A = t(SHIFT/SCALE TRANSFORM) %*% A   --- is rewritten as:
+# tssX_A = diag (scale_X) %*% A + shift_X %*% A [m_ext, ];
+
+lambda = scale_lambda * regularization;
+beta_unscaled = matrix (0, rows = m_ext, cols = 1);
+
+if (max_iteration == 0) {
+    max_iteration = m_ext;
+}
+i = 0;
+
+# BEGIN THE CONJUGATE GRADIENT ALGORITHM
+r = - t(X) %*% y;
+
+if (intercept_status == 2) {
+    r = scale_X * r + shift_X %*% r [m_ext, ];
+}
+
+p = - r;
+norm_r2 = sum (r ^ 2);
+norm_r2_initial = norm_r2;
+norm_r2_target = norm_r2_initial * tolerance ^ 2;
+
+while (i < max_iteration & norm_r2 > norm_r2_target)
+{
+    if (intercept_status == 2) {
+        ssX_p = scale_X * p;
+        ssX_p [m_ext, ] = ssX_p [m_ext, ] + t(shift_X) %*% p;
+    } else {
+        ssX_p = p;
+    }
+
+    q = t(X) %*% (X %*% ssX_p);
+
+    if (intercept_status == 2) {
+        q = scale_X * q + shift_X %*% q [m_ext, ];
+    }
+
+    q = q + lambda * p;
+    a = norm_r2 / sum (p * q);
+    beta_unscaled = beta_unscaled + a * p;
+    r = r + a * q;
+    old_norm_r2 = norm_r2;
+    norm_r2 = sum (r ^ 2);
+    p = -r + (norm_r2 / old_norm_r2) * p;
+    i = i + 1;
+}
+# END THE CONJUGATE GRADIENT ALGORITHM
+
+if (intercept_status == 2) {
+    beta = scale_X * beta_unscaled;
+    beta [m_ext, ] = beta [m_ext, ] + t(shift_X) %*% beta_unscaled;
+} else {
+    beta = beta_unscaled;
+}
+
+# Output statistics
+avg_tot = sum (y) / n;
+ss_tot = sum (y ^ 2);
+ss_avg_tot = ss_tot - n * avg_tot ^ 2;
+var_tot = ss_avg_tot / (n - 1);
+y_residual = y - X %*% beta;
+avg_res = sum (y_residual) / n;
+ss_res = sum (y_residual ^ 2);
+ss_avg_res = ss_res - n * avg_res ^ 2;
+
+R2_temp = 1 - ss_res / ss_avg_tot
+R2 = matrix(R2_temp, rows=1, cols=1)
+write(R2, "")
+
+totalIters = matrix(i, rows=1, cols=1)
+write(totalIters, "")
+
+# Prepare the output matrix
+if (intercept_status == 2) {
+    beta_out = append (beta, beta_unscaled);
+} else {
+    beta_out = beta;
+}
+
+write (beta_out, fileB);
+"""
+{% endhighlight %}
+
+**Output:**
+
+None
+
+
+## Helper Methods
+
+This cell contains helper methods to return `Double` and `Int` values from output generated by the `MLContext` API.
+
+**Cell:**
+{% highlight scala %}
+// Helper functions
+import org.apache.sysml.api.MLOutput
+
+def getScalar(outputs: MLOutput, symbol: String): Any =
+    outputs.getDF(sqlContext, symbol).first()(1)
+    
+def getScalarDouble(outputs: MLOutput, symbol: String): Double = 
+    getScalar(outputs, symbol).asInstanceOf[Double]
+    
+def getScalarInt(outputs: MLOutput, symbol: String): Int =
+    getScalarDouble(outputs, symbol).toInt
+{% endhighlight %}
+
+**Output:**
+{% highlight scala %}
+import org.apache.sysml.api.MLOutput
+getScalar: (outputs: org.apache.sysml.api.MLOutput, symbol: String)Any
+getScalarDouble: (outputs: org.apache.sysml.api.MLOutput, symbol: String)Double
+getScalarInt: (outputs: org.apache.sysml.api.MLOutput, symbol: String)Int
+{% endhighlight %}
+
+
+## Convert DataFrame to Binary-Block Format
+
+SystemML uses a binary-block format for matrix data representation. This cell
+explicitly converts the `DataFrame` `data` object to a binary-block `features` matrix
+and a single-column `label` matrix, both represented by the
+`JavaPairRDD[MatrixIndexes, MatrixBlock]` datatype. 
+
+
+**Cell:**
+{% highlight scala %}
+// Imports
+import org.apache.sysml.api.MLContext
+import org.apache.sysml.runtime.instructions.spark.utils.{RDDConverterUtilsExt => RDDConverterUtils}
+import org.apache.sysml.runtime.matrix.MatrixCharacteristics;
+
+// Create SystemML context
+val ml = new MLContext(sc)
+
+// Convert data to proper format
+val mcX = new MatrixCharacteristics(numRows, numCols, 1000, 1000)
+val mcY = new MatrixCharacteristics(numRows, 1, 1000, 1000)
+val X = RDDConverterUtils.vectorDataFrameToBinaryBlock(sc, data, mcX, false, "features")
+val y = RDDConverterUtils.dataFrameToBinaryBlock(sc, data.select("label"), mcY, false)
+// val y = data.select("label")
+
+// Cache
+val X2 = X.cache()
+val y2 = y.cache()
+val cnt1 = X2.count()
+val cnt2 = y2.count()
+{% endhighlight %}
+
+**Output:**
+{% highlight scala %}
+import org.apache.sysml.api.MLContext
+import org.apache.sysml.runtime.instructions.spark.utils.{RDDConverterUtilsExt=>RDDConverterUtils}
+import org.apache.sysml.runtime.matrix.MatrixCharacteristics
+ml: org.apache.sysml.api.MLContext = org.apache.sysml.api.MLContext@38d59245
+mcX: org.apache.sysml.runtime.matrix.MatrixCharacteristics = [10000 x 1000, nnz=-1, blocks (1000 x 1000)]
+mcY: org.apache.sysml.runtime.matrix.MatrixCharacteristics = [10000 x 1, nnz=-1, blocks (1000 x 1000)]
+X: org.apache.spark.api.java.JavaPairRDD[org.apache.sysml.runtime.matrix.data.MatrixIndexes,org.apache.sysml.runtime.matrix.data.MatrixBlock] = org.apache.spark.api.java.JavaPairRDD@b5a86e3
+y: org.apache.spark.api.java.JavaPairRDD[org.apache.sysml.runtime.matrix.data.MatrixIndexes,org.apache.sysml.runtime.matrix.data.MatrixBlock] = org.apache.spark.api.java.JavaPairRDD@56377665
+X2: org.apache.spark.api.java.JavaPairRDD[org.apache.sysml.runtime.matrix.data.MatrixIndexes,org.apache.sysml.runtime.matrix.data.MatrixBlock] = org.apache.spark.api.java.JavaPairRDD@650f29d2
+y2: org.apache.spark.api.java.JavaPairRDD[org.apache.sysml.runtime.matrix.data.MatrixIndexes,org.apache.sysml.runtime.matrix.data.MatrixBlock] = org.apache.spark.api.java.JavaPairRDD@334857a8
+cnt1: Long = 10
+cnt2: Long = 10
+{% endhighlight %}
+
+
+## Train using SystemML Linear Regression Algorithm
+
+Now, we can train our model using the SystemML linear regression algorithm. We register the features matrix `X` and the label matrix `y` as inputs. We register the `beta_out` matrix,
+`R2`, and `totalIters` as outputs.
+
+**Cell:**
+{% highlight scala %}
+// Register inputs & outputs
+ml.reset()  
+ml.registerInput("X", X, numRows, numCols)
+ml.registerInput("y", y, numRows, 1)
+// ml.registerInput("y", y)
+ml.registerOutput("beta_out")
+ml.registerOutput("R2")
+ml.registerOutput("totalIters")
+
+// Run the script
+val start = System.currentTimeMillis()
+val outputs = ml.executeScript(linearReg)
+val trainingTime = (System.currentTimeMillis() - start).toDouble / 1000.0
+
+// Get outputs
+val B = outputs.getDF(sqlContext, "beta_out").sort("ID").drop("ID")
+val r2 = getScalarDouble(outputs, "R2")
+val iters = getScalarInt(outputs, "totalIters")
+val trainingTimePerIter = trainingTime / iters
+{% endhighlight %}
+
+**Output:**
+{% highlight scala %}
+start: Long = 1444672090620
+outputs: org.apache.sysml.api.MLOutput = org.apache.sysml.api.MLOutput@5d2c22d0
+trainingTime: Double = 1.176
+B: org.apache.spark.sql.DataFrame = [C1: double]
+r2: Double = 0.9677079547216473
+iters: Int = 12
+trainingTimePerIter: Double = 0.09799999999999999
+{% endhighlight %}
+
+
+## SystemML Linear Regression Summary Statistics
+
+SystemML linear regression summary statistics are displayed by this cell.
+
+**Cell:**
+{% highlight scala %}
+// Print statistics
+println(s"R2: ${r2}")
+println(s"Iterations: ${iters}")
+println(s"Training time per iter: ${trainingTimePerIter} seconds")
+B.describe().show()
+{% endhighlight %}
+
+**Output:**
+{% highlight scala %}
+R2: 0.9677079547216473
+Iterations: 12
+Training time per iter: 0.2334166666666667 seconds
++-------+-------------------+
+|summary|                 C1|
++-------+-------------------+
+|  count|               1000|
+|   mean| 0.0184500840658385|
+| stddev| 0.2764750319432085|
+|    min|-0.5426068958986378|
+|    max| 0.5225309861616542|
++-------+-------------------+
+{% endhighlight %}
+
+
+


[2/2] incubator-systemml git commit: Improve structure of doc presentation

Posted by de...@apache.org.
Improve structure of doc presentation

Update index.md and global.html to present available docs
in a more structured format, while additionally conveying
more information in index.md about the capabilities to be
found in the existing docs.
Add Apache license to hadoop-batch-mode.md.
Change MLContext to Spark MLContext on index.
Change ordering of execution modes.
Rename DML and PyDML Programming Guide to Beginner's Guide.
Include mention of Scala where relevant.
Rename MLContext Programming Guide to include Spark.
Update image paths.
Minor text updates.
Add analytics.

Closes #34.


Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/65844aa6
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/65844aa6
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/65844aa6

Branch: refs/heads/master
Commit: 65844aa6ddcde223a46a08a87ffd235165108648
Parents: 52fae50
Author: Deron Eriksson <de...@us.ibm.com>
Authored: Tue Jan 12 13:44:25 2016 -0800
Committer: Deron Eriksson <de...@us.ibm.com>
Committed: Tue Jan 12 13:44:25 2016 -0800

----------------------------------------------------------------------
 docs/_config.yml                                |   5 +
 docs/_layouts/global.html                       |  33 +-
 docs/beginners-guide-to-dml-and-pydml.md        | 889 +++++++++++++++++
 docs/dml-and-pydml-programming-guide.md         | 889 -----------------
 docs/hadoop-batch-mode.md                       |  18 +
 ...bug-configuration-hello-world-main-class.png | Bin 0 -> 95393 bytes
 ...figuration-hello-world-program-arguments.png | Bin 0 -> 85243 bytes
 ...bug-configuration-hello-world-main-class.png | Bin 95393 -> 0 bytes
 ...figuration-hello-world-program-arguments.png | Bin 85243 -> 0 bytes
 ...elin-notebook-systemml-linear-regression.png | Bin 158819 -> 0 bytes
 .../zeppelin-notebook.png                       | Bin 152132 -> 0 bytes
 ...elin-notebook-systemml-linear-regression.png | Bin 0 -> 158819 bytes
 .../zeppelin-notebook.png                       | Bin 0 -> 152132 bytes
 docs/index.md                                   |  68 +-
 docs/mlcontext-programming-guide.md             | 996 -------------------
 docs/spark-mlcontext-programming-guide.md       | 996 +++++++++++++++++++
 16 files changed, 1983 insertions(+), 1911 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/65844aa6/docs/_config.yml
----------------------------------------------------------------------
diff --git a/docs/_config.yml b/docs/_config.yml
index 0419bff..26ddc8a 100644
--- a/docs/_config.yml
+++ b/docs/_config.yml
@@ -14,3 +14,8 @@ include:
 
 # These allow the documentation to be updated with newer releases
 SYSTEMML_VERSION: 0.8.0
+
+# if 'analytics_on' is true, analytics section will be rendered on the HTML pages
+analytics_on: true
+analytics_provider: google_universal
+analytics_google_universal_tracking_id : UA-71553733-1

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/65844aa6/docs/_layouts/global.html
----------------------------------------------------------------------
diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index e6f6eed..89bb168 100644
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -31,20 +31,26 @@
                     <ul class="nav">
                         <li><a href="index.html">Overview</a></li>
 
+                        <li><a href="https://github.com/apache/incubator-systemml">GitHub</a></li>
+
                         <li class="dropdown">
                             <a href="#" class="dropdown-toggle" data-toggle="dropdown">Documentation<b class="caret"></b></a>
                             <ul class="dropdown-menu">
-                            
+                                <li>&nbsp;&nbsp;<b>Running SystemML:</b></li>
                                 <li><a href="https://github.com/apache/incubator-systemml">SystemML GitHub README</a></li>
                                 <li><a href="quick-start-guide.html">Quick Start Guide</a></li>
-                                <li><a href="dml-and-pydml-programming-guide.html">DML and PyDML Programming Guide</a></li>
-                                <li><a href="mlcontext-programming-guide.html">MLContext Programming Guide</a></li>
                                 <li><a href="hadoop-batch-mode.html">Hadoop Batch Mode</a>
-                                <li><a href="debugger-guide.html">Debugger Guide</a></li>
-                                <li><a href="algorithms-reference.html">Algorithms Reference</a></li>
+                                <li><a href="spark-mlcontext-programming-guide.html">Spark MLContext Programming Guide</a></li>
+                                <li class="divider"></li>
+                                <li>&nbsp;&nbsp;<b>Language Guides:</b></li>
                                 <li><a href="dml-language-reference.html">DML Language Reference</a></li>
+                                <li><a href="beginners-guide-to-dml-and-pydml.html">Beginner's Guide to DML and PyDML</a></li>
+                                <li class="divider"></li>
+                                <li>&nbsp;&nbsp;<b>ML Algorithms:</b></li>
+                                <li><a href="algorithms-reference.html">Algorithms Reference</a></li>
                                 <li class="divider"></li>
-                                <li>&nbsp;&nbsp;&nbsp;&nbsp; More Coming Soon...</li>
+                                <li>&nbsp;&nbsp;<b>Tools:</b></li>
+                                <li><a href="debugger-guide.html">Debugger Guide</a></li>
                             </ul>
                         </li>
 
@@ -69,6 +75,21 @@
         <script src="js/vendor/anchor.min.js"></script>
         <script src="js/main.js"></script>
 
+{% if site.analytics_on == true %}
+{% case site.analytics_provider %}
+{% when "google_universal" %}
+        <!-- Analytics -->
+        <script>
+            (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+            (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+            m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+            })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+            ga('create', '{{ site.analytics_google_universal_tracking_id }}', 'auto');
+            ga('send', 'pageview');
+        </script>
+{% endcase %}
+{% endif %}
+
         <!-- MathJax Section -->
         <script type="text/x-mathjax-config">
             MathJax.Hub.Config({

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/65844aa6/docs/beginners-guide-to-dml-and-pydml.md
----------------------------------------------------------------------
diff --git a/docs/beginners-guide-to-dml-and-pydml.md b/docs/beginners-guide-to-dml-and-pydml.md
new file mode 100644
index 0000000..611024d
--- /dev/null
+++ b/docs/beginners-guide-to-dml-and-pydml.md
@@ -0,0 +1,889 @@
+---
+layout: global
+title: Beginner's Guide to DML and PyDML
+description: Beginner's Guide to DML and PyDML
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+<br/>
+
+
+# Overview
+
+SystemML enables *flexible*, scalable machine learning. This flexibility is achieved
+through the specification of a high-level declarative machine learning language
+that comes in two flavors, one with an R-like syntax (DML) and one with
+a Python-like syntax (PyDML).
+
+Algorithm scripts written in DML and PyDML can be run on Hadoop, on Spark, or
+in Standalone mode. No script modifications are required to change between modes.
+SystemML automatically performs advanced
+optimizations based on data and cluster characteristics, so the need to manually tweak
+algorithms is largely reduced or eliminated.
+
+This Beginner's Guide serves as a starting point for writing DML and PyDML
+scripts.
+
+
+# Script Invocation
+
+DML and PyDML scripts can be invoked in a variety of ways. Suppose that we have `hello.dml` and
+`hello.pydml` scripts containing the following:
+
+	print('hello ' + $1)
+
+One way to begin working with SystemML is to build the project and unpack the standalone distribution,
+which features the `runStandaloneSystemML.sh` and `runStandaloneSystemML.bat` scripts. The name of the DML or PyDML script
+is passed as the first argument to these scripts, followed by any arguments that the script requires.
+
+	./runStandaloneSystemML.sh hello.dml -args world
+	./runStandaloneSystemML.sh hello.pydml -python -args world
+
+For DML and PyDML script invocations that take multiple arguments, a common technique is to create
+a standard script that invokes `runStandaloneSystemML.sh` or `runStandaloneSystemML.bat` with the arguments specified.
+
+SystemML itself is written in Java and is managed using Maven. As a result, SystemML can readily be
+imported into a standard development environment such as Eclipse.
+The `DMLScript` class serves as the main entrypoint to SystemML. Executing
+`DMLScript` with no arguments displays usage information. A script file can be specified using the `-f` argument.
+
+In Eclipse, a Debug Configuration can be created with `DMLScript` as the Main class and any arguments specified as
+Program arguments. A PyDML script requires the addition of a `-python` switch.
+
+<div class="codetabs2">
+
+<div data-lang="Eclipse Debug Configuration - Main" markdown="1">
+![Eclipse Debug Configuration - Main](img/beginners-guide-to-dml-and-pydml/dmlscript-debug-configuration-hello-world-main-class.png "DMLScript Debug Configuration, Main class")
+</div>
+
+<div data-lang="Eclipse Debug Configuration - Arguments" markdown="1">
+![Eclipse Debug Configuration - Arguments](img/beginners-guide-to-dml-and-pydml/dmlscript-debug-configuration-hello-world-program-arguments.png "DMLScript Debug Configuration, Program arguments")
+</div>
+
+</div>
+
+SystemML contains a default set of configuration information. In addition to this, SystemML looks for a default `./SystemML-config.xml` file in the working directory, where overriding configuration information can be specified. Furthermore, a config file can be specified using the `-config` argument, as in this example:
+
+	-f hello.dml -config=src/main/standalone/SystemML-config.xml -args world
+
+When operating in a distributed environment, it is *highly recommended* that cluster-specific configuration information
+be provided to SystemML via a configuration file for optimal performance.
+
+
+# Data Types
+
+SystemML has four value data types. In DML, these are: **double**, **integer**,
+**string**, and **boolean**. In PyDML, these are: **float**, **int**,
+**str**, and **bool**. In normal usage, the data type of a variable is implicit
+based on its value. Mathematical operations typically operate on
+doubles/floats, whereas integers/ints are typically useful for tasks such as
+iteration and accessing elements in a matrix.
+
+<div class="codetabs2">
+
+<div data-lang="DML" markdown="1">
+{% highlight r %}
+aDouble = 3.0
+bInteger = 2
+print('aDouble = ' + aDouble)
+print('bInteger = ' + bInteger)
+print('aDouble + bInteger = ' + (aDouble + bInteger))
+print('bInteger ^ 3 = ' + (bInteger ^ 3))
+print('aDouble ^ 2 = ' + (aDouble ^ 2))
+
+cBoolean = TRUE
+print('cBoolean = ' + cBoolean)
+print('(2 < 1) = ' + (2 < 1))
+
+dString = 'Hello'
+eString = dString + ' World'
+print('dString = ' + dString)
+print('eString = ' + eString)
+{% endhighlight %}
+</div>
+
+<div data-lang="PyDML" markdown="1">
+{% highlight python %}
+aFloat = 3.0
+bInt = 2
+print('aFloat = ' + aFloat)
+print('bInt = ' + bInt)
+print('aFloat + bInt = ' + (aFloat + bInt))
+print('bInt ** 3 = ' + (bInt ** 3))
+print('aFloat ** 2 = ' + (aFloat ** 2))
+
+cBool = True
+print('cBool = ' + cBool)
+print('(2 < 1) = ' + (2 < 1))
+
+dStr = 'Hello'
+eStr = dStr + ' World'
+print('dStr = ' + dStr)
+print('eStr = ' + eStr)
+{% endhighlight %}
+</div>
+
+<div data-lang="DML Result" markdown="1">
+	aDouble = 3.0
+	bInteger = 2
+	aDouble + bInteger = 5.0
+	bInteger ^ 3 = 8.0
+	aDouble ^ 2 = 9.0
+	cBoolean = TRUE
+	(2 < 1) = FALSE
+	dString = Hello
+	eString = Hello World
+</div>
+
+<div data-lang="PyDML Result" markdown="1">
+	aFloat = 3.0
+	bInt = 2
+	aFloat + bInt = 5.0
+	bInt ** 3 = 8.0
+	aFloat ** 2 = 9.0
+	cBool = TRUE
+	(2 < 1) = FALSE
+	dStr = Hello
+	eStr = Hello World
+</div>
+
+</div>
+
+
+# Matrix Basics
+
+## Creating a Matrix
+
+A matrix can be created in DML using the **`matrix()`** function and in PyDML using the **`full()`** 
+function. In the example below, a matrix element is still considered to be of the matrix data type,
+so the value is cast to a scalar in order to print it. Matrix element values are of type **double**/**float**.
+*Note that matrices index from 1 in both DML and PyDML.*
+
+<div class="codetabs2">
+
+<div data-lang="DML" markdown="1">
+{% highlight r %}
+m = matrix("1 2 3 4 5 6 7 8 9 10 11 12", rows=4, cols=3)
+for (i in 1:nrow(m)) {
+    for (j in 1:ncol(m)) {
+        n = m[i,j]
+        print('[' + i + ',' + j + ']:' + as.scalar(n))
+    }
+}
+{% endhighlight %}
+</div>
+
+<div data-lang="PyDML" markdown="1">
+{% highlight python %}
+m = full("1 2 3 4 5 6 7 8 9 10 11 12", rows=4, cols=3)
+for (i in 1:nrow(m)):
+    for (j in 1:ncol(m)):
+        n = m[i,j]
+        print('[' + i + ',' + j + ']:' + scalar(n))
+{% endhighlight %}
+</div>
+
+<div data-lang="Result" markdown="1">
+	[1,1]:1.0
+	[1,2]:2.0
+	[1,3]:3.0
+	[2,1]:4.0
+	[2,2]:5.0
+	[2,3]:6.0
+	[3,1]:7.0
+	[3,2]:8.0
+	[3,3]:9.0
+	[4,1]:10.0
+	[4,2]:11.0
+	[4,3]:12.0
+</div>
+
+</div>
+
+For additional information about the **`matrix()`** and **`full()`** functions, please see the 
+DML Language Reference ([Matrix Construction](dml-language-reference.html#matrix-construction-manipulation-and-aggregation-built-in-functions)) and the 
+PyDML Language Reference (Matrix Construction).
+
+
+## Saving a Matrix
+
+A matrix can be saved using the **`write()`** function in DML and the **`save()`** function in PyDML. SystemML supports four
+different formats: **`text`** (`i,j,v`), **`mm`** (`Matrix Market`), **`csv`** (`delimiter-separated values`), and **`binary`**.
+
+<div class="codetabs2">
+
+<div data-lang="DML" markdown="1">
+{% highlight r %}
+m = matrix("1 2 3 0 0 0 7 8 9 0 0 0", rows=4, cols=3)
+write(m, "m.txt", format="text")
+write(m, "m.mm", format="mm")
+write(m, "m.csv", format="csv")
+write(m, "m.binary", format="binary")
+{% endhighlight %}
+</div>
+
+<div data-lang="PyDML" markdown="1">
+{% highlight python %}
+m = full("1 2 3 0 0 0 7 8 9 0 0 0", rows=4, cols=3)
+save(m, "m.txt", format="text")
+save(m, "m.mm", format="mm")
+save(m, "m.csv", format="csv")
+save(m, "m.binary", format="binary")
+{% endhighlight %}
+</div>
+
+</div>
+
+Saving a matrix automatically creates a metadata file for each format except for Matrix Market, since Matrix Market contains
+metadata within the *.mm file. All formats are text-based except binary. The contents of the resulting files are shown here.
+
+<div class="codetabs2">
+
+<div data-lang="m.txt" markdown="1">
+	1 1 1.0
+	1 2 2.0
+	1 3 3.0
+	3 1 7.0
+	3 2 8.0
+	3 3 9.0
+</div>
+
+<div data-lang="m.txt.mtd" markdown="1">
+	{ 
+	    "data_type": "matrix"
+	    ,"value_type": "double"
+	    ,"rows": 4
+	    ,"cols": 3
+	    ,"nnz": 6
+	    ,"format": "text"
+	    ,"description": { "author": "SystemML" } 
+	}
+</div>
+
+<div data-lang="m.mm" markdown="1">
+	%%MatrixMarket matrix coordinate real general
+	4 3 6
+	1 1 1.0
+	1 2 2.0
+	1 3 3.0
+	3 1 7.0
+	3 2 8.0
+	3 3 9.0
+</div>
+
+<div data-lang="m.csv" markdown="1">
+	1.0,2.0,3.0
+	0,0,0
+	7.0,8.0,9.0
+	0,0,0
+</div>
+
+<div data-lang="m.csv.mtd" markdown="1">
+	{ 
+	    "data_type": "matrix"
+	    ,"value_type": "double"
+	    ,"rows": 4
+	    ,"cols": 3
+	    ,"nnz": 6
+	    ,"format": "csv"
+	    ,"header": false
+	    ,"sep": ","
+	    ,"description": { "author": "SystemML" } 
+	}
+</div>
+
+<div data-lang="m.binary" markdown="1">
+	Not text-based
+</div>
+
+<div data-lang="m.binary.mtd" markdown="1">
+	{ 
+	    "data_type": "matrix"
+	    ,"value_type": "double"
+	    ,"rows": 4
+	    ,"cols": 3
+	    ,"rows_in_block": 1000
+	    ,"cols_in_block": 1000
+	    ,"nnz": 6
+	    ,"format": "binary"
+	    ,"description": { "author": "SystemML" } 
+	}
+</div>
+
+</div>
+
+
+## Loading a Matrix
+
+A matrix can be loaded using the **`read()`** function in DML and the **`load()`** function in PyDML. As with saving, SystemML supports four
+formats: **`text`** (`i,j,v`), **`mm`** (`Matrix Market`), **`csv`** (`delimiter-separated values`), and **`binary`**. To read a file, a corresponding
+metadata file is required, except for the Matrix Market format.
+
+<div class="codetabs2">
+
+<div data-lang="DML" markdown="1">
+{% highlight r %}
+m = read("m.txt")
+print("min:" + min(m))
+print("max:" + max(m))
+print("sum:" + sum(m))
+mRowSums = rowSums(m)
+for (i in 1:nrow(mRowSums)) {
+    print("row " + i + " sum:" + as.scalar(mRowSums[i,1]))
+}
+mColSums = colSums(m)
+for (i in 1:ncol(mColSums)) {
+    print("col " + i + " sum:" + as.scalar(mColSums[1,i]))
+}
+{% endhighlight %}
+</div>
+
+<div data-lang="PyDML" markdown="1">
+{% highlight python %}
+m = load("m.txt")
+print("min:" + min(m))
+print("max:" + max(m))
+print("sum:" + sum(m))
+mRowSums = rowSums(m)
+for (i in 1:nrow(mRowSums)):
+    print("row " + i + " sum:" + scalar(mRowSums[i,1]))
+mColSums = colSums(m)
+for (i in 1:ncol(mColSums)):
+    print("col " + i + " sum:" + scalar(mColSums[1,i]))
+{% endhighlight %}
+</div>
+
+<div data-lang="Result" markdown="1">
+	min:0.0
+	max:9.0
+	sum:30.0
+	row 1 sum:6.0
+	row 2 sum:0.0
+	row 3 sum:24.0
+	row 4 sum:0.0
+	col 1 sum:8.0
+	col 2 sum:10.0
+	col 3 sum:12.0
+</div>
+
+</div>
+
+
+## Matrix Operations
+
+DML and PyDML offer a rich set of operators and built-in functions to perform various operations on matrices and scalars.
+Operators and built-in functions are described in great detail in the DML Language Reference 
+([Expressions](dml-language-reference.html#expressions), [Built-In Functions](dml-language-reference.html#built-in-functions))
+and the PyDML Language Reference
+(Expressions, Built-In Functions).
+
+In this example, we create a matrix A. Next, we create another matrix B by adding 4 to each element in A. Next, we flip
+B by taking its transpose. We then multiply A and B and assign the resulting product to matrix C. We create a matrix D with the same number
+of rows and columns as C, and initialize its elements to 5. We then subtract D from C, divide each element of the result
+by 2, and assign the resulting matrix to D.
+
+This example also shows a user-defined function called `printMatrix()`, which takes a string and matrix as arguments and returns
+nothing.
+
+<div class="codetabs2">
+
+<div data-lang="DML" markdown="1">
+{% highlight r %}
+printMatrix = function(string which, matrix[double] mat) {
+    print(which)
+    for (i in 1:nrow(mat)) {
+        colVals = '| '
+        for (j in 1:ncol(mat)) {
+            n = mat[i,j]
+            colVals = colVals + as.scalar(n) + ' | '
+        }
+        print(colVals)
+    }
+}
+
+A = matrix("1 2 3 4 5 6", rows=3, cols=2)
+z = printMatrix('Matrix A:', A)
+B = A + 4
+B = t(B)
+z = printMatrix('Matrix B:', B)
+C = A %*% B
+z = printMatrix('Matrix C:', C)
+D = matrix(5, rows=nrow(C), cols=ncol(C))
+D = (C - D) / 2
+z = printMatrix('Matrix D:', D)
+
+{% endhighlight %}
+</div>
+
+<div data-lang="PyDML" markdown="1">
+{% highlight python %}
+def printMatrix(which: str, mat: matrix[float]):
+    print(which)
+    for (i in 1:nrow(mat)):
+        colVals = '| '
+        for (j in 1:ncol(mat)):
+            n = mat[i,j]
+            colVals = colVals + scalar(n) + ' | '
+        print(colVals)
+
+A = full("1 2 3 4 5 6", rows=3, cols=2)
+z = printMatrix('Matrix A:', A)
+B = A + 4
+B = transpose(B)
+z = printMatrix('Matrix B:', B)
+C = dot(A, B)
+z = printMatrix('Matrix C:', C)
+D = full(5, rows=nrow(C), cols=ncol(C))
+D = (C - D) / 2
+z = printMatrix('Matrix D:', D)
+
+{% endhighlight %}
+</div>
+
+<div data-lang="Result" markdown="1">
+	Matrix A:
+	| 1.0 | 2.0 | 
+	| 3.0 | 4.0 | 
+	| 5.0 | 6.0 | 
+	Matrix B:
+	| 5.0 | 7.0 | 9.0 | 
+	| 6.0 | 8.0 | 10.0 | 
+	Matrix C:
+	| 17.0 | 23.0 | 29.0 | 
+	| 39.0 | 53.0 | 67.0 | 
+	| 61.0 | 83.0 | 105.0 | 
+	Matrix D:
+	| 6.0 | 9.0 | 12.0 | 
+	| 17.0 | 24.0 | 31.0 | 
+	| 28.0 | 39.0 | 50.0 | 
+</div>
+
+</div>
+
+
+## Matrix Indexing
+
+The elements in a matrix can be accessed by their row and column indices. In the example below, we have a 3x3 matrix A.
+First, we access the element at the third row and third column. Next, we obtain a row slice (vector) of the matrix by
+specifying row 2 and leaving the column blank. We obtain a column slice (vector) by leaving the row blank and specifying
+column 3. After that, we obtain a submatrix via range indexing, where we specify rows 2 to 3, separated by a colon, and columns
+1 to 2, separated by a colon.
+
+<div class="codetabs2">
+
+<div data-lang="DML" markdown="1">
+{% highlight r %}
+printMatrix = function(string which, matrix[double] mat) {
+    print(which)
+    for (i in 1:nrow(mat)) {
+        colVals = '| '
+        for (j in 1:ncol(mat)) {
+            n = mat[i,j]
+            colVals = colVals + as.scalar(n) + ' | '
+        }
+        print(colVals)
+    }
+}
+
+A = matrix("1 2 3 4 5 6 7 8 9", rows=3, cols=3)
+z = printMatrix('Matrix A:', A)
+B = A[3,3]
+z = printMatrix('Matrix B:', B)
+C = A[2,]
+z = printMatrix('Matrix C:', C)
+D = A[,3]
+z = printMatrix('Matrix D:', D)
+E = A[2:3,1:2]
+z = printMatrix('Matrix E:', E)
+
+{% endhighlight %}
+</div>
+
+<div data-lang="PyDML" markdown="1">
+{% highlight python %}
+def printMatrix(which: str, mat: matrix[float]):
+    print(which)
+    for (i in 1:nrow(mat)):
+        colVals = '| '
+        for (j in 1:ncol(mat)):
+            n = mat[i,j]
+            colVals = colVals + scalar(n) + ' | '
+        print(colVals)
+
+A = full("1 2 3 4 5 6 7 8 9", rows=3, cols=3)
+z = printMatrix('Matrix A:', A)
+B = A[3,3]
+z = printMatrix('Matrix B:', B)
+C = A[2,]
+z = printMatrix('Matrix C:', C)
+D = A[,3]
+z = printMatrix('Matrix D:', D)
+E = A[2:3,1:2]
+z = printMatrix('Matrix E:', E)
+
+{% endhighlight %}
+</div>
+
+<div data-lang="Result" markdown="1">
+	Matrix A:
+	| 1.0 | 2.0 | 3.0 | 
+	| 4.0 | 5.0 | 6.0 | 
+	| 7.0 | 8.0 | 9.0 | 
+	Matrix B:
+	| 9.0 | 
+	Matrix C:
+	| 4.0 | 5.0 | 6.0 | 
+	Matrix D:
+	| 3.0 | 
+	| 6.0 | 
+	| 9.0 | 
+	Matrix E:
+	| 4.0 | 5.0 | 
+	| 7.0 | 8.0 | 
+</div>
+
+</div>
+
+
+# Control Statements
+
+DML and PyDML both feature `if` and `if-else` conditional statements. In addition, DML features `else-if`, which avoids the
+need for nested conditional statements.
+
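+As a brief illustration (a small sketch separate from the example below), an `else-if` chain in
+DML can be written as follows:
+
+{% highlight r %}
+x = 3
+if (x == 1) {
+    print('one')
+} else if (x == 2) {
+    print('two')
+} else {
+    print('three or more')
+}
+{% endhighlight %}
+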
+DML and PyDML feature 3 loop statements: `while`, `for`, and `parfor` (parallel for). In the example, note that the 
+`print` statements within the `parfor` loop can occur in any order since the iterations occur in parallel rather than
+sequentially as in a regular `for` loop. The `parfor` statement can include several optional parameters, as described
+in the DML Language Reference ([ParFor Statement](dml-language-reference.html#parfor-statement)) and PyDML Language Reference (ParFor Statement).
+
+<div class="codetabs2">
+
+<div data-lang="DML" markdown="1">
+{% highlight r %}
+i = 1
+while (i < 3) {
+    if (i == 1) {
+        print('hello')
+    } else {
+        print('world')
+    }
+    i = i + 1
+}
+
+A = matrix("1 2 3 4 5 6", rows=3, cols=2)
+
+for (i in 1:nrow(A)) {
+    print("for A[" + i + ",1]:" + as.scalar(A[i,1]))
+}
+
+parfor(i in 1:nrow(A)) {
+    print("parfor A[" + i + ",1]:" + as.scalar(A[i,1]))
+}
+{% endhighlight %}
+</div>
+
+<div data-lang="PyDML" markdown="1">
+{% highlight python %}
+i = 1
+while (i < 3):
+    if (i == 1):
+        print('hello')
+    else:
+        print('world')
+    i = i + 1
+
+A = full("1 2 3 4 5 6", rows=3, cols=2)
+
+for (i in 1:nrow(A)):
+    print("for A[" + i + ",1]:" + scalar(A[i,1]))
+
+parfor(i in 1:nrow(A)):
+    print("parfor A[" + i + ",1]:" + scalar(A[i,1]))
+{% endhighlight %}
+</div>
+
+<div data-lang="Result" markdown="1">
+	hello
+	world
+	for A[1,1]:1.0
+	for A[2,1]:3.0
+	for A[3,1]:5.0
+	parfor A[2,1]:3.0
+	parfor A[1,1]:1.0
+	parfor A[3,1]:5.0
+</div>
+
+</div>
+
+
+# User-Defined Functions
+
+Functions encapsulate useful functionality in SystemML. In addition to built-in functions, users can define their own functions.
+Functions take 0 or more parameters and return 0 or more values.
+Currently, a call to a function that returns nothing still needs to be assigned to a variable.
+
+<div class="codetabs2">
+
+<div data-lang="DML" markdown="1">
+{% highlight r %}
+doSomething = function(matrix[double] mat) return (matrix[double] ret) {
+    additionalCol = matrix(1, rows=nrow(mat), cols=1) # nrow(mat)x1 column of 1 values
+    ret = cbind(mat, additionalCol) # concatenate column to matrix
+    ret = cbind(ret, seq(0, 2, 1))  # concatenate column (0,1,2) to matrix
+    ret = cbind(ret, rowMaxs(ret))  # concatenate column of max row values to matrix
+    ret = cbind(ret, rowSums(ret))  # concatenate column of row sums to matrix
+}
+
+A = rand(rows=3, cols=2, min=0, max=2) # random 3x2 matrix with values 0 to 2
+B = doSomething(A)
+write(A, "A.csv", format="csv")
+write(B, "B.csv", format="csv")
+{% endhighlight %}
+</div>
+
+<div data-lang="PyDML" markdown="1">
+{% highlight python %}
+def doSomething(mat: matrix[float]) -> (ret: matrix[float]):
+    additionalCol = full(1, rows=nrow(mat), cols=1) # nrow(mat)x1 column of 1 values
+    ret = cbind(mat, additionalCol) # concatenate column to matrix
+    ret = cbind(ret, seq(0, 2, 1))  # concatenate column (0,1,2) to matrix
+    ret = cbind(ret, rowMaxs(ret))  # concatenate column of max row values to matrix
+    ret = cbind(ret, rowSums(ret))  # concatenate column of row sums to matrix
+
+A = rand(rows=3, cols=2, min=0, max=2) # random 3x2 matrix with values 0 to 2
+B = doSomething(A)
+save(A, "A.csv", format="csv")
+save(B, "B.csv", format="csv")
+{% endhighlight %}
+</div>
+
+</div>
+
+In the above example, a 3x2 matrix of random doubles between 0 and 2 is created using the **`rand()`** function.
+Additional parameters can be passed to **`rand()`** to control sparsity and other matrix characteristics.
+
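+For example, a `sparsity` parameter controls the fraction of nonzero entries, and a `seed`
+parameter makes the generated values reproducible. A brief sketch in DML (see the DML Language
+Reference for the full list of `rand()` parameters):
+
+{% highlight r %}
+# roughly 50% of the entries are nonzero; values drawn uniformly from [0, 2]
+S = rand(rows=3, cols=2, min=0, max=2, sparsity=0.5, seed=42)
+print('sum of S: ' + sum(S))
+{% endhighlight %}
+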
+Matrix A is passed to the `doSomething` function. A column of 1 values is concatenated to the matrix. A column
+consisting of the values `(0, 1, 2)` is concatenated to the matrix. Next, a column consisting of the maximum row values
+is concatenated to the matrix. A column consisting of the row sums is concatenated to the matrix, and this resulting
+matrix is returned to variable B. Matrix A is output to the `A.csv` file and matrix B is saved as the `B.csv` file.
+
+
+<div class="codetabs2">
+
+<div data-lang="A.csv" markdown="1">
+	1.6091961493071,0.7088614208099939
+	0.5984862383600267,1.5732118950764993
+	0.2947607068519842,1.9081406573366781
+</div>
+
+<div data-lang="B.csv" markdown="1">
+	1.6091961493071,0.7088614208099939,1.0,0,1.6091961493071,4.927253719424194
+	0.5984862383600267,1.5732118950764993,1.0,1.0,1.5732118950764993,5.744910028513026
+	0.2947607068519842,1.9081406573366781,1.0,2.0,2.0,7.202901364188662
+</div>
+
+</div>
+
+
+# Command-Line Arguments and Default Values
+
+Command-line arguments can be passed to DML and PyDML scripts either as named arguments or as positional arguments. Named
+arguments are the preferred technique. Named arguments can be passed utilizing the `-nvargs` switch, and positional arguments
+can be passed using the `-args` switch.
+
+Default values can be set using the **`ifdef()`** function.
+
+In the example below, a matrix is read from the file system using named argument `M`. The number of rows to print is specified
+using the `rowsToPrint` argument, which defaults to 2 if no argument is supplied. Likewise, the number of columns is
+specified using `colsToPrint` with a default value of 2.
+
+<div class="codetabs2">
+
+<div data-lang="DML" markdown="1">
+{% highlight r %}
+
+fileM = $M
+
+numRowsToPrint = ifdef($rowsToPrint, 2) # default to 2
+numColsToPrint = ifdef($colsToPrint, 2) # default to 2
+
+m = read(fileM)
+
+for (i in 1:numRowsToPrint) {
+    for (j in 1:numColsToPrint) {
+        print('[' + i + ',' + j + ']:' + as.scalar(m[i,j]))
+    }
+}
+
+{% endhighlight %}
+</div>
+
+<div data-lang="PyDML" markdown="1">
+{% highlight python %}
+
+fileM = $M
+
+numRowsToPrint = ifdef($rowsToPrint, 2) # default to 2
+numColsToPrint = ifdef($colsToPrint, 2) # default to 2
+
+m = load(fileM)
+
+for (i in 1:numRowsToPrint):
+    for (j in 1:numColsToPrint):
+        print('[' + i + ',' + j + ']:' + scalar(m[i,j]))
+
+{% endhighlight %}
+</div>
+
+<div data-lang="DML Named Arguments and Results" markdown="1">
+	Example #1 Arguments:
+	-f ex.dml -nvargs M=M.txt rowsToPrint=1 colsToPrint=3
+	
+	Example #1 Results:
+	[1,1]:1.0
+	[1,2]:2.0
+	[1,3]:3.0
+	
+	Example #2 Arguments:
+	-f ex.dml -nvargs M=M.txt
+	
+	Example #2 Results:
+	[1,1]:1.0
+	[1,2]:2.0
+	[2,1]:0.0
+	[2,2]:0.0
+	
+</div>
+
+<div data-lang="PyDML Named Arguments and Results" markdown="1">
+	Example #1 Arguments:
+	-f ex.pydml -python -nvargs M=M.txt rowsToPrint=1 colsToPrint=3
+	
+	Example #1 Results:
+	[1,1]:1.0
+	[1,2]:2.0
+	[1,3]:3.0
+	
+	Example #2 Arguments:
+	-f ex.pydml -python -nvargs M=M.txt
+	
+	Example #2 Results:
+	[1,1]:1.0
+	[1,2]:2.0
+	[2,1]:0.0
+	[2,2]:0.0
+	
+</div>
+
+</div>
+
+Here, we see identical functionality but with positional arguments.
+
+<div class="codetabs2">
+
+<div data-lang="DML" markdown="1">
+{% highlight r %}
+
+fileM = $1
+
+numRowsToPrint = ifdef($2, 2) # default to 2
+numColsToPrint = ifdef($3, 2) # default to 2
+
+m = read(fileM)
+
+for (i in 1:numRowsToPrint) {
+    for (j in 1:numColsToPrint) {
+        print('[' + i + ',' + j + ']:' + as.scalar(m[i,j]))
+    }
+}
+
+{% endhighlight %}
+</div>
+
+<div data-lang="PyDML" markdown="1">
+{% highlight python %}
+
+fileM = $1
+
+numRowsToPrint = ifdef($2, 2) # default to 2
+numColsToPrint = ifdef($3, 2) # default to 2
+
+m = load(fileM)
+
+for (i in 1:numRowsToPrint):
+    for (j in 1:numColsToPrint):
+        print('[' + i + ',' + j + ']:' + scalar(m[i,j]))
+
+{% endhighlight %}
+</div>
+
+<div data-lang="DML Positional Arguments and Results" markdown="1">
+	Example #1 Arguments:
+	-f ex.dml -args M.txt 1 3
+	
+	Example #1 Results:
+	[1,1]:1.0
+	[1,2]:2.0
+	[1,3]:3.0
+	
+	Example #2 Arguments:
+	-f ex.dml -args M.txt
+	
+	Example #2 Results:
+	[1,1]:1.0
+	[1,2]:2.0
+	[2,1]:0.0
+	[2,2]:0.0
+	
+</div>
+
+<div data-lang="PyDML Positional Arguments and Results" markdown="1">
+	Example #1 Arguments:
+	-f ex.pydml -python -args M.txt 1 3
+	
+	Example #1 Results:
+	[1,1]:1.0
+	[1,2]:2.0
+	[1,3]:3.0
+	
+	Example #2 Arguments:
+	-f ex.pydml -python -args M.txt
+	
+	Example #2 Results:
+	[1,1]:1.0
+	[1,2]:2.0
+	[2,1]:0.0
+	[2,2]:0.0
+	
+</div>
+
+</div>
+
+
+# Additional Information
+
+The [DML Language Reference](dml-language-reference.html) and PyDML Language Reference contain highly detailed information regarding DML
+and PyDML.
+
+In addition, many excellent examples of DML and PyDML can be found in the `system-ml/scripts` and 
+`system-ml/test/scripts/applications` directories.
+

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/65844aa6/docs/dml-and-pydml-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/dml-and-pydml-programming-guide.md b/docs/dml-and-pydml-programming-guide.md
deleted file mode 100644
index 2745d3e..0000000
--- a/docs/dml-and-pydml-programming-guide.md
+++ /dev/null
@@ -1,889 +0,0 @@
----
-layout: global
-title: DML and PyDML Programming Guide
-description: DML and PyDML Programming Guide
----
-<!--
-{% comment %}
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements.  See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to you under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License.  You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-{% endcomment %}
--->
-
-* This will become a table of contents (this text will be scraped).
-{:toc}
-
-<br/>
-
-
-# Overview
-
-SystemML enables *flexible*, scalable machine learning. This flexibility is achieved
-through the specification of a high-level declarative machine learning language
-that comes in two flavors, one with an R-like syntax (DML) and one with
-a Python-like syntax (PyDML).
-
-Algorithm scripts written in DML and PyDML can be run on Hadoop, on Spark, or
-in Standalone mode. No script modifications are required to change between modes.
-SystemML automatically performs advanced
-optimizations based on data and cluster characteristics, so much of the need to manually
-tweak algorithms is largely reduced or eliminated.
-
-This SystemML Programming Guide serves as a starting point for writing DML and PyDML 
-scripts.
-
-
-# Script Invocation
-
-DML and PyDML scripts can be invoked in a variety of ways. Suppose that we have `hello.dml` and
-`hello.pydml` scripts containing the following:
-
-	print('hello ' + $1)
-
-One way to begin working with SystemML is to build the project and unpack the standalone distribution,
-which features the `runStandaloneSystemML.sh` and `runStandaloneSystemML.bat` scripts. The name of the DML or PyDML script
-is passed as the first argument to these scripts, along with a variety of arguments.
-
-	./runStandaloneSystemML.sh hello.dml -args world
-	./runStandaloneSystemML.sh hello.pydml -python -args world
-
-For DML and PyDML script invocations that take multiple arguments, a common technique is to create
-a standard script that invokes `runStandaloneSystemML.sh` or `runStandaloneSystemML.bat` with the arguments specified.
-
-SystemML itself is written in Java and is managed using Maven. As a result, SystemML can readily be
-imported into a standard development environment such as Eclipse.
-The `DMLScript` class serves as the main entrypoint to SystemML. Executing
-`DMLScript` with no arguments displays usage information. A script file can be specified using the `-f` argument.
-
-In Eclipse, a Debug Configuration can be created with `DMLScript` as the Main class and any arguments specified as
-Program arguments. A PyDML script requires the addition of a `-python` switch.
-
-<div class="codetabs2">
-
-<div data-lang="Eclipse Debug Configuration - Main" markdown="1">
-![Eclipse Debug Configuration - Main](img/dml-and-pydml-programming-guide/dmlscript-debug-configuration-hello-world-main-class.png "DMLScript Debug Configuration, Main class")
-</div>
-
-<div data-lang="Eclipse Debug Configuration - Arguments" markdown="1">
-![Eclipse Debug Configuration - Arguments](img/dml-and-pydml-programming-guide/dmlscript-debug-configuration-hello-world-program-arguments.png "DMLScript Debug Configuration, Program arguments")
-</div>
-
-</div>
-
-SystemML contains a default set of configuration information. In addition to this, SystemML looks for a default `./SystemML-config.xml` file in the working directory, where overriding configuration information can be specified. Furthermore, a config file can be specified using the `-config` argument, as in this example:
-
-	-f hello.dml -config=src/main/standalone/SystemML-config.xml -args world
-
-When operating in a distributed environment, it is *highly recommended* that cluster-specific configuration information
-is provided to SystemML via a configuration file for optimal performance.
-
-
-# Data Types
-
-SystemML has four value data types. In DML, these are: **double**, **integer**,
-**string**, and **boolean**. In PyDML, these are: **float**, **int**,
-**str**, and **bool**. In normal usage, the data type of a variable is implicit
-based on its value. Mathematical operations typically operate on
-doubles/floats, whereas integers/ints are typically useful for tasks such as
-iteration and accessing elements in a matrix.
-
-<div class="codetabs2">
-
-<div data-lang="DML" markdown="1">
-{% highlight r %}
-aDouble = 3.0
-bInteger = 2
-print('aDouble = ' + aDouble)
-print('bInteger = ' + bInteger)
-print('aDouble + bInteger = ' + (aDouble + bInteger))
-print('bInteger ^ 3 = ' + (bInteger ^ 3))
-print('aDouble ^ 2 = ' + (aDouble ^ 2))
-
-cBoolean = TRUE
-print('cBoolean = ' + cBoolean)
-print('(2 < 1) = ' + (2 < 1))
-
-dString = 'Hello'
-eString = dString + ' World'
-print('dString = ' + dString)
-print('eString = ' + eString)
-{% endhighlight %}
-</div>
-
-<div data-lang="PyDML" markdown="1">
-{% highlight python %}
-aFloat = 3.0
-bInt = 2
-print('aFloat = ' + aFloat)
-print('bInt = ' + bInt)
-print('aFloat + bInt = ' + (aFloat + bInt))
-print('bInt ** 3 = ' + (bInt ** 3))
-print('aFloat ** 2 = ' + (aFloat ** 2))
-
-cBool = True
-print('cBool = ' + cBool)
-print('(2 < 1) = ' + (2 < 1))
-
-dStr = 'Hello'
-eStr = dStr + ' World'
-print('dStr = ' + dStr)
-print('eStr = ' + eStr)
-{% endhighlight %}
-</div>
-
-<div data-lang="DML Result" markdown="1">
-	aDouble = 3.0
-	bInteger = 2
-	aDouble + bInteger = 5.0
-	bInteger ^ 3 = 8.0
-	aDouble ^ 2 = 9.0
-	cBoolean = TRUE
-	(2 < 1) = FALSE
-	dString = Hello
-	eString = Hello World
-</div>
-
-<div data-lang="PyDML Result" markdown="1">
-	aFloat = 3.0
-	bInt = 2
-	aFloat + bInt = 5.0
-	bInt ** 3 = 8.0
-	aFloat ** 2 = 9.0
-	cBool = TRUE
-	(2 < 1) = FALSE
-	dStr = Hello
-	eStr = Hello World
-</div>
-
-</div>
-
-
-# Matrix Basics
-
-## Creating a Matrix
-
-A matrix can be created in DML using the **`matrix()`** function and in PyDML using the **`full()`**
-function. In the example below, note that an individual matrix element is itself of the matrix data type (a 1x1 matrix),
-so the value is cast to a scalar in order to print it. Matrix element values are of type **double**/**float**.
-*Note that matrices index from 1 in both DML and PyDML.*
-
-<div class="codetabs2">
-
-<div data-lang="DML" markdown="1">
-{% highlight r %}
-m = matrix("1 2 3 4 5 6 7 8 9 10 11 12", rows=4, cols=3)
-for (i in 1:nrow(m)) {
-    for (j in 1:ncol(m)) {
-        n = m[i,j]
-        print('[' + i + ',' + j + ']:' + as.scalar(n))
-    }
-}
-{% endhighlight %}
-</div>
-
-<div data-lang="PyDML" markdown="1">
-{% highlight python %}
-m = full("1 2 3 4 5 6 7 8 9 10 11 12", rows=4, cols=3)
-for (i in 1:nrow(m)):
-    for (j in 1:ncol(m)):
-        n = m[i,j]
-        print('[' + i + ',' + j + ']:' + scalar(n))
-{% endhighlight %}
-</div>
-
-<div data-lang="Result" markdown="1">
-	[1,1]:1.0
-	[1,2]:2.0
-	[1,3]:3.0
-	[2,1]:4.0
-	[2,2]:5.0
-	[2,3]:6.0
-	[3,1]:7.0
-	[3,2]:8.0
-	[3,3]:9.0
-	[4,1]:10.0
-	[4,2]:11.0
-	[4,3]:12.0
-</div>
-
-</div>
-
-For additional information about the **`matrix()`** and **`full()`** functions, please see the 
-DML Language Reference ([Matrix Construction](dml-language-reference.html#matrix-construction-manipulation-and-aggregation-built-in-functions)) and the 
-PyDML Language Reference (Matrix Construction).
-
-
-## Saving a Matrix
-
-A matrix can be saved using the **`write()`** function in DML and the **`save()`** function in PyDML. SystemML supports four
-different formats: **`text`** (`i,j,v`), **`mm`** (`Matrix Market`), **`csv`** (`delimiter-separated values`), and **`binary`**.
-
-<div class="codetabs2">
-
-<div data-lang="DML" markdown="1">
-{% highlight r %}
-m = matrix("1 2 3 0 0 0 7 8 9 0 0 0", rows=4, cols=3)
-write(m, "m.txt", format="text")
-write(m, "m.mm", format="mm")
-write(m, "m.csv", format="csv")
-write(m, "m.binary", format="binary")
-{% endhighlight %}
-</div>
-
-<div data-lang="PyDML" markdown="1">
-{% highlight python %}
-m = full("1 2 3 0 0 0 7 8 9 0 0 0", rows=4, cols=3)
-save(m, "m.txt", format="text")
-save(m, "m.mm", format="mm")
-save(m, "m.csv", format="csv")
-save(m, "m.binary", format="binary")
-{% endhighlight %}
-</div>
-
-</div>
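-
-Since `csv` is a delimiter-separated format, the field delimiter and a header row can also be specified when writing.
-This is a hedged sketch; the `sep` and `header` parameters follow the DML Language Reference for the `csv` format:
-
-{% highlight r %}
-# write the same matrix as CSV with a header row and a semicolon delimiter
-m = matrix("1 2 3 0 0 0 7 8 9 0 0 0", rows=4, cols=3)
-write(m, "m_semicolon.csv", format="csv", header=TRUE, sep=";")
-{% endhighlight %}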
-
-Saving a matrix automatically creates a metadata file for each format except Matrix Market, since the Matrix Market format
-stores its metadata within the `.mm` file itself. All formats are text-based except `binary`. The contents of the resulting files are shown here.
-
-<div class="codetabs2">
-
-<div data-lang="m.txt" markdown="1">
-	1 1 1.0
-	1 2 2.0
-	1 3 3.0
-	3 1 7.0
-	3 2 8.0
-	3 3 9.0
-</div>
-
-<div data-lang="m.txt.mtd" markdown="1">
-	{ 
-	    "data_type": "matrix"
-	    ,"value_type": "double"
-	    ,"rows": 4
-	    ,"cols": 3
-	    ,"nnz": 6
-	    ,"format": "text"
-	    ,"description": { "author": "SystemML" } 
-	}
-</div>
-
-<div data-lang="m.mm" markdown="1">
-	%%MatrixMarket matrix coordinate real general
-	4 3 6
-	1 1 1.0
-	1 2 2.0
-	1 3 3.0
-	3 1 7.0
-	3 2 8.0
-	3 3 9.0
-</div>
-
-<div data-lang="m.csv" markdown="1">
-	1.0,2.0,3.0
-	0,0,0
-	7.0,8.0,9.0
-	0,0,0
-</div>
-
-<div data-lang="m.csv.mtd" markdown="1">
-	{ 
-	    "data_type": "matrix"
-	    ,"value_type": "double"
-	    ,"rows": 4
-	    ,"cols": 3
-	    ,"nnz": 6
-	    ,"format": "csv"
-	    ,"header": false
-	    ,"sep": ","
-	    ,"description": { "author": "SystemML" } 
-	}
-</div>
-
-<div data-lang="m.binary" markdown="1">
-	Not text-based
-</div>
-
-<div data-lang="m.binary.mtd" markdown="1">
-	{ 
-	    "data_type": "matrix"
-	    ,"value_type": "double"
-	    ,"rows": 4
-	    ,"cols": 3
-	    ,"rows_in_block": 1000
-	    ,"cols_in_block": 1000
-	    ,"nnz": 6
-	    ,"format": "binary"
-	    ,"description": { "author": "SystemML" } 
-	}
-</div>
-
-</div>
-
-
-## Loading a Matrix
-
-A matrix can be loaded using the **`read()`** function in DML and the **`load()`** function in PyDML. As with saving, SystemML supports four
-formats: **`text`** (`i,j,v`), **`mm`** (`Matrix Market`), **`csv`** (`delimiter-separated values`), and **`binary`**. To read a file, a corresponding
-metadata file is required, except for the Matrix Market format.
-
-<div class="codetabs2">
-
-<div data-lang="DML" markdown="1">
-{% highlight r %}
-m = read("m.txt")
-print("min:" + min(m))
-print("max:" + max(m))
-print("sum:" + sum(m))
-mRowSums = rowSums(m)
-for (i in 1:nrow(mRowSums)) {
-    print("row " + i + " sum:" + as.scalar(mRowSums[i,1]))
-}
-mColSums = colSums(m)
-for (i in 1:ncol(mColSums)) {
-    print("col " + i + " sum:" + as.scalar(mColSums[1,i]))
-}
-{% endhighlight %}
-</div>
-
-<div data-lang="PyDML" markdown="1">
-{% highlight python %}
-m = load("m.txt")
-print("min:" + min(m))
-print("max:" + max(m))
-print("sum:" + sum(m))
-mRowSums = rowSums(m)
-for (i in 1:nrow(mRowSums)):
-    print("row " + i + " sum:" + scalar(mRowSums[i,1]))
-mColSums = colSums(m)
-for (i in 1:ncol(mColSums)):
-    print("col " + i + " sum:" + scalar(mColSums[1,i]))
-{% endhighlight %}
-</div>
-
-<div data-lang="Result" markdown="1">
-	min:0.0
-	max:9.0
-	sum:30.0
-	row 1 sum:6.0
-	row 2 sum:0.0
-	row 3 sum:24.0
-	row 4 sum:0.0
-	col 1 sum:8.0
-	col 2 sum:10.0
-	col 3 sum:12.0
-</div>
-
-</div>
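-
-As noted above, reading a file normally requires a corresponding metadata file. For CSV data, a hedged alternative sketch
-is to supply the format information directly to **`read()`** (the `format`, `header`, `sep`, `rows`, and `cols` parameters
-follow the DML Language Reference; verify them against your SystemML version):
-
-{% highlight r %}
-# read a headerless comma-separated file even if no m.csv.mtd metadata file is present
-m = read("m.csv", format="csv", header=FALSE, sep=",", rows=4, cols=3)
-print("sum:" + sum(m))
-{% endhighlight %}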
-
-
-## Matrix Operations
-
-DML and PyDML offer a rich set of operators and built-in functions to perform various operations on matrices and scalars.
-Operators and built-in functions are described in great detail in the DML Language Reference 
-([Expressions](dml-language-reference.html#expressions), [Built-In Functions](dml-language-reference.html#built-in-functions))
-and the PyDML Language Reference
-(Expressions, Built-In Functions).
-
-In this example, we create a matrix A. We then create a matrix B by adding 4 to each element in A and taking the
-transpose of the result. We multiply A and B and assign the product to matrix C. Next, we create a matrix D with the
-same dimensions as C, with all elements initialized to 5. Finally, we subtract D from C, divide each element of the
-result by 2, and assign the resulting matrix back to D.
-
-This example also shows a user-defined function called `printMatrix()`, which takes a string and a matrix as arguments and returns
-nothing.
-
-<div class="codetabs2">
-
-<div data-lang="DML" markdown="1">
-{% highlight r %}
-printMatrix = function(string which, matrix[double] mat) {
-    print(which)
-    for (i in 1:nrow(mat)) {
-        colVals = '| '
-        for (j in 1:ncol(mat)) {
-            n = mat[i,j]
-            colVals = colVals + as.scalar(n) + ' | '
-        }
-        print(colVals)
-    }
-}
-
-A = matrix("1 2 3 4 5 6", rows=3, cols=2)
-z = printMatrix('Matrix A:', A)
-B = A + 4
-B = t(B)
-z = printMatrix('Matrix B:', B)
-C = A %*% B
-z = printMatrix('Matrix C:', C)
-D = matrix(5, rows=nrow(C), cols=ncol(C))
-D = (C - D) / 2
-z = printMatrix('Matrix D:', D)
-
-{% endhighlight %}
-</div>
-
-<div data-lang="PyDML" markdown="1">
-{% highlight python %}
-def printMatrix(which: str, mat: matrix[float]):
-    print(which)
-    for (i in 1:nrow(mat)):
-        colVals = '| '
-        for (j in 1:ncol(mat)):
-            n = mat[i,j]
-            colVals = colVals + scalar(n) + ' | '
-        print(colVals)
-
-A = full("1 2 3 4 5 6", rows=3, cols=2)
-z = printMatrix('Matrix A:', A)
-B = A + 4
-B = transpose(B)
-z = printMatrix('Matrix B:', B)
-C = dot(A, B)
-z = printMatrix('Matrix C:', C)
-D = full(5, rows=nrow(C), cols=ncol(C))
-D = (C - D) / 2
-z = printMatrix('Matrix D:', D)
-
-{% endhighlight %}
-</div>
-
-<div data-lang="Result" markdown="1">
-	Matrix A:
-	| 1.0 | 2.0 | 
-	| 3.0 | 4.0 | 
-	| 5.0 | 6.0 | 
-	Matrix B:
-	| 5.0 | 7.0 | 9.0 | 
-	| 6.0 | 8.0 | 10.0 | 
-	Matrix C:
-	| 17.0 | 23.0 | 29.0 | 
-	| 39.0 | 53.0 | 67.0 | 
-	| 61.0 | 83.0 | 105.0 | 
-	Matrix D:
-	| 6.0 | 9.0 | 12.0 | 
-	| 17.0 | 24.0 | 31.0 | 
-	| 28.0 | 39.0 | 50.0 | 
-</div>
-
-</div>
-
-
-## Matrix Indexing
-
-The elements in a matrix can be accessed by their row and column indices. In the example below, we have a 3x3 matrix A.
-First, we access the element at the third row and third column. Next, we obtain a row slice (vector) of the matrix by
-specifying row 2 and leaving the column blank, and we obtain a column slice (vector) by leaving the row blank and specifying
-column 3. After that, we obtain a submatrix via range indexing, specifying rows 2 to 3 and columns 1 to 2, with the lower
-and upper bounds of each range separated by a colon.
-
-<div class="codetabs2">
-
-<div data-lang="DML" markdown="1">
-{% highlight r %}
-printMatrix = function(string which, matrix[double] mat) {
-    print(which)
-    for (i in 1:nrow(mat)) {
-        colVals = '| '
-        for (j in 1:ncol(mat)) {
-            n = mat[i,j]
-            colVals = colVals + as.scalar(n) + ' | '
-        }
-        print(colVals)
-    }
-}
-
-A = matrix("1 2 3 4 5 6 7 8 9", rows=3, cols=3)
-z = printMatrix('Matrix A:', A)
-B = A[3,3]
-z = printMatrix('Matrix B:', B)
-C = A[2,]
-z = printMatrix('Matrix C:', C)
-D = A[,3]
-z = printMatrix('Matrix D:', D)
-E = A[2:3,1:2]
-z = printMatrix('Matrix E:', E)
-
-{% endhighlight %}
-</div>
-
-<div data-lang="PyDML" markdown="1">
-{% highlight python %}
-def printMatrix(which: str, mat: matrix[float]):
-    print(which)
-    for (i in 1:nrow(mat)):
-        colVals = '| '
-        for (j in 1:ncol(mat)):
-            n = mat[i,j]
-            colVals = colVals + scalar(n) + ' | '
-        print(colVals)
-
-A = full("1 2 3 4 5 6 7 8 9", rows=3, cols=3)
-z = printMatrix('Matrix A:', A)
-B = A[3,3]
-z = printMatrix('Matrix B:', B)
-C = A[2,]
-z = printMatrix('Matrix C:', C)
-D = A[,3]
-z = printMatrix('Matrix D:', D)
-E = A[2:3,1:2]
-z = printMatrix('Matrix E:', E)
-
-{% endhighlight %}
-</div>
-
-<div data-lang="Result" markdown="1">
-	Matrix A:
-	| 1.0 | 2.0 | 3.0 | 
-	| 4.0 | 5.0 | 6.0 | 
-	| 7.0 | 8.0 | 9.0 | 
-	Matrix B:
-	| 9.0 | 
-	Matrix C:
-	| 4.0 | 5.0 | 6.0 | 
-	Matrix D:
-	| 3.0 | 
-	| 6.0 | 
-	| 9.0 | 
-	Matrix E:
-	| 4.0 | 5.0 | 
-	| 7.0 | 8.0 | 
-</div>
-
-</div>
-
-
-# Control Statements
-
-DML and PyDML both feature `if` and `if-else` conditional statements. In addition, DML features `else if`, which avoids the
-need for nested conditional statements.
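-
-As a minimal sketch of the DML `else if` syntax (the variable and messages here are illustrative):
-
-{% highlight r %}
-x = 3
-if (x == 1) {
-    print('x is 1')
-} else if (x == 2) {
-    print('x is 2')
-} else {
-    print('x is something else')
-}
-{% endhighlight %}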
-
-DML and PyDML feature three loop statements: `while`, `for`, and `parfor` (parallel for). In the example below, note that the
-`print` statements within the `parfor` loop can occur in any order, since the iterations execute in parallel rather than
-sequentially as in a regular `for` loop. The `parfor` statement can include several optional parameters, as described
-in the DML Language Reference ([ParFor Statement](dml-language-reference.html#parfor-statement)) and PyDML Language Reference (ParFor Statement).
-
-<div class="codetabs2">
-
-<div data-lang="DML" markdown="1">
-{% highlight r %}
-i = 1
-while (i < 3) {
-    if (i == 1) {
-        print('hello')
-    } else {
-        print('world')
-    }
-    i = i + 1
-}
-
-A = matrix("1 2 3 4 5 6", rows=3, cols=2)
-
-for (i in 1:nrow(A)) {
-    print("for A[" + i + ",1]:" + as.scalar(A[i,1]))
-}
-
-parfor(i in 1:nrow(A)) {
-    print("parfor A[" + i + ",1]:" + as.scalar(A[i,1]))
-}
-{% endhighlight %}
-</div>
-
-<div data-lang="PyDML" markdown="1">
-{% highlight python %}
-i = 1
-while (i < 3):
-    if (i == 1):
-        print('hello')
-    else:
-        print('world')
-    i = i + 1
-
-A = full("1 2 3 4 5 6", rows=3, cols=2)
-
-for (i in 1:nrow(A)):
-    print("for A[" + i + ",1]:" + scalar(A[i,1]))
-
-parfor(i in 1:nrow(A)):
-    print("parfor A[" + i + ",1]:" + scalar(A[i,1]))
-{% endhighlight %}
-</div>
-
-<div data-lang="Result" markdown="1">
-	hello
-	world
-	for A[1,1]:1.0
-	for A[2,1]:3.0
-	for A[3,1]:5.0
-	parfor A[2,1]:3.0
-	parfor A[1,1]:1.0
-	parfor A[3,1]:5.0
-</div>
-
-</div>
-
-
-# User-Defined Functions
-
-Functions encapsulate reusable logic in SystemML. In addition to built-in functions, users can define their own functions.
-Functions take zero or more parameters and return zero or more values.
-Currently, a call to a function that returns nothing still needs to be assigned to a variable.
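-
-Before the fuller example below, here is a minimal sketch of a DML function that returns two values (the function and
-variable names are illustrative):
-
-{% highlight r %}
-minMax = function(matrix[double] X) return (double mn, double mx) {
-    mn = min(X)
-    mx = max(X)
-}
-
-A = matrix("1 2 3 4 5 6", rows=3, cols=2)
-[lo, hi] = minMax(A)   # multiple return values are assigned with bracket syntax
-print("min:" + lo + " max:" + hi)
-{% endhighlight %}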
-
-<div class="codetabs2">
-
-<div data-lang="DML" markdown="1">
-{% highlight r %}
-doSomething = function(matrix[double] mat) return (matrix[double] ret) {
-    additionalCol = matrix(1, rows=nrow(mat), cols=1) # column of 1s (nrow(mat) x 1)
-    ret = cbind(mat, additionalCol) # concatenate column to matrix
-    ret = cbind(ret, seq(0, 2, 1))  # concatenate column (0,1,2) to matrix
-    ret = cbind(ret, rowMaxs(ret))  # concatenate column of max row values to matrix
-    ret = cbind(ret, rowSums(ret))  # concatenate column of row sums to matrix
-}
-
-A = rand(rows=3, cols=2, min=0, max=2) # random 3x2 matrix with values 0 to 2
-B = doSomething(A)
-write(A, "A.csv", format="csv")
-write(B, "B.csv", format="csv")
-{% endhighlight %}
-</div>
-
-<div data-lang="PyDML" markdown="1">
-{% highlight python %}
-def doSomething(mat: matrix[float]) -> (ret: matrix[float]):
-    additionalCol = full(1, rows=nrow(mat), cols=1) # column of 1s (nrow(mat) x 1)
-    ret = cbind(mat, additionalCol) # concatenate column to matrix
-    ret = cbind(ret, seq(0, 2, 1))  # concatenate column (0,1,2) to matrix
-    ret = cbind(ret, rowMaxs(ret))  # concatenate column of max row values to matrix
-    ret = cbind(ret, rowSums(ret))  # concatenate column of row sums to matrix
-
-A = rand(rows=3, cols=2, min=0, max=2) # random 3x2 matrix with values 0 to 2
-B = doSomething(A)
-save(A, "A.csv", format="csv")
-save(B, "B.csv", format="csv")
-{% endhighlight %}
-</div>
-
-</div>
-
-In the above example, a 3x2 matrix of random doubles between 0 and 2 is created using the **`rand()`** function.
-Additional parameters can be passed to **`rand()`** to control sparsity and other matrix characteristics.
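-
-For instance, the sparsity of the generated matrix and the random seed can be controlled, as in this hedged sketch
-(the `sparsity` and `seed` parameters follow the DML Language Reference for **`rand()`**):
-
-{% highlight r %}
-S = rand(rows=1000, cols=1000, min=0, max=1, sparsity=0.1, seed=42) # roughly 10% of cells are nonzero
-print("nonzero cells: " + sum(S != 0))
-{% endhighlight %}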
-
-Matrix A is passed to the `doSomething` function, which builds up the result by concatenating columns to the matrix:
-a column of 1s, a column consisting of the values `(0, 1, 2)`, a column of the row-wise maximum values, and a column of
-the row sums (each computed on the matrix as built so far). The resulting matrix is returned and assigned to variable B.
-Matrix A is saved to the `A.csv` file and matrix B is saved to the `B.csv` file.
-
-
-<div class="codetabs2">
-
-<div data-lang="A.csv" markdown="1">
-	1.6091961493071,0.7088614208099939
-	0.5984862383600267,1.5732118950764993
-	0.2947607068519842,1.9081406573366781
-</div>
-
-<div data-lang="B.csv" markdown="1">
-	1.6091961493071,0.7088614208099939,1.0,0,1.6091961493071,4.927253719424194
-	0.5984862383600267,1.5732118950764993,1.0,1.0,1.5732118950764993,5.744910028513026
-	0.2947607068519842,1.9081406573366781,1.0,2.0,2.0,7.202901364188662
-</div>
-
-</div>
-
-
-# Command-Line Arguments and Default Values
-
-Command-line arguments can be passed to DML and PyDML scripts either as named arguments or as positional arguments. Named
-arguments are the preferred technique and are passed using the `-nvargs` switch; positional arguments
-are passed using the `-args` switch.
-
-Default values can be set using the **`ifdef()`** function.
-
-In the example below, a matrix is read from the file system using named argument `M`. The number of rows to print is specified
-using the `rowsToPrint` argument, which defaults to 2 if no argument is supplied. Likewise, the number of columns is
-specified using `colsToPrint` with a default value of 2.
-
-<div class="codetabs2">
-
-<div data-lang="DML" markdown="1">
-{% highlight r %}
-
-fileM = $M
-
-numRowsToPrint = ifdef($rowsToPrint, 2) # default to 2
-numColsToPrint = ifdef($colsToPrint, 2) # default to 2
-
-m = read(fileM)
-
-for (i in 1:numRowsToPrint) {
-    for (j in 1:numColsToPrint) {
-        print('[' + i + ',' + j + ']:' + as.scalar(m[i,j]))
-    }
-}
-
-{% endhighlight %}
-</div>
-
-<div data-lang="PyDML" markdown="1">
-{% highlight python %}
-
-fileM = $M
-
-numRowsToPrint = ifdef($rowsToPrint, 2) # default to 2
-numColsToPrint = ifdef($colsToPrint, 2) # default to 2
-
-m = load(fileM)
-
-for (i in 1:numRowsToPrint):
-    for (j in 1:numColsToPrint):
-        print('[' + i + ',' + j + ']:' + scalar(m[i,j]))
-
-{% endhighlight %}
-</div>
-
-<div data-lang="DML Named Arguments and Results" markdown="1">
-	Example #1 Arguments:
-	-f ex.dml -nvargs M=M.txt rowsToPrint=1 colsToPrint=3
-	
-	Example #1 Results:
-	[1,1]:1.0
-	[1,2]:2.0
-	[1,3]:3.0
-	
-	Example #2 Arguments:
-	-f ex.dml -nvargs M=M.txt
-	
-	Example #2 Results:
-	[1,1]:1.0
-	[1,2]:2.0
-	[2,1]:0.0
-	[2,2]:0.0
-	
-</div>
-
-<div data-lang="PyDML Named Arguments and Results" markdown="1">
-	Example #1 Arguments:
-	-f ex.pydml -python -nvargs M=M.txt rowsToPrint=1 colsToPrint=3
-	
-	Example #1 Results:
-	[1,1]:1.0
-	[1,2]:2.0
-	[1,3]:3.0
-	
-	Example #2 Arguments:
-	-f ex.pydml -python -nvargs M=M.txt
-	
-	Example #2 Results:
-	[1,1]:1.0
-	[1,2]:2.0
-	[2,1]:0.0
-	[2,2]:0.0
-	
-</div>
-
-</div>
-
-Here is the same functionality implemented with positional arguments.
-
-<div class="codetabs2">
-
-<div data-lang="DML" markdown="1">
-{% highlight r %}
-
-fileM = $1
-
-numRowsToPrint = ifdef($2, 2) # default to 2
-numColsToPrint = ifdef($3, 2) # default to 2
-
-m = read(fileM)
-
-for (i in 1:numRowsToPrint) {
-    for (j in 1:numColsToPrint) {
-        print('[' + i + ',' + j + ']:' + as.scalar(m[i,j]))
-    }
-}
-
-{% endhighlight %}
-</div>
-
-<div data-lang="PyDML" markdown="1">
-{% highlight python %}
-
-fileM = $1
-
-numRowsToPrint = ifdef($2, 2) # default to 2
-numColsToPrint = ifdef($3, 2) # default to 2
-
-m = load(fileM)
-
-for (i in 1:numRowsToPrint):
-    for (j in 1:numColsToPrint):
-        print('[' + i + ',' + j + ']:' + scalar(m[i,j]))
-
-{% endhighlight %}
-</div>
-
-<div data-lang="DML Positional Arguments and Results" markdown="1">
-	Example #1 Arguments:
-	-f ex.dml -args M.txt 1 3
-	
-	Example #1 Results:
-	[1,1]:1.0
-	[1,2]:2.0
-	[1,3]:3.0
-	
-	Example #2 Arguments:
-	-f ex.dml -args M.txt
-	
-	Example #2 Results:
-	[1,1]:1.0
-	[1,2]:2.0
-	[2,1]:0.0
-	[2,2]:0.0
-	
-</div>
-
-<div data-lang="PyDML Positional Arguments and Results" markdown="1">
-	Example #1 Arguments:
-	-f ex.pydml -python -args M.txt 1 3
-	
-	Example #1 Results:
-	[1,1]:1.0
-	[1,2]:2.0
-	[1,3]:3.0
-	
-	Example #2 Arguments:
-	-f ex.pydml -python -args M.txt
-	
-	Example #2 Results:
-	[1,1]:1.0
-	[1,2]:2.0
-	[2,1]:0.0
-	[2,2]:0.0
-	
-</div>
-
-</div>
-
-
-# Additional Information
-
-The [DML Language Reference](dml-language-reference.html) and PyDML Language Reference contain highly detailed information regarding DML
-and PyDML.
-
-In addition, many excellent examples of DML and PyDML can be found in the `system-ml/scripts` and 
-`system-ml/test/scripts/applications` directories.
-

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/65844aa6/docs/hadoop-batch-mode.md
----------------------------------------------------------------------
diff --git a/docs/hadoop-batch-mode.md b/docs/hadoop-batch-mode.md
index 18c25ce..efce4b1 100644
--- a/docs/hadoop-batch-mode.md
+++ b/docs/hadoop-batch-mode.md
@@ -3,6 +3,24 @@ layout: global
 title: Invoking SystemML in Hadoop Batch Mode
 description: Invoking SystemML in Hadoop Batch Mode
 ---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
 
 * This will become a table of contents (this text will be scraped).
 {:toc}

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/65844aa6/docs/img/beginners-guide-to-dml-and-pydml/dmlscript-debug-configuration-hello-world-main-class.png
----------------------------------------------------------------------
diff --git a/docs/img/beginners-guide-to-dml-and-pydml/dmlscript-debug-configuration-hello-world-main-class.png b/docs/img/beginners-guide-to-dml-and-pydml/dmlscript-debug-configuration-hello-world-main-class.png
new file mode 100644
index 0000000..2ee98b9
Binary files /dev/null and b/docs/img/beginners-guide-to-dml-and-pydml/dmlscript-debug-configuration-hello-world-main-class.png differ

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/65844aa6/docs/img/beginners-guide-to-dml-and-pydml/dmlscript-debug-configuration-hello-world-program-arguments.png
----------------------------------------------------------------------
diff --git a/docs/img/beginners-guide-to-dml-and-pydml/dmlscript-debug-configuration-hello-world-program-arguments.png b/docs/img/beginners-guide-to-dml-and-pydml/dmlscript-debug-configuration-hello-world-program-arguments.png
new file mode 100644
index 0000000..b1e344a
Binary files /dev/null and b/docs/img/beginners-guide-to-dml-and-pydml/dmlscript-debug-configuration-hello-world-program-arguments.png differ

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/65844aa6/docs/img/dml-and-pydml-programming-guide/dmlscript-debug-configuration-hello-world-main-class.png
----------------------------------------------------------------------
diff --git a/docs/img/dml-and-pydml-programming-guide/dmlscript-debug-configuration-hello-world-main-class.png b/docs/img/dml-and-pydml-programming-guide/dmlscript-debug-configuration-hello-world-main-class.png
deleted file mode 100644
index 2ee98b9..0000000
Binary files a/docs/img/dml-and-pydml-programming-guide/dmlscript-debug-configuration-hello-world-main-class.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/65844aa6/docs/img/dml-and-pydml-programming-guide/dmlscript-debug-configuration-hello-world-program-arguments.png
----------------------------------------------------------------------
diff --git a/docs/img/dml-and-pydml-programming-guide/dmlscript-debug-configuration-hello-world-program-arguments.png b/docs/img/dml-and-pydml-programming-guide/dmlscript-debug-configuration-hello-world-program-arguments.png
deleted file mode 100644
index b1e344a..0000000
Binary files a/docs/img/dml-and-pydml-programming-guide/dmlscript-debug-configuration-hello-world-program-arguments.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/65844aa6/docs/img/mlcontext-programming-guide/zeppelin-notebook-systemml-linear-regression.png
----------------------------------------------------------------------
diff --git a/docs/img/mlcontext-programming-guide/zeppelin-notebook-systemml-linear-regression.png b/docs/img/mlcontext-programming-guide/zeppelin-notebook-systemml-linear-regression.png
deleted file mode 100644
index bfadbc4..0000000
Binary files a/docs/img/mlcontext-programming-guide/zeppelin-notebook-systemml-linear-regression.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/65844aa6/docs/img/mlcontext-programming-guide/zeppelin-notebook.png
----------------------------------------------------------------------
diff --git a/docs/img/mlcontext-programming-guide/zeppelin-notebook.png b/docs/img/mlcontext-programming-guide/zeppelin-notebook.png
deleted file mode 100644
index 0e10cd1..0000000
Binary files a/docs/img/mlcontext-programming-guide/zeppelin-notebook.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/65844aa6/docs/img/spark-mlcontext-programming-guide/zeppelin-notebook-systemml-linear-regression.png
----------------------------------------------------------------------
diff --git a/docs/img/spark-mlcontext-programming-guide/zeppelin-notebook-systemml-linear-regression.png b/docs/img/spark-mlcontext-programming-guide/zeppelin-notebook-systemml-linear-regression.png
new file mode 100644
index 0000000..bfadbc4
Binary files /dev/null and b/docs/img/spark-mlcontext-programming-guide/zeppelin-notebook-systemml-linear-regression.png differ

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/65844aa6/docs/img/spark-mlcontext-programming-guide/zeppelin-notebook.png
----------------------------------------------------------------------
diff --git a/docs/img/spark-mlcontext-programming-guide/zeppelin-notebook.png b/docs/img/spark-mlcontext-programming-guide/zeppelin-notebook.png
new file mode 100644
index 0000000..0e10cd1
Binary files /dev/null and b/docs/img/spark-mlcontext-programming-guide/zeppelin-notebook.png differ

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/65844aa6/docs/index.md
----------------------------------------------------------------------
diff --git a/docs/index.md b/docs/index.md
index 0f88f04..3fa90cc 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,8 +1,8 @@
 ---
 layout: global
-displayTitle: SystemML Overview
-title: SystemML Overview
-description: SystemML documentation homepage
+displayTitle: SystemML Documentation
+title: SystemML Documentation
+description: SystemML Documentation
 ---
 <!--
 {% comment %}
@@ -23,25 +23,53 @@ limitations under the License.
 {% endcomment %}
 -->
 
-SystemML is a flexible, scalable machine learning (ML) system.
-SystemML's distinguishing characteristics are: (1) algorithm customizability,
-(2) multiple execution modes, including Standalone, Hadoop Batch, and Spark Batch,
-and (3) automatic optimization.
-
 SystemML is now an **Apache Incubator** project! Please see the [**Apache SystemML (incubating)**](http://systemml.apache.org/)
 website for more information.
 
-## SystemML Documentation
+SystemML is a flexible, scalable machine learning system.
+SystemML's distinguishing characteristics are:
+  (1) **Algorithm customizability via R-like and Python-like languages**,
+  (2) **Multiple execution modes**, including Standalone, Spark Batch, Spark MLContext, Hadoop Batch, and JMLC, and
+  (3) **Automatic optimization** based on data and cluster characteristics to ensure both efficiency and scalability.
+
+The [SystemML GitHub README](https://github.com/apache/incubator-systemml) describes
+building, testing, and running SystemML.
+
+## Running SystemML
+
+* **Standalone** - Standalone mode allows data scientists to rapidly prototype algorithms on a single
+machine in R-like and Python-like declarative languages.
+  * The [SystemML GitHub README](https://github.com/apache/incubator-systemml) describes
+  a linear regression example in Standalone Mode.
+  * The [Quick Start Guide](quick-start-guide.html) provides additional examples of algorithm execution
+  in Standalone Mode.
+* **Spark Batch** - Algorithms are automatically optimized to run across Spark clusters.
+  * See **Invoking SystemML in Spark Batch Mode** **(Coming soon)**.
+* **Spark MLContext** - Spark MLContext is a programmatic API for running SystemML from Spark via Scala or Java.
+  * See the [Spark MLContext Programming Guide](spark-mlcontext-programming-guide.html) for
+  [**Spark Shell (Scala)**](spark-mlcontext-programming-guide.html#spark-shell-example),
+  [Java](spark-mlcontext-programming-guide.html#java-example), and
+  [**Zeppelin Notebook**](spark-mlcontext-programming-guide.html#zeppelin-notebook-example---linear-regression-algorithm)
+  examples.
+* **Hadoop Batch** - Algorithms are automatically optimized when distributed across Hadoop clusters.
+  * See [Invoking SystemML in Hadoop Batch Mode](hadoop-batch-mode.html) for detailed information.
+* **JMLC** - Java Machine Learning Connector.
+
+## Language Guides
+
+* [DML Language Reference](dml-language-reference.html) -
+DML is a high-level R-like declarative language for machine learning.
+* **PyDML Language Reference** **(Coming Soon)** -
+PyDML is a high-level Python-like declarative language for machine learning.
+* [Beginner's Guide to DML and PyDML](beginners-guide-to-dml-and-pydml.html) -
+An introduction to the basics of DML and PyDML.
+
+## ML Algorithms
+
+* [Algorithms Reference](algorithms-reference.html) - The Algorithms Reference describes the
+machine learning algorithms included with SystemML in detail.
 
-For more information about SystemML, please consult the following references:
+## Tools
 
-* [SystemML GitHub README](https://github.com/apache/incubator-systemml)
-* [Quick Start Guide](quick-start-guide.html)
-* [DML and PyDML Programming Guide](dml-and-pydml-programming-guide.html)
-* [MLContext Programming Guide](mlcontext-programming-guide.html)
-* [Hadoop Batch Mode](hadoop-batch-mode.html)
-* Spark Batch Mode - **Coming Soon**
-* [Debugger Guide](debugger-guide.html)
-* [Algorithms Reference](algorithms-reference.html)
-* [DML (R-like Declarative Machine Learning) Language Reference](dml-language-reference.html)
-* PyDML (Python-Like Declarative Machine Learning) Language Reference - **Coming Soon**
+* [Debugger Guide](debugger-guide.html) - SystemML supports DML script-level debugging through a
+command-line interface.