Posted to commits@mahout.apache.org by ss...@apache.org on 2014/05/18 17:44:33 UTC

svn commit: r1595629 - /mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext

Author: ssc
Date: Sun May 18 15:44:32 2014
New Revision: 1595629

URL: http://svn.apache.org/r1595629
Log:
MAHOUT-1542 fixed some typos

Modified:
    mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext?rev=1595629&r1=1595628&r2=1595629&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext Sun May 18 15:44:32 2014
@@ -49,7 +49,7 @@ We'll use the shell to interactively pla
 
 *Note: You can incrementally follow the example by copy-and-pasting the code into your running Mahout shell.*
 
-Mahout's linear algebra DSL has an abstraction called *DistributedRowMatrix (DRM)* which models a matrix that is partitioned by rows and stored in the memory of a cluster of machines. We use ```dense()``` to create a dense in-core matrix from our toy dataset and use ```drmParallelize``` to load it into the cluster, "mimicking" a large, partitioned dataset.
+Mahout's linear algebra DSL has an abstraction called *DistributedRowMatrix (DRM)* which models a matrix that is partitioned by rows and stored in the memory of a cluster of machines. We use ```dense()``` to create a dense in-memory matrix from our toy dataset and use ```drmParallelize``` to load it into the cluster, "mimicking" a large, partitioned dataset.
 
 <div class="codehilite"><pre>
 val drmData = drmParallelize(dense(
@@ -73,13 +73,13 @@ As a first step, we extract *X* and *y* 
 val drmX = drmData(::, 0 until 4)
 </pre></div>
 
-Next, we extract the target variable vector *y*, the fifth column of the data matrix. We assume this one fits into our driver machine, so we fetch it in-core using ```collect```:
+Next, we extract the target variable vector *y*, the fifth column of the data matrix. We assume this one fits into the memory of our driver machine, so we fetch it using ```collect```:
 
 <div class="codehilite"><pre>
 val y = drmData.collect(::, 4)
 </pre></div>
 
-Now we are ready to think about a mathematical way to estimate the parameter vector *β*. A simple textbook approach is [ordinary least squares (OLS)](https://en.wikipedia.org/wiki/Ordinary_least_squares), which minimizes the sum of residual squares. In OLS, there is even a closed form expression for estimating *ß* as ***(X<sup>T</sup>X)<sup>-1</sup> X<sup>T</sup>y***.
+Now we are ready to think about a mathematical way to estimate the parameter vector *β*. A simple textbook approach is [ordinary least squares (OLS)](https://en.wikipedia.org/wiki/Ordinary_least_squares), which minimizes the sum of squared residuals between the true target variable and the predicted one. In OLS, there is even a closed-form expression for estimating *β* as ***(X<sup>T</sup>X)<sup>-1</sup> X<sup>T</sup>y***.
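
For reference, that closed form follows from a standard derivation the page does not spell out: set the gradient of the squared error to zero and solve the resulting normal equations (a sketch, in conventional notation):

<div class="codehilite"><pre>
\nabla_{\beta}\,\lVert X\beta - y \rVert^{2} = 2X^{T}X\beta - 2X^{T}y = 0
\;\Rightarrow\; X^{T}X\beta = X^{T}y
\;\Rightarrow\; \beta = (X^{T}X)^{-1}X^{T}y
</pre></div>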
 
 The first thing we compute for this is ***X<sup>T</sup>X***. The code for this in Mahout's Scala DSL maps directly to the mathematical formula: the operation ```.t()``` transposes a matrix and, analogous to R, ```%*%``` denotes matrix multiplication.
 
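The matching code block for ***X<sup>T</sup>X*** falls in the unchanged context between these two diff hunks; a minimal sketch of what it presumably contains, using only the DSL operations just described (```drmXtX``` is the name the final hunk collects):

<div class="codehilite"><pre>
val drmXtX = drmX.t %*% drmX
</pre></div>
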
@@ -92,7 +92,7 @@ The same is true for computing ***X<sup>
 val drmXty = drmX.t %*% y
 </pre></div>
 
-We're nearly done. The next step we take is to fetch *X<sup>T</sup>X* and *X<sup>T</sup>y* into the memory of our driver machine (we are targeting features matrices that are tall and skinny , so we can assume that *X<sup>T</sup>X* is small enough to fit in). Then, we provide them to an in-core solver (Mahout provides the an analogon to R's ```solve()``` for that) which computes ```beta```, our OLS estimate of the parameter vector *β*.
+We're nearly done. The next step is to fetch *X<sup>T</sup>X* and *X<sup>T</sup>y* into the memory of our driver machine (we are targeting feature matrices that are tall and skinny, so we can assume that *X<sup>T</sup>X* is small enough to fit). Then, we provide them to an in-memory solver (Mahout provides an analog of R's ```solve()``` for that) which computes ```beta```, our OLS estimate of the parameter vector *β*.
 
 <div class="codehilite"><pre>
 val XtX = drmXtX.collect
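// The diff hunk ends mid-block here. A hedged sketch of the presumable
// continuation: fetch X^T y as an in-memory vector (mirroring the
// collect(::, 4) call used for y earlier) and hand both to the in-memory
// solve() from the paragraph above; the names Xty and beta are assumptions.
val Xty = drmXty.collect(::, 0)
val beta = solve(XtX, Xty)
</pre></div>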