Posted to commits@mahout.apache.org by dl...@apache.org on 2014/05/20 00:20:30 UTC

svn commit: r1596082 - /mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext

Author: dlyubimov
Date: Mon May 19 22:20:30 2014
New Revision: 1596082

URL: http://svn.apache.org/r1596082
Log:
CMS commit to mahout by dlyubimov

Modified:
    mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext?rev=1596082&r1=1596081&r2=1596082&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext Mon May 19 22:20:30 2014
@@ -65,7 +65,16 @@ val drmData = drmParallelize(dense(
   numPartitions = 2);
 </pre></div>
 
-Have a look at this matrix. The first four columns represent the ingredients (our features) and the last column (the rating) is the target variable for our regression. [Linear regression](https://en.wikipedia.org/wiki/Linear_regression) assumes that the **target variable y** is generated by the linear combination of **the feature matrix X** with the **parameter vector β** plus the **noise ε**, summarized in the formula `\(\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}\)`. Our goal is to find an estimate of the parameter vector `\(\boldsymbol{\beta}\)` that explains the data very well.
+Have a look at this matrix. The first four columns represent the ingredients 
+(our features) and the last column (the rating) is the target variable for 
+our regression. [Linear regression](https://en.wikipedia.org/wiki/Linear_regression) 
+assumes that the **target variable** `\(\mathbf{y}\)` is generated by the 
+linear combination of **the feature matrix** `\(\mathbf{X}\)` with the 
+**parameter vector** `\(\boldsymbol{\beta}\)` plus the
+ **noise** `\(\boldsymbol{\varepsilon}\)`, summarized in the formula 
+`\(\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}\)`. 
+Our goal is to find an estimate of the parameter vector 
+`\(\boldsymbol{\beta}\)` that explains the data very well.
 
 As a first step, we extract `\(\mathbf{X}\)` and `\(\mathbf{y}\)` from our data matrix. We get *X* by slicing: we take all rows (denoted by ```::```) and the first four columns, which contain the ingredient amounts in milligrams. Note that the result is again a DRM. The shell will not execute this code yet; it saves the history of operations and defers execution until we actually access a result. **Mahout's DSL automatically optimizes and parallelizes all operations on DRMs and runs them on Apache Spark.**