Posted to commits@mahout.apache.org by ss...@apache.org on 2014/05/18 15:46:17 UTC

svn commit: r1595618 - /mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext

Author: ssc
Date: Sun May 18 13:46:16 2014
New Revision: 1595618

URL: http://svn.apache.org/r1595618
Log:
CMS commit to mahout by ssc

Modified:
    mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext?rev=1595618&r1=1595617&r2=1595618&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext Sun May 18 13:46:16 2014
@@ -65,7 +65,7 @@ val drmData = drmParallelize(dense(
   numPartitions = 2);
 </pre></div>
 
-Have a look at this matrix. The first four columns represent the ingredients (our features) and the last column (the rating) is the target variable for our regression. [Linear regression](https://en.wikipedia.org/wiki/Linear_regression) assumes that the **target variable *y*** is generated by the linear combination of **the feature matrix *X*** with the **parameter vector *β*** plus the **noise *ε***, summarized in the formula ***y = Xβ + ε***. Our goal is to find an estimate of the parameter vector *β* that explains the data very well.
+Have a look at this matrix. The first four columns represent the ingredients (our features) and the last column (the rating) is the target variable for our regression. [Linear regression](https://en.wikipedia.org/wiki/Linear_regression) assumes that the **target variable y** is generated by the linear combination of **the feature matrix X** with the **parameter vector β** plus the **noise ε**, summarized in the formula **y = Xβ + ε**. Our goal is to find an estimate of the parameter vector *β* that explains the data very well.
 
 As a first step, we extract *X* and *y* from our data matrix. We get *X* by slicing: we take all rows (denoted by ```::```) and the first four columns, which have the ingredients in milligrams as content. Note that the result is again a DRM. The shell will not execute this code yet, it saves the history of operations and defers the execution until we really access a result. **Mahout's DSL automatically optimizes and parallelizes all operations on DRMs and runs them on Apache Spark.**
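The changed paragraph says the goal is an estimate of *β* that "explains the data very well" but does not name a criterion. A common choice, assumed here rather than taken from the diff, is ordinary least squares. It minimizes the squared residual between *y* and *Xβ*; setting the gradient to zero gives the normal equations, and the last step holds only when XᵀX is invertible:

```latex
\hat{\beta} = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2
\quad\Longrightarrow\quad
X^\top X \,\hat{\beta} = X^\top y
\quad\Longrightarrow\quad
\hat{\beta} = (X^\top X)^{-1} X^\top y
```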
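The context paragraph describes two ideas: slicing all rows and the first four columns out of the data matrix to get *X* (with the last column as *y*), and recording operations so that nothing executes until a result is accessed. The sketch below illustrates both in plain Scala with no Mahout or Spark dependency. `LazyMatrix`, `cols`, `result`, `demo`, and the toy data are hypothetical names and values chosen for illustration; this is not Mahout's DRM API and it parallelizes nothing.

```scala
// Plain-Scala sketch of slicing plus deferred execution.
// LazyMatrix is hypothetical and is NOT Mahout's DRM API.
object SliceSketch {
  type Matrix = Vector[Vector[Double]]

  // Holds a recipe for a matrix instead of the matrix itself.
  final class LazyMatrix(compute: () => Matrix) {
    // Runs `compute` at most once, on first access, then caches the value.
    lazy val result: Matrix = compute()

    // All rows, selected columns: records a new recipe, computes nothing yet.
    def cols(range: Range): LazyMatrix =
      new LazyMatrix(() => result.map(row => range.map(i => row(i)).toVector))
  }

  // Returns (X, y, evaluations before access, evaluations after access).
  def demo(): (Matrix, Matrix, Int, Int) = {
    var evaluations = 0
    // Hypothetical toy data: four feature columns, then one target column.
    val data = new LazyMatrix(() => {
      evaluations += 1
      Vector(
        Vector(1.0, 2.0, 3.0, 4.0, 10.0),
        Vector(2.0, 0.0, 1.0, 5.0, 8.0)
      )
    })
    val x = data.cols(0 until 4) // first four columns: features X
    val y = data.cols(4 until 5) // last column: target y
    val before = evaluations     // still 0: nothing has run yet
    val xs = x.result            // first access triggers the computation
    val ys = y.result            // reuses the cached source matrix
    (xs, ys, before, evaluations)
  }

  def main(args: Array[String]): Unit = {
    val (x, y, before, after) = demo()
    println(s"X = $x")
    println(s"y = $y")
    println(s"evaluations before access: $before, after access: $after")
  }
}
```

Here the source matrix is built zero times before `X` and `y` are accessed and exactly once afterwards, because both slices share the parent's cached `lazy val`. The text describes Mahout's DSL as also optimizing the recorded operations and running them in parallel on Apache Spark; this toy only defers and caches.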