Posted to commits@mahout.apache.org by bu...@apache.org on 2014/05/18 15:46:20 UTC

svn commit: r909180 - in /websites/staging/mahout/trunk/content: ./ users/sparkbindings/play-with-shell.html

Author: buildbot
Date: Sun May 18 13:46:20 2014
New Revision: 909180

Log:
Staging update by buildbot for mahout

Modified:
    websites/staging/mahout/trunk/content/   (props changed)
    websites/staging/mahout/trunk/content/users/sparkbindings/play-with-shell.html

Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sun May 18 13:46:20 2014
@@ -1 +1 @@
-1595610
+1595618

Modified: websites/staging/mahout/trunk/content/users/sparkbindings/play-with-shell.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/sparkbindings/play-with-shell.html (original)
+++ websites/staging/mahout/trunk/content/users/sparkbindings/play-with-shell.html Sun May 18 13:46:20 2014
@@ -368,7 +368,7 @@ val drmData = drmParallelize(dense(
   numPartitions = 2);
 </pre></div>
 
-<p>Have a look at this matrix. The first four columns represent the ingredients (our features) and the last column (the rating) is the target variable for our regression. <a href="https://en.wikipedia.org/wiki/Linear_regression">Linear regression</a> assumes that the <strong>target variable *y<strong><em> is generated by the linear combination of *<em>the feature matrix </em>X</em></strong> with the </strong>parameter vector <em>β<strong><em> plus the *<em>noise </em>ε</em></strong>, summarized in the formula <strong><em>y = Xβ + ε</em></strong>. Our goal is to find an estimate of the parameter vector </em>β* that explains the data very well.</p>
+<p>Have a look at this matrix. The first four columns represent the ingredients (our features) and the last column (the rating) is the target variable for our regression. <a href="https://en.wikipedia.org/wiki/Linear_regression">Linear regression</a> assumes that the <strong>target variable y</strong> is generated by the linear combination of <strong>the feature matrix X</strong> with the <strong>parameter vector β</strong> plus the <strong>noise ε</strong>, summarized in the formula <strong>y = Xβ + ε</strong>. Our goal is to find an estimate of the parameter vector <em>β</em> that explains the data very well.</p>
 <p>As a first step, we extract <em>X</em> and <em>y</em> from our data matrix. We get <em>X</em> by slicing: we take all rows (denoted by <code>::</code>) and the first four columns, which have the ingredients in milligrams as content. Note that the result is again a DRM. The shell will not execute this code yet, it saves the history of operations and defers the execution until we really access a result. <strong>Mahout's DSL automatically optimizes and parallelizes all operations on DRMs and runs them on Apache Spark.</strong></p>
 <div class="codehilite"><pre>
 val drmX = drmData(::, 0 until 4)