Posted to commits@mahout.apache.org by bu...@apache.org on 2015/04/26 18:49:45 UTC

svn commit: r949248 - in /websites/staging/mahout/trunk/content: ./ users/environment/how-to-build-an-app.html

Author: buildbot
Date: Sun Apr 26 16:49:44 2015
New Revision: 949248

Log:
Staging update by buildbot for mahout

Modified:
    websites/staging/mahout/trunk/content/   (props changed)
    websites/staging/mahout/trunk/content/users/environment/how-to-build-an-app.html

Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sun Apr 26 16:49:44 2015
@@ -1 +1 @@
-1676117
+1676126

Modified: websites/staging/mahout/trunk/content/users/environment/how-to-build-an-app.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/environment/how-to-build-an-app.html (original)
+++ websites/staging/mahout/trunk/content/users/environment/how-to-build-an-app.html Sun Apr 26 16:49:44 2015
@@ -428,7 +428,87 @@ def writeIndicators<span class="p">(</sp
 <p>After setting breakpoints you are now ready to debug the configuration. Go to the Run-&gt;Debug... menu and pick your configuration. This will execute using a local standalone instance of Spark.</p>
 <h2 id="the-mahout-shell">The Mahout Shell</h2>
 <p>For small script-like apps you may wish to use the Mahout shell. It is a Scala REPL type interactive shell built on the Spark shell with Mahout-Samsara extensions.</p>
-<p>For the shell you won't need the context, since it is created when the shell is launched. To control the configuration of Mahout and Spark we set environment variables. </p>
+<p>To turn CooccurrenceDriver.scala into a script, make the following changes:</p>
+<ul>
+<li>You won't need the context, since it is created when the shell is launched, so comment that line out.</li>
+<li>Replace the logger.info lines with println.</li>
+<li>Remove the package declaration, since it's not needed. Together, these changes produce the file <code>path/to/3-input-cooc/bin/CooccurrenceDriver.mscala</code>.</li>
+</ul>
+<p>Note the <code>.mscala</code> extension, which indicates that we are using Mahout's Scala extensions for math, otherwise known as <a href="http://mahout.apache.org/users/environment/out-of-core-reference.html">Mahout-Samsara</a>.</p>
+<p>Before running the code, make sure the output does not already exist:</p>
+<div class="codehilite"><pre>$ <span class="n">rm</span> <span class="o">-</span><span class="n">r</span> <span class="o">/</span><span class="n">path</span><span class="o">/</span><span class="n">to</span><span class="o">/</span>3<span class="o">-</span><span class="n">input</span><span class="o">-</span><span class="n">cooc</span><span class="o">/</span><span class="n">data</span><span class="o">/</span><span class="n">indicators</span>
+</pre></div>
+
+
+<p>Launch the Mahout + Spark shell:</p>
+<div class="codehilite"><pre>$ <span class="n">mahout</span> <span class="n">spark</span><span class="o">-</span><span class="n">shell</span>
+</pre></div>
+
+
+<p>You'll see the Mahout splash:</p>
+<div class="codehilite"><pre><span class="n">MAHOUT_LOCAL</span> <span class="n">is</span> <span class="n">set</span><span class="p">,</span> <span class="n">so</span> <span class="n">we</span> <span class="n">don</span><span class="o">&#39;</span><span class="n">t</span> <span class="n">add</span> <span class="n">HADOOP_CONF_DIR</span> <span class="n">to</span> <span class="n">classpath</span><span class="p">.</span>
+
+                     <span class="n">_</span>                 <span class="n">_</span>
+         <span class="n">_</span> <span class="n">__</span> <span class="n">___</span>   <span class="n">__</span> <span class="n">_</span><span class="o">|</span> <span class="o">|</span><span class="n">__</span>   <span class="n">___</span>  <span class="n">_</span>   <span class="n">_</span><span class="o">|</span> <span class="o">|</span><span class="n">_</span>
+        <span class="o">|</span> <span class="s">&#39;_ ` _ \ / _` | &#39;</span><span class="n">_</span> <span class="o">\</span> <span class="o">/</span> <span class="n">_</span> <span class="o">\|</span> <span class="o">|</span> <span class="o">|</span> <span class="o">|</span> <span class="n">__</span><span class="o">|</span>
+        <span class="o">|</span> <span class="o">|</span> <span class="o">|</span> <span class="o">|</span> <span class="o">|</span> <span class="o">|</span> <span class="p">(</span><span class="n">_</span><span class="o">|</span> <span class="o">|</span> <span class="o">|</span> <span class="o">|</span> <span class="o">|</span> <span class="p">(</span><span class="n">_</span><span class="p">)</span> <span class="o">|</span> <span class="o">|</span><span class="n">_</span><span class="o">|</span> <span class="o">|</span> <span class="o">|</span><span class="n">_</span>
+        <span class="o">|</span><span class="n">_</span><span class="o">|</span> <span class="o">|</span><span class="n">_</span><span class="o">|</span> <span class="o">|</span><span class="n">_</span><span class="o">|\</span><span class="n">__</span><span class="p">,</span><span class="n">_</span><span class="o">|</span><span class="n">_</span><span class="o">|</span> <span class="o">|</span><span class="n">_</span><span class="o">|\</span><span class="n">___</span><span class="o">/</span> <span class="o">\</span><span class="n">__</span><span class="p">,</span><span class="n">_</span><span class="o">|\</span><span class="n">__</span><span class="o">|</span>  <span class="n">version</span> 0<span class="p">.</span>10<span class="p">.</span>0
+
+
+<span class="n">Using</span> <span class="n">Scala</span> <span class="n">version</span> 2<span class="p">.</span>10<span class="p">.</span>4 <span class="p">(</span><span class="n">Java</span> <span class="n">HotSpot</span><span class="p">(</span><span class="n">TM</span><span class="p">)</span> 64<span class="o">-</span><span class="n">Bit</span> <span class="n">Server</span> <span class="n">VM</span><span class="p">,</span> <span class="n">Java</span> 1<span class="p">.</span>7<span class="p">.</span>0<span class="n">_72</span><span class="p">)</span>
+<span class="n">Type</span> <span class="n">in</span> <span class="n">expressions</span> <span class="n">to</span> <span class="n">have</span> <span class="n">them</span> <span class="n">evaluated</span><span class="p">.</span>
+<span class="n">Type</span> <span class="p">:</span><span class="n">help</span> <span class="k">for</span> <span class="n">more</span> <span class="n">information</span><span class="p">.</span>
+15<span class="o">/</span>04<span class="o">/</span>26 09<span class="p">:</span>30<span class="p">:</span>48 <span class="n">WARN</span> <span class="n">NativeCodeLoader</span><span class="p">:</span> <span class="n">Unable</span> <span class="n">to</span> <span class="n">load</span> <span class="n">native</span><span class="o">-</span><span class="n">hadoop</span> <span class="n">library</span> <span class="k">for</span> <span class="n">your</span> <span class="n">platform</span><span class="p">...</span> <span class="n">using</span> <span class="n">builtin</span><span class="o">-</span><span class="n">java</span> <span class="n">classes</span> <span class="n">where</span> <span class="n">applicable</span>
+<span class="n">Created</span> <span class="n">spark</span> <span class="n">context</span><span class="p">..</span>
+<span class="n">Mahout</span> <span class="n">distributed</span> <span class="n">context</span> <span class="n">is</span> <span class="n">available</span> <span class="n">as</span> &quot;<span class="n">implicit</span> <span class="n">val</span> <span class="n">sdc</span>&quot;<span class="p">.</span>
+<span class="n">mahout</span><span class="o">&gt;</span>
+</pre></div>
+
+
+<p>To load the driver, type:</p>
+<div class="codehilite"><pre><span class="n">mahout</span><span class="o">&gt;</span> <span class="p">:</span><span class="n">load</span> <span class="o">/</span><span class="n">path</span><span class="o">/</span><span class="n">to</span><span class="o">/</span>3<span class="o">-</span><span class="n">input</span><span class="o">-</span><span class="n">cooc</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">CooccurrenceDriver</span><span class="p">.</span><span class="n">mscala</span>
+<span class="n">Loading</span> <span class="o">./</span><span class="n">bin</span><span class="o">/</span><span class="n">CooccurrenceDriver</span><span class="p">.</span><span class="n">mscala</span><span class="p">...</span>
+<span class="n">import</span> <span class="n">com</span><span class="p">.</span><span class="n">google</span><span class="p">.</span><span class="n">common</span><span class="p">.</span><span class="n">collect</span><span class="p">.{</span><span class="n">HashBiMap</span><span class="p">,</span> <span class="n">BiMap</span><span class="p">}</span>
+<span class="n">import</span> <span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">log4j</span><span class="p">.</span><span class="n">Logger</span>
+<span class="n">import</span> <span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">mahout</span><span class="p">.</span><span class="n">math</span><span class="p">.</span><span class="n">cf</span><span class="p">.</span><span class="n">SimilarityAnalysis</span>
+<span class="n">import</span> <span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">mahout</span><span class="p">.</span><span class="n">math</span><span class="p">.</span><span class="n">indexeddataset</span><span class="p">.</span><span class="n">_</span>
+<span class="n">import</span> <span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">mahout</span><span class="p">.</span><span class="n">sparkbindings</span><span class="p">.</span><span class="n">_</span>
+<span class="n">import</span> <span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">immutable</span><span class="p">.</span><span class="n">HashMap</span>
+<span class="n">defined</span> <span class="n">module</span> <span class="n">CooccurrenceDriver</span>
+<span class="n">mahout</span><span class="o">&gt;</span>
+</pre></div>
+
+
+<p>To run the driver, type:</p>
+<div class="codehilite"><pre><span class="n">mahout</span><span class="o">&gt;</span> <span class="n">CooccurrenceDriver</span><span class="p">.</span><span class="n">main</span><span class="p">(</span><span class="n">args</span> <span class="p">=</span> <span class="n">Array</span><span class="p">(</span>&quot;&quot;<span class="p">))</span>
+</pre></div>
+
+
+<p>You'll get some stats printed:</p>
+<div class="codehilite"><pre><span class="n">Read</span> <span class="n">in</span> <span class="n">action</span> <span class="n">purchase</span><span class="p">,</span> <span class="n">which</span> <span class="n">has</span> 4 <span class="n">rows</span>
+<span class="n">actions</span> <span class="n">has</span> 1 <span class="n">elements</span> <span class="n">in</span> <span class="n">it</span><span class="p">.</span>
+
+<span class="n">Read</span> <span class="n">in</span> <span class="n">action</span> <span class="n">view</span><span class="p">,</span> <span class="n">which</span> <span class="n">has</span> 4 <span class="n">rows</span>
+<span class="n">actions</span> <span class="n">has</span> 2 <span class="n">elements</span> <span class="n">in</span> <span class="n">it</span><span class="p">.</span>
+
+<span class="n">Read</span> <span class="n">in</span> <span class="n">action</span> <span class="n">category</span><span class="p">,</span> <span class="n">which</span> <span class="n">has</span> 4 <span class="n">rows</span>
+<span class="n">actions</span> <span class="n">has</span> 3 <span class="n">elements</span> <span class="n">in</span> <span class="n">it</span><span class="p">.</span>
+
+<span class="n">Total</span> <span class="n">number</span> <span class="n">of</span> <span class="n">users</span> <span class="k">for</span> <span class="n">all</span> <span class="n">actions</span> <span class="p">=</span> 4
+
+<span class="n">purchase</span> <span class="n">indicator</span> <span class="n">matrix</span><span class="p">:</span>
+<span class="n">Number</span> <span class="n">of</span> <span class="n">rows</span> <span class="k">for</span> <span class="n">matrix</span> <span class="p">=</span> 4
+<span class="n">Number</span> <span class="n">of</span> <span class="n">columns</span> <span class="k">for</span> <span class="n">matrix</span> <span class="p">=</span> 5
+<span class="n">view</span> <span class="n">indicator</span> <span class="n">matrix</span><span class="p">:</span>
+<span class="n">Number</span> <span class="n">of</span> <span class="n">rows</span> <span class="k">for</span> <span class="n">matrix</span> <span class="p">=</span> 4
+<span class="n">Number</span> <span class="n">of</span> <span class="n">columns</span> <span class="k">for</span> <span class="n">matrix</span> <span class="p">=</span> 5
+<span class="n">category</span> <span class="n">indicator</span> <span class="n">matrix</span><span class="p">:</span>
+<span class="n">Number</span> <span class="n">of</span> <span class="n">rows</span> <span class="k">for</span> <span class="n">matrix</span> <span class="p">=</span> 4
+<span class="n">Number</span> <span class="n">of</span> <span class="n">columns</span> <span class="k">for</span> <span class="n">matrix</span> <span class="p">=</span> 7
+</pre></div>
+
+
+<p>If you look in <code>path/to/3-input-cooc/data/indicators</code> you should find folders containing the indicator matrices.</p>
    </div>
   </div>     
 </div>