Posted to commits@mahout.apache.org by bu...@apache.org on 2014/09/21 17:19:29 UTC

svn commit: r923072 - in /websites/staging/mahout/trunk/content: ./ users/recommender/intro-cooccurrence-spark.html

Author: buildbot
Date: Sun Sep 21 15:19:28 2014
New Revision: 923072

Log:
Staging update by buildbot for mahout

Modified:
    websites/staging/mahout/trunk/content/   (props changed)
    websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html

Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sun Sep 21 15:19:28 2014
@@ -1 +1 @@
-1622719
+1626592

Modified: websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html (original)
+++ websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html Sun Sep 21 15:19:28 2014
@@ -348,8 +348,17 @@ to recommend.   </p>
 </pre></div>
 
 
-<h3 id="more-complex-input">More Complex Input</h3>
-<p>For input of the form:</p>
+<h3 id="how-to-use-multiple-user-actions">How to use Multiple User Actions</h3>
+<p>Applications often record several kinds of user actions for later analytics. These actions can now be used to make recommendations. 
+A recommender should recommend the action you want the user to take. For an e-commerce app this might be 
+a purchase. It is usually a bad idea to treat other actions the same as the action you want to recommend. 
+For instance, a view of an item does not indicate the same intent as a purchase, and simply mixing the two together 
+might even make recommendations worse. Mixing is tempting, though, because there are so many more views than purchases. With <em>spark-itemsimilarity</em>
+we can now use both actions. Mahout uses cross-action cooccurrence analysis to limit the views to those that do predict purchases.
+It does this by treating the primary action (purchase) as data for the indicator matrix and using the secondary action (view) 
+to calculate the cross-indicator matrix.  </p>
+<p><em>spark-itemsimilarity</em> can read separate actions from separate files or from a mixed action log by filtering certain lines. For a mixed 
+action log of the form:</p>
 <div class="codehilite"><pre><span class="n">u1</span><span class="p">,</span><span class="n">purchase</span><span class="p">,</span><span class="n">iphone</span>
 <span class="n">u1</span><span class="p">,</span><span class="n">purchase</span><span class="p">,</span><span class="n">ipad</span>
 <span class="n">u2</span><span class="p">,</span><span class="n">purchase</span><span class="p">,</span><span class="n">nexus</span>
@@ -374,7 +383,7 @@ to recommend.   </p>
 
 
 <h3 id="command-line">Command Line</h3>
-<p>Use the following options can be used:</p>
+<p>Use the following options:</p>
 <div class="codehilite"><pre><span class="n">bash</span>$ <span class="n">mahout</span> <span class="n">spark</span><span class="o">-</span><span class="n">itemsimilarity</span> <span class="o">\</span>
     <span class="o">--</span><span class="n">input</span> <span class="n">in</span><span class="o">-</span><span class="n">file</span> <span class="o">\</span>     # <span class="n">where</span> <span class="n">to</span> <span class="n">look</span> <span class="k">for</span> <span class="n">data</span>
     <span class="o">--</span><span class="n">output</span> <span class="n">out</span><span class="o">-</span><span class="n">path</span> <span class="o">\</span>   # <span class="n">root</span> <span class="n">dir</span> <span class="k">for</span> <span class="n">output</span>
@@ -388,7 +397,8 @@ to recommend.   </p>
 
 
 <h3 id="output">Output</h3>
-<p>The output of the job will be the standard text version of two Mahout DRMs. This is a case where we are calculating cross-cooccurrence so a primary indicator matrix and cross-indicator matrix will be created</p>
+<p>The output of the job will be the standard text version of two Mahout DRMs. This is a case where we are calculating 
+cross-cooccurrence, so a primary indicator matrix and a cross-indicator matrix will be created.</p>
 <div class="codehilite"><pre><span class="n">out</span><span class="o">-</span><span class="n">path</span>
   <span class="o">|--</span> <span class="n">indicator</span><span class="o">-</span><span class="n">matrix</span> <span class="o">-</span> <span class="n">TDF</span> <span class="n">part</span> <span class="n">files</span>
   <span class="o">\--</span> <span class="nb">cross</span><span class="o">-</span><span class="n">indicator</span><span class="o">-</span><span class="n">matrix</span> <span class="o">-</span> <span class="n">TDF</span> <span class="n">part</span><span class="o">-</span><span class="n">files</span>
@@ -413,6 +423,8 @@ to recommend.   </p>
 </pre></div>
 
 
+<p><strong>Note:</strong> You can run this multiple times to use more than two actions, or you can use the underlying 
+SimilarityAnalysis.cooccurrence API, which calculates any number of cross-indicators more efficiently.</p>
 <h3 id="log-file-input">Log File Input</h3>
 <p>A common method of storing data is in log files. If they are written using some delimiter they can be consumed directly by spark-itemsimilarity. For instance input of the form:</p>
 <div class="codehilite"><pre>2014<span class="o">-</span>06<span class="o">-</span>23 14<span class="p">:</span>46<span class="p">:</span>53<span class="p">.</span>115<span class="o">\</span><span class="n">tu1</span><span class="o">\</span><span class="n">tpurchase</span><span class="o">\</span><span class="n">trandom</span> <span class="n">text</span><span class="o">\</span><span class="n">tiphone</span>