Posted to commits@mahout.apache.org by bu...@apache.org on 2014/08/29 20:41:44 UTC

svn commit: r920735 - in /websites/staging/mahout/trunk/content: ./ users/recommender/intro-cooccurrence-spark.html

Author: buildbot
Date: Fri Aug 29 18:41:43 2014
New Revision: 920735

Log:
Staging update by buildbot for mahout

Modified:
    websites/staging/mahout/trunk/content/   (props changed)
    websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html

Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Fri Aug 29 18:41:43 2014
@@ -1 +1 @@
-1621351
+1621356

Modified: websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html (original)
+++ websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html Fri Aug 29 18:41:43 2014
@@ -246,12 +246,21 @@
   <div id="content-wrap" class="clearfix">
    <div id="main">
     <h1 id="intro-to-cooccurrence-recommenders-with-spark">Intro to Cooccurrence Recommenders with Spark</h1>
-<p>Mahout provides several important building blocks for creating recommendations using Spark. <em>spark-itemsimilarity</em> can be used to create "other people also liked these things" type recommendations and paired with a search engine can personalize recommendations for individual users. <em>spark-rowsimilarity</em> can provide non-personalized content based recommendations, using textual content for example.</p>
+<p>Mahout provides several important building blocks for creating recommendations using Spark. <em>spark-itemsimilarity</em> can 
+be used to create "other people also liked these things" type recommendations and paired with a search engine can 
+personalize recommendations for individual users. <em>spark-rowsimilarity</em> can provide non-personalized content based 
+recommendations, using textual content for example.</p>
 <p>Below are the command line jobs but the drivers and associated code can also be customized and accessed from the Scala APIs.</p>
 <h2 id="1-spark-itemsimilarity">1. spark-itemsimilarity</h2>
 <p><em>spark-itemsimilarity</em> is the Spark counterpart of the Mahout mapreduce job called <em>itemsimilarity</em>. It takes in elements of interactions, which have userID, itemID, and optionally a value. It will produce one or more indicator matrices created by comparing every user's interactions with every other user. The indicator matrix is an item x item matrix where the values are log-likelihood ratio strengths. For the legacy mapreduce version, there were several possible similarity measures but these are being deprecated in favor of LLR because in practice it performs the best.</p>
-<p>Mahout's mapreduce version of itemsimilarity takes a text file that is expected to have user and item IDs that conform to Mahout's ID requirements--they are non-negative integer that can be viewed as row and column numbers in a matrix.</p>
-<p><em>spark-itemsimilarity</em> also extends the notion of cooccurrence to cross-cooccurrence, in other words the Spark version will account for multi-modal interactions and create cross-indicator matrices allowing users to make use of much more data in creating recommendations or similar item lists.</p>
+<p>Mahout's mapreduce version of itemsimilarity takes a text file that is expected to have user and item IDs that conform to 
+Mahout's ID requirements--they are non-negative integers that can be viewed as row and column numbers in a matrix.</p>
+<p><em>spark-itemsimilarity</em> also extends the notion of cooccurrence to cross-cooccurrence. In other words, the Spark version will 
+account for multi-modal interactions and create cross-indicator matrices, allowing the use of much more data in 
+creating recommendations or similar item lists. People often try to do this by mixing different actions and giving them weights. 
+For instance, they might say an item-view is 0.2 of an item purchase. In practice this is often not helpful. Spark-itemsimilarity's
+cross-cooccurrence is a more principled way to handle this case. In effect it scrubs secondary actions with the action you want
+to recommend.</p>
 <div class="codehilite"><pre><span class="n">spark</span><span class="o">-</span><span class="n">itemsimilarity</span> <span class="n">Mahout</span> 1<span class="p">.</span>0<span class="o">-</span><span class="n">SNAPSHOT</span>
 <span class="n">Usage</span><span class="p">:</span> <span class="n">spark</span><span class="o">-</span><span class="n">itemsimilarity</span> <span class="p">[</span><span class="n">options</span><span class="p">]</span>
 
@@ -455,7 +464,7 @@
     <span class="o">--</span><span class="n">inDelim</span> &quot;<span class="o">\</span><span class="n">t</span>&quot; <span class="o">\</span>
     <span class="o">--</span><span class="n">itemIDPosition</span> 4 <span class="o">\</span>
     <span class="o">--</span><span class="n">rowIDPosition</span> 1 <span class="o">\</span>
-    <span class="o">--</span><span class="n">filterPosition</span> 2 <span class="o">\</span>
+    <span class="o">--</span><span class="n">filterPosition</span> 2
 </pre></div>