Posted to commits@mahout.apache.org by pa...@apache.org on 2014/08/29 20:41:39 UTC

svn commit: r1621356 - /mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext

Author: pat
Date: Fri Aug 29 18:41:39 2014
New Revision: 1621356

URL: http://svn.apache.org/r1621356
Log:
CMS commit to mahout by pat

Modified:
    mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext?rev=1621356&r1=1621355&r2=1621356&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext Fri Aug 29 18:41:39 2014
@@ -1,15 +1,24 @@
 #Intro to Cooccurrence Recommenders with Spark
 
-Mahout provides several important building blocks for creating recommendations using Spark. *spark-itemsimilarity* can be used to create "other people also liked these things" type recommendations and paired with a search engine can personalize recommendations for individual users. *spark-rowsimilarity* can provide non-personalized content based recommendations, using textual content for example.
+Mahout provides several important building blocks for creating recommendations using Spark. *spark-itemsimilarity* can 
+be used to create "other people also liked these things" type recommendations and paired with a search engine can 
+personalize recommendations for individual users. *spark-rowsimilarity* can provide non-personalized content based 
+recommendations, using textual content for example.
 
 Below are the command line jobs but the drivers and associated code can also be customized and accessed from the Scala APIs.
 
 ##1. spark-itemsimilarity
 *spark-itemsimilarity* is the Spark counterpart of the Mahout mapreduce job called *itemsimilarity*. It takes in elements of interactions, which have userID, itemID, and optionally a value. It will produce one or more indicator matrices created by comparing every user's interactions with those of every other user. The indicator matrix is an item x item matrix where the values are log-likelihood ratio strengths. For the legacy mapreduce version, there were several possible similarity measures but these are being deprecated in favor of LLR because in practice it performs the best.
 
-Mahout's mapreduce version of itemsimilarity takes a text file that is expected to have user and item IDs that conform to Mahout's ID requirements--they are non-negative integer that can be viewed as row and column numbers in a matrix.
+Mahout's mapreduce version of itemsimilarity takes a text file that is expected to have user and item IDs that conform to 
+Mahout's ID requirements--they are non-negative integers that can be viewed as row and column numbers in a matrix.
 
-*spark-itemsimilarity* also extends the notion of cooccurrence to cross-cooccurrence, in other words the Spark version will account for multi-modal interactions and create cross-indicator matrices allowing users to make use of much more data in creating recommendations or similar item lists.
+*spark-itemsimilarity* also extends the notion of cooccurrence to cross-cooccurrence. In other words, the Spark version will
+account for multi-modal interactions and create cross-indicator matrices, allowing the use of much more data in
+creating recommendations or similar item lists. People often try to do this by mixing different actions and giving them weights.
+For instance, they might say an item-view is 0.2 of an item purchase. In practice this is often not helpful. Spark-itemsimilarity's
+cross-cooccurrence is a more principled way to handle this case. In effect it scrubs secondary actions with the action you want
+to recommend.
 
 
     spark-itemsimilarity Mahout 1.0-SNAPSHOT
@@ -97,7 +106,7 @@ This looks daunting but defaults to simp
 
 See ItemSimilarityDriver.scala in Mahout's spark module if you want to customize the code. 
 
-###Defaults in the *spark-itemsimilarity* CLI
+###Defaults in the _spark-itemsimilarity_ CLI
 
 If all defaults are used the input can be as simple as:
 
@@ -217,7 +226,7 @@ Can be parsed with the following CLI and
         --inDelim "\t" \
         --itemIDPosition 4 \
         --rowIDPosition 1 \
-        --filterPosition 2 \
+        --filterPosition 2
 
 ##2. spark-rowsimilarity