Posted to commits@mahout.apache.org by pa...@apache.org on 2014/08/29 20:41:39 UTC
svn commit: r1621356 -
/mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
Author: pat
Date: Fri Aug 29 18:41:39 2014
New Revision: 1621356
URL: http://svn.apache.org/r1621356
Log:
CMS commit to mahout by pat
Modified:
mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
Modified: mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext?rev=1621356&r1=1621355&r2=1621356&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext Fri Aug 29 18:41:39 2014
@@ -1,15 +1,24 @@
#Intro to Cooccurrence Recommenders with Spark
-Mahout provides several important building blocks for creating recommendations using Spark. *spark-itemsimilarity* can be used to create "other people also liked these things" type recommendations and paired with a search engine can personalize recommendations for individual users. *spark-rowsimilarity* can provide non-personalized content based recommendations, using textual content for example.
+Mahout provides several important building blocks for creating recommendations using Spark. *spark-itemsimilarity* can
+be used to create "other people also liked these things" type recommendations and paired with a search engine can
+personalize recommendations for individual users. *spark-rowsimilarity* can provide non-personalized content based
+recommendations, using textual content for example.
Below are the command line jobs but the drivers and associated code can also be customized and accessed from the Scala APIs.
##1. spark-itemsimilarity
*spark-itemsimilarity* is the Spark counterpart of the Mahout mapreduce job called *itemsimilarity*. It takes in elements of interactions, which have userID, itemID, and optionally a value. It will produce one or more indicator matrices created by comparing every user's interactions with every other user. The indicator matrix is an item x item matrix where the values are log-likelihood ratio strengths. For the legacy mapreduce version, there were several possible similarity measures but these are being deprecated in favor of LLR because in practice it performs the best.
-Mahout's mapreduce version of itemsimilarity takes a text file that is expected to have user and item IDs that conform to Mahout's ID requirements--they are non-negative integer that can be viewed as row and column numbers in a matrix.
+Mahout's mapreduce version of itemsimilarity takes a text file that is expected to have user and item IDs that conform to
+Mahout's ID requirements--they are non-negative integers that can be viewed as row and column numbers in a matrix.
-*spark-itemsimilarity* also extends the notion of cooccurrence to cross-cooccurrence, in other words the Spark version will account for multi-modal interactions and create cross-indicator matrices allowing users to make use of much more data in creating recommendations or similar item lists.
+*spark-itemsimilarity* also extends the notion of cooccurrence to cross-cooccurrence; in other words, the Spark version will
+account for multi-modal interactions and create cross-indicator matrices allowing the use of much more data in
+creating recommendations or similar item lists. People try to do this by mixing different actions and giving them weights.
+For instance they might say an item-view is 0.2 of an item purchase. In practice this is often not helpful. Spark-itemsimilarity's
+cross-cooccurrence is a more principled way to handle this case. In effect it scrubs secondary actions with the action you want
+to recommend.
spark-itemsimilarity Mahout 1.0-SNAPSHOT
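
Editorial illustration (not part of the commit): the log-likelihood ratio strengths and the cross-cooccurrence idea described above can be sketched in a few lines. This is a minimal sketch, not Mahout's code. It assumes the standard entropy-based LLR formulation over a 2x2 contingency table, and it assumes the user population is the union of users seen in either action. The names `llr`, `item_users`, and `indicator_scores` and all sample data are hypothetical. Python is used only to keep the sketch short; the actual drivers are Scala.

```python
from math import log


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    # Unnormalized entropy of a list of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table.

    k11: users with both items, k12: only the first item,
    k21: only the second item, k22: neither item.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def item_users(interactions):
    # Map each item to the set of users who interacted with it.
    index = {}
    for user, item in interactions:
        index.setdefault(item, set()).add(user)
    return index


def indicator_scores(primary, secondary, same_action=False):
    """Score every (primary item, secondary item) pair by LLR.

    Pass the same list twice with same_action=True for plain
    cooccurrence; pass two different actions for cross-cooccurrence.
    """
    users = {u for u, _ in primary} | {u for u, _ in secondary}
    n = len(users)
    p_index, s_index = item_users(primary), item_users(secondary)
    scores = {}
    for a, users_a in p_index.items():
        for b, users_b in s_index.items():
            if same_action and a == b:
                continue  # skip the diagonal for self-similarity
            k11 = len(users_a & users_b)
            if k11 == 0:
                continue  # no cooccurrence, nothing to score
            k12 = len(users_a) - k11
            k21 = len(users_b) - k11
            k22 = n - k11 - k12 - k21
            scores[(a, b)] = llr(k11, k12, k21, k22)
    return scores


# Hypothetical sample data: (userID, itemID) pairs for two actions.
purchases = [("u1", "ipad"), ("u1", "iphone"), ("u2", "ipad"),
             ("u2", "iphone"), ("u3", "nexus"), ("u4", "nexus")]
views = [("u1", "iphone"), ("u2", "iphone"), ("u1", "galaxy"),
         ("u3", "galaxy"), ("u4", "galaxy")]

cooccurrence = indicator_scores(purchases, purchases, same_action=True)
cross = indicator_scores(purchases, views)
print(cooccurrence)
print(cross)
```

Note that the view data is never blended into the purchase data with a hand-picked weight. Each purchased-item/viewed-item pair is scored on its own, which is the contrast the paragraph above draws with the "an item-view is 0.2 of an item purchase" approach.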
@@ -97,7 +106,7 @@ This looks daunting but defaults to simp
See ItemSimilarityDriver.scala in Mahout's spark module if you want to customize the code.
-###Defaults in the *spark-itemsimilarity* CLI
+###Defaults in the _spark-itemsimilarity_ CLI
If all defaults are used the input can be as simple as:
@@ -217,7 +226,7 @@ Can be parsed with the following CLI and
--inDelim "\t" \
--itemIDPosition 4 \
--rowIDPosition 1 \
- --filterPosition 2 \
+ --filterPosition 2
##2. spark-rowsimilarity