Posted to commits@mahout.apache.org by pa...@apache.org on 2014/09/21 17:19:21 UTC

svn commit: r1626592 - /mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext

Author: pat
Date: Sun Sep 21 15:19:20 2014
New Revision: 1626592

URL: http://svn.apache.org/r1626592
Log:
better multi-actions indicator calc

Modified:
    mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext?rev=1626592&r1=1626591&r2=1626592&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext Sun Sep 21 15:19:20 2014
@@ -108,9 +108,19 @@ This will use the "local" Spark context 
 
     itemID1<tab>itemID2:value2<space>itemID10:value10...
 
-###More Complex Input
+###How to Use Multiple User Actions
 
-For input of the form:
+Often we record various actions the user takes for later analytics. These can now be used to make recommendations.
+The idea of a recommender is to recommend the action you want the user to make. For an e-commerce app this might be
+a purchase. It is usually not a good idea to simply treat other actions the same as the action you want to recommend.
+For instance, a view of an item does not indicate the same intent as a purchase, and if you just mixed the two together you
+might even make worse recommendations. It is tempting, though, since there are so many more views than purchases. With *spark-itemsimilarity*
+we can now use both actions. Mahout uses cross-action cooccurrence analysis to keep only the views that actually predict purchases.
+We do this by treating the primary action (purchase) as data for the indicator matrix and using the secondary action (view)
+to calculate the cross-indicator matrix.
+
+*spark-itemsimilarity* can read the actions from separate files or from a single mixed action log, selecting the lines that
+record each action. For a mixed action log of the form:
 
     u1,purchase,iphone
     u1,purchase,ipad
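
As an illustration of what the two matrices are built from, here is a minimal sketch in plain Python, not Mahout's implementation. It assumes a comma-delimited `user,action,item` log like the one above, and the helper names (`split_actions`, `cooccurrence`) and the small sample log are hypothetical. It groups the log by action and counts, for each pair of items, how many users took the primary action on one and a given action on the other: purchases on both sides give the raw counts behind the indicator matrix (A'A), and purchases against another action give the raw counts behind a cross-indicator matrix (A'B). Mahout computes these with distributed matrices on Spark and scores them with a log-likelihood ratio test instead of reporting raw counts, so this sketch covers only the counting step.

```python
from collections import defaultdict
from itertools import product

# Hypothetical helper: group a mixed "user,action,item" log into
# {action: {user: set of items}}; blank lines are skipped.
def split_actions(lines, delimiter=","):
    by_action = defaultdict(lambda: defaultdict(set))
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, action, item = line.split(delimiter)
        by_action[action][user].add(item)
    return by_action

# Hypothetical helper: counts[p][s] = number of users who took the primary
# action on item p and the other action on item s (raw A'B counts).
# Passing the primary action twice gives raw A'A counts, including each
# item paired with itself.
def cooccurrence(primary, other):
    counts = defaultdict(lambda: defaultdict(int))
    for user, p_items in primary.items():
        for p, s in product(p_items, other.get(user, set())):
            counts[p][s] += 1
    return counts

# Small hypothetical sample in the same format as the log above
log = """
u1,purchase,iphone
u1,purchase,ipad
u2,purchase,nexus
u2,view,iphone
u2,view,nexus
"""

actions = split_actions(log.splitlines())
indicator = cooccurrence(actions["purchase"], actions["purchase"])
# One cross-indicator per secondary action found in the log
cross = {name: cooccurrence(actions["purchase"], actions[name])
         for name in actions if name != "purchase"}
```

Because `split_actions` keeps every action it finds, the comprehension yields one cross-indicator per secondary action, so the same sketch also covers logs with more than two actions.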
@@ -136,7 +146,7 @@ For input of the form:
 ###Command Line
 
 
-Use the following options can be used:
+Use the following options:
 
     bash$ mahout spark-itemsimilarity \
     	--input in-file \     # where to look for data
@@ -152,7 +162,8 @@ Use the following options can be used:
 
 ###Output
 
-The output of the job will be the standard text version of two Mahout DRMs. This is a case where we are calculating cross-cooccurrence so a primary indicator matrix and cross-indicator matrix will be created
+The output of the job will be the standard text version of two Mahout DRMs. Because this job calculates
+cross-cooccurrence, both a primary indicator matrix and a cross-indicator matrix will be created:
 
     out-path
       |-- indicator-matrix - TDF part files
@@ -174,6 +185,9 @@ The cross-indicator matrix will contain:
     galaxy\tnexus:1.7260924347106847 iphone:1.7260924347106847 ipad:1.7260924347106847 galaxy:1.7260924347106847
     surface\tsurface:4.498681156950466 nexus:0.6795961471815897
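
Each line of these part files has the `rowID<tab>columnID:value columnID:value ...` layout described earlier, so other programs can read them with a little parsing. The following is a minimal sketch, assuming the row ID is followed by a single tab and the entries are separated by whitespace. `parse_drm_line` is a hypothetical name, and the value is taken after the last colon in case an item ID itself contains one.

```python
# Hypothetical helper: parse one line of a text DRM part file into
# (rowID, {columnID: value}).
def parse_drm_line(line):
    row_id, _, rest = line.rstrip("\n").partition("\t")
    entries = {}
    for token in rest.split():
        col_id, _, value = token.rpartition(":")
        entries[col_id] = float(value)
    return row_id, entries

# A line from the cross-indicator matrix output above
row, similar = parse_drm_line("surface\tsurface:4.498681156950466 nexus:0.6795961471815897")
# Strongest indicators first
ranked = sorted(similar.items(), key=lambda kv: kv[1], reverse=True)
```

Sorting by value puts the most strongly related items first, which is usually the order you want when using an indicator row.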
 
+**Note:** To use more than two actions, you can run the job multiple times, pairing the primary action with a different
+secondary action each time, or you can use the underlying `SimilarityAnalysis.cooccurrence` API, which calculates any number of cross-indicators more efficiently.
+
 ###Log File Input
 
 A common method of storing data is in log files. If they are written with a consistent delimiter, they can be consumed directly by *spark-itemsimilarity*. For instance, input of the form: