Posted to commits@mahout.apache.org by pa...@apache.org on 2015/03/09 01:19:51 UTC

svn commit: r1665101 - /mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext

Author: pat
Date: Mon Mar  9 00:19:50 2015
New Revision: 1665101

URL: http://svn.apache.org/r1665101
Log:
fixed some wording

Modified:
    mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext?rev=1665101&r1=1665100&r2=1665101&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext Mon Mar  9 00:19:50 2015
@@ -1,14 +1,8 @@
 #Intro to Cooccurrence Recommenders with Spark
 
-Mahout's next generation recommender is based on the proven cooccurrence algorithm but takes it several important steps further
-by creating a multimodal recommender, which can make use of many user actions to make recommendations. In the old days 
-only page reads, or purchases could be used alone. Now search terms, locations, all manner of clickstream data can be used to 
-recommend - hence the term multimodal. It also allows the recommendations to be tuned for the placement context by changine 
-the query without recalculating the model - adding to its multimodality.
-
 Mahout provides several important building blocks for creating recommendations using Spark. *spark-itemsimilarity* can 
 be used to create "other people also liked these things" type recommendations and paired with a search engine can 
-personalize multimodal recommendations for individual users. *spark-rowsimilarity* can provide non-personalized content based 
+personalize recommendations for individual users. *spark-rowsimilarity* can provide non-personalized content based 
 recommendations and when paired with a search engine can be used to personalize content based recommendations.
 
 ![image](http://s6.postimg.org/r0m8bpjw1/recommender_architecture.png)
@@ -22,11 +16,10 @@ User history is used as a query on the i
 ##References
 
 1. A free ebook, which talks about the general idea: [Practical Machine Learning](https://www.mapr.com/practical-machine-learning)
-2. A slide deck, which talks about mixing user actions and other indicators: [Multimodal Streaming Recommender](http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/)
+2. A slide deck, which talks about mixing actions and other indicators: [Creating a Unified Recommender](http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/)
 3. Two blog posts: [What's New in Recommenders: part #1](http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/)
 and  [What's New in Recommenders: part #2](http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/)
-4. A post describing the loglikelihood ratio:  [Surprise and Coinsidense](http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html)  LLR is used to reduce noise in the data while keeping the calculations O(n) complexity.
-5. A demo [Video Guide][1] site, which uses many of the techniques described above.
+4. A post describing the log-likelihood ratio: [Surprise and Coincidence](http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html). LLR is used to reduce noise in the data while keeping the calculations at O(n) complexity.
 
 Below are the command line jobs but the drivers and associated code can also be customized and accessed from the Scala APIs.
 
@@ -320,11 +313,11 @@ the only similarity method supported thi
 LLR is used more as a quality filter than as a similarity measure. However *spark-rowsimilarity* will produce 
 lists of similar docs for every doc if input is docs with lists of terms. The Apache [Lucene](http://lucene.apache.org) project provides several methods of [analyzing and tokenizing](http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/analysis/package-summary.html#package_description) documents.
 
-#<a name="unified-recommender">4. Creating a Unified Recommender</a>
+#<a name="unified-recommender">4. Creating a Multimodal Recommender</a>
 
-Using the output of *spark-itemsimilarity* and *spark-rowsimilarity* you can build a unified cooccurrence and content based
+Using the output of *spark-itemsimilarity* and *spark-rowsimilarity* you can build a multimodal cooccurrence and content based
  recommender that can be used in both or either mode depending on indicators available and the history available at 
-runtime for a user.
+runtime for a user. Slides describing this method can be found [here](http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/).
 
 ##Requirements
 
@@ -381,6 +374,8 @@ items with the most similar tags. Notice
 content or metadata indicator. They are used when you want to find items that are similar to other items by using their 
 content or metadata, not by which users interacted with them.
 
+**Note**: It may be advisable to treat tags as cross-cooccurrence indicators, but for the sake of this example they are treated here as content only.
+
 For this we need input of the form:
 
     itemID<tab>list-of-tags
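The `itemID<tab>list-of-tags` input above can be generated as in this minimal sketch; the helper name, item IDs, and tags are hypothetical examples, not part of the Mahout tooling:

```python
# Sketch: building the tab-delimited "itemID<tab>list-of-tags" input
# described above. Item IDs and tags here are made-up examples.

def format_tag_rows(item_tags):
    """Render {itemID: [tags]} as one 'itemID<tab>space-delimited-tags' line per item."""
    return "\n".join(
        f"{item_id}\t{' '.join(tags)}" for item_id, tags in item_tags.items()
    )

rows = format_tag_rows({
    "iphone": ["electronics", "phone", "apple"],
    "ipad": ["electronics", "tablet", "apple"],
})
print(rows)
# iphone	electronics phone apple
# ipad	electronics tablet apple
```

Writing these rows to a file yields input in the shape *spark-rowsimilarity* expects for the tags indicator.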
@@ -408,10 +403,9 @@ This is a content indicator since it has
     
 We now have three indicators, two collaborative filtering type and one content type.
 
-##Unified Recommender Query
+##Multimodal Recommender Query
 
-The actual form of the query for recommendations will vary depending on your search engine but the intent is the same. 
-For a given user, map their history of an action or content to the correct indicator field and perform an OR'd query. 
+The actual form of the query for recommendations will vary depending on your search engine but the intent is the same. For a given user, map their history of an action or content to the correct indicator field and perform an OR'd query. 
 
 We have 3 indicators, these are indexed by the search engine into 3 fields, we'll call them "purchase", "view", and "tags". 
 We take the user's history that corresponds to each indicator and create a query of the form:
@@ -443,6 +437,3 @@ This will return recommendations favorin
 2. Content can be used where there is no recorded user behavior or when items change too quickly to get much interaction history. They can be used alone or mixed with other indicators.
 3. Most search engines support "boost" factors so you can favor one or more indicators. In the example query, if you want tags to only have a small effect you could boost the CF indicators.
 4. In the examples we have used space delimited strings for lists of IDs in indicators and in queries. It may be better to use arrays of strings if your storage system and search engine support them. For instance Solr allows multi-valued fields, which correspond to arrays.
-
-
-  [1]: https://guide.finderbots.com
\ No newline at end of file
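The OR'd multi-field query described in the "Multimodal Recommender Query" section can be sketched as follows. The field names ("purchase", "view", "tags") come from the text; the user history values, the helper function, and the Lucene/Solr-style query syntax (including the `^` boost mentioned in note 3) are illustrative assumptions:

```python
# Sketch: mapping per-indicator user history to an OR'd multi-field
# query in Lucene/Solr syntax. History values and boosts are hypothetical.

def build_query(history, boosts=None):
    """Turn {field: [ids]} into 'field:(id id ...)' clauses OR'd together."""
    boosts = boosts or {}
    clauses = []
    for field, ids in history.items():
        if not ids:
            continue  # skip indicators with no history for this user
        clause = f"{field}:({' '.join(ids)})"
        if field in boosts:
            clause += f"^{boosts[field]}"  # Lucene boost syntax
        clauses.append(clause)
    return " OR ".join(clauses)

q = build_query(
    {"purchase": ["iphone", "ipad"], "view": ["nexus"], "tags": ["apple"]},
    boosts={"tags": 0.1},  # down-weight the content indicator
)
print(q)
# purchase:(iphone ipad) OR view:(nexus) OR tags:(apple)^0.1
```

A user with no history for an indicator simply contributes no clause for that field, which matches the idea that the query adapts to whatever history is available at runtime.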