Posted to commits@mahout.apache.org by pa...@apache.org on 2015/04/26 18:49:40 UTC

svn commit: r1676126 - /mahout/site/mahout_cms/trunk/content/users/environment/how-to-build-an-app.mdtext

Author: pat
Date: Sun Apr 26 16:49:40 2015
New Revision: 1676126

URL: http://svn.apache.org/r1676126
Log:
CMS commit to mahout by pat

Modified:
    mahout/site/mahout_cms/trunk/content/users/environment/how-to-build-an-app.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/environment/how-to-build-an-app.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/environment/how-to-build-an-app.mdtext?rev=1676126&r1=1676125&r2=1676126&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/environment/how-to-build-an-app.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/environment/how-to-build-an-app.mdtext Sun Apr 26 16:49:40 2015
@@ -179,4 +179,79 @@ After setting breakpoints you are now re
 
 For small script-like apps you may wish to use the Mahout shell. It is a Scala REPL type interactive shell built on the Spark shell with Mahout-Samsara extensions.
 
-For the shell you won't need the context, since it is created when the shell is launched. To control the configuration of Mahout and Spark we set environment variables. 
\ No newline at end of file
+To turn CooccurrenceDriver.scala into a script, make the following changes:
+
+* Comment out the line that creates the context; you won't need it, since the context is created when the shell is launched.
+* Replace the logger.info lines with println.
+* Remove the package declaration, since it's not needed. The result is the file ```path/to/3-input-cooc/bin/CooccurrenceDriver.mscala```.
+
+Note the extension ```.mscala```, which indicates that we are using Mahout's Scala extensions for math, otherwise known as [Mahout-Samsara](http://mahout.apache.org/users/environment/out-of-core-reference.html).
+
+Before running the code, make sure the output does not already exist:
+
+    $ rm -r /path/to/3-input-cooc/data/indicators
+    
+Launch the Mahout + Spark shell:
+
+    $ mahout spark-shell
+    
+You'll see the Mahout splash:
+
+    MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
+
+                         _                 _
+             _ __ ___   __ _| |__   ___  _   _| |_
+            | '_ ` _ \ / _` | '_ \ / _ \| | | | __|
+            | | | | | | (_| | | | | (_) | |_| | |_
+            |_| |_| |_|\__,_|_| |_|\___/ \__,_|\__|  version 0.10.0
+
+      
+    Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_72)
+    Type in expressions to have them evaluated.
+    Type :help for more information.
+    15/04/26 09:30:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+    Created spark context..
+    Mahout distributed context is available as "implicit val sdc".
+    mahout> 
+
+To load the driver, type:
+
+    mahout> :load /path/to/3-input-cooc/bin/CooccurrenceDriver.mscala
+    Loading ./bin/CooccurrenceDriver.mscala...
+    import com.google.common.collect.{HashBiMap, BiMap}
+    import org.apache.log4j.Logger
+    import org.apache.mahout.math.cf.SimilarityAnalysis
+    import org.apache.mahout.math.indexeddataset._
+    import org.apache.mahout.sparkbindings._
+    import scala.collection.immutable.HashMap
+    defined module CooccurrenceDriver
+    mahout> 
+
+To run the driver, type:
+
+    mahout> CooccurrenceDriver.main(args = Array(""))
+    
+You'll get some stats printed:
+
+    Read in action purchase, which has 4 rows
+    actions has 1 elements in it.
+
+    Read in action view, which has 4 rows
+    actions has 2 elements in it.
+
+    Read in action category, which has 4 rows
+    actions has 3 elements in it.
+
+    Total number of users for all actions = 4
+
+    purchase indicator matrix:
+    Number of rows for matrix = 4
+    Number of columns for matrix = 5
+    view indicator matrix:
+    Number of rows for matrix = 4
+    Number of columns for matrix = 5
+    category indicator matrix:
+    Number of rows for matrix = 4
+    Number of columns for matrix = 7
+    
+If you look in ```path/to/3-input-cooc/data/indicators``` you should find folders containing the indicator matrices.
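
The cleanup step in the walkthrough, `rm -r` on the output directory, exits with an error when that directory does not exist yet, for example on a first run. A minimal shell sketch of a re-runnable version might look like the following. `clean_output` is a hypothetical helper name, not part of Mahout, and the path in the usage comment is the placeholder used in the walkthrough.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout): remove the output directory of a
# previous run only if it exists, so the step is safe to repeat.
clean_output() {
    dir="$1"
    # Refuse an empty argument rather than passing nothing to rm.
    if [ -z "$dir" ]; then
        echo "usage: clean_output <dir>" >&2
        return 2
    fi
    # Only call rm when the directory is actually there.
    if [ -d "$dir" ]; then
        rm -r "$dir"
    fi
    return 0
}

# Usage (placeholder path from the walkthrough; adjust to your checkout):
#   clean_output /path/to/3-input-cooc/data/indicators
#   mahout spark-shell
```

Checking for the directory first keeps the step quiet on a first run while still letting a real `rm` failure, such as a permissions problem, surface.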