Posted to commits@mahout.apache.org by pa...@apache.org on 2014/10/29 18:25:17 UTC

svn commit: r1635210 - /mahout/site/mahout_cms/trunk/content/users/sparkbindings/faq.mdtext

Author: pat
Date: Wed Oct 29 17:25:17 2014
New Revision: 1635210

URL: http://svn.apache.org/r1635210
Log:
CMS commit to mahout by pat

Modified:
    mahout/site/mahout_cms/trunk/content/users/sparkbindings/faq.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/sparkbindings/faq.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/sparkbindings/faq.mdtext?rev=1635210&r1=1635209&r2=1635210&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/sparkbindings/faq.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/sparkbindings/faq.mdtext Wed Oct 29 17:25:17 2014
@@ -16,7 +16,7 @@ Notice:    Licensed to the Apache Softwa
            specific language governing permissions and limitations
            under the License.
 
-# Mahout FAQ on Scala algebraic optimizer AKA "Spark/Scala bindings" 
+# FAQ for using Mahout with Spark
 
 ## Q: Mahout Spark shell doesn't start; "ClassNotFound" problems or various classpath problems.
 
@@ -24,22 +24,37 @@ A: So far as of the time of this writing
 around classpath issues one way or another. 
 
 If you are getting method-signature-like errors, most probably there is a mismatch between Mahout's Spark dependency 
-and actual Spark installed. (At the time of this writing the HEAD depends on Spark 1.0.1).
+and the Spark version actually installed. (At the time of this writing the HEAD depends on Spark 1.1.0, but check mahout/pom.xml.)
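+
+A minimal sketch of how to compare the two versions; the grep patterns are assumptions about the pom layouts, so adjust them to your checkouts:
+
+    # Spark version Mahout is built against
+    grep -i spark $MAHOUT_HOME/pom.xml | grep -i version
+    # Spark version of the source tree SPARK_HOME points to
+    grep -A 2 spark-parent $SPARK_HOME/pom.xml | grep "<version>"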
 
 Troubleshooting general classpath issues is pretty straightforward. Since Mahout is using Spark's installation 
 and its classpath as reported by Spark itself for Spark-related dependencies, it is important to make sure 
 the classpath is sane and is made available to Mahout:
 
-(1) Check Spark is of correct version (same as in Mahout's poms), is compiled and SPARK_HOME is set.
-
-(2) Check Mahout is compiled and MAHOUT_HOME is set.
-
-(3) run `$SPARK_HOME/bin/compute-classpath.sh` and make sure it produces sane result with no errors. 
+1. Check that Spark is the correct version (the same as in Mahout's poms), is compiled, and SPARK_HOME is set.
+2. Check that Mahout is compiled and MAHOUT_HOME is set.
+3. Run `$SPARK_HOME/bin/compute-classpath.sh` and make sure it produces a sane result with no errors. 
 If it outputs something other than a straightforward classpath string, most likely Spark is not compiled/set correctly (later Spark versions require 
 `sbt/sbt assembly` to be run; simply running `sbt/sbt publish-local` is no longer enough).
+4. Run `$MAHOUT_HOME/bin/mahout -spark classpath` and check that the path reported in step (3) is included (see the sketch below).
 
-(4) run `$MAHOUT_HOME/bin/mahout -spark classpath` and check that path reported in step (3) is included.
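+
+Putting checks 1–4 together, a minimal shell sketch (the assembly-jar location is an assumption about a typical `sbt/sbt assembly` build):
+
+    # 1. Spark is built and SPARK_HOME is set
+    echo $SPARK_HOME
+    ls $SPARK_HOME/assembly/target/scala-*/spark-assembly-*.jar
+    # 2. Mahout is built and MAHOUT_HOME is set
+    echo $MAHOUT_HOME
+    # 3. Spark reports a sane classpath with no errors
+    $SPARK_HOME/bin/compute-classpath.sh
+    # 4. Mahout's Spark classpath should include what step 3 reported
+    $MAHOUT_HOME/bin/mahout -spark classpath
+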
+## Q: I am using the command-line Mahout jobs that run on Spark, or am writing my own application that uses Mahout's Spark code. 
+I have checked the classpaths as described above, but when I run it on my cluster I get 
+ClassNotFound or signature errors during serialization. What's wrong?
+ 
+A: The Spark artifacts in the Maven ecosystem may not match the exact binary you are running on your cluster. This may 
+cause class-name or version mismatches. In this case you may wish 
+to build Spark yourself to guarantee that you are running exactly what you are building Mahout against. To do this, follow these steps 
+in order:
+
+1. Build Spark with Maven, but **do not** use the "package" target as described on the Spark site. Build with the "clean install" target instead. 
+Something like `mvn clean install -Dhadoop.version=1.2.1`, or whatever your particular build options are. This will put the jars for Spark 
+in the local Maven cache (see the sketch after this list).
+2. Deploy **your** Spark build to your cluster and test it there.
+3. Build Mahout. This will cause Maven to pull the jars for Spark from the local Maven cache and may resolve missing 
+or misidentified classes.
+4. If you are building your own code, build it against these local builds of Spark and Mahout.
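+
+A condensed sketch of steps 1 and 3 (assumptions: SPARK_HOME and MAHOUT_HOME point at source checkouts, and `-Dhadoop.version`/`-DskipTests` are example flags; substitute your own build options):
+
+    # step 1: build Spark and install its jars into the local Maven cache (~/.m2)
+    cd $SPARK_HOME
+    mvn clean install -DskipTests -Dhadoop.version=1.2.1
+    # step 3: build Mahout against those locally installed Spark artifacts
+    cd $MAHOUT_HOME
+    mvn clean install -DskipTests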
 
+