Posted to issues@mahout.apache.org by "Tariq Jawed (Jira)" <ji...@apache.org> on 2020/03/22 16:40:00 UTC

[jira] [Created] (MAHOUT-2099) Using Mahout as a Library in Spark Cluster

Tariq Jawed created MAHOUT-2099:
-----------------------------------

             Summary: Using Mahout as a Library in Spark Cluster
                 Key: MAHOUT-2099
                 URL: https://issues.apache.org/jira/browse/MAHOUT-2099
             Project: Mahout
          Issue Type: Question
          Components: cooccurrence, Math
         Environment: Spark version 2.3.0.2.6.5.10-2

 

<dependency>
 <groupId>org.apache.mahout</groupId>
 <artifactId>mahout-math</artifactId>
 <version>0.13.0</version>
</dependency>
<dependency>
 <groupId>org.apache.mahout</groupId>
 <artifactId>mahout-math-scala_2.10</artifactId>
 <version>0.13.0</version>
</dependency>
<dependency>
 <groupId>org.apache.mahout</groupId>
 <artifactId>mahout-spark_2.10</artifactId>
 <version>0.13.0</version>
</dependency>
<dependency>
 <groupId>com.esotericsoftware</groupId>
 <artifactId>kryo</artifactId>
 <version>5.0.0-RC5</version>
</dependency>
            Reporter: Tariq Jawed


I have a Spark cluster already set up. The environment is not under my direct control, but fat JARs with bundled dependencies are allowed. I packaged my Spark application with some Mahout code for SimilarityAnalysis, added the Mahout libraries to the POM file, and the packaging succeeds.

The problem, however, is that I get the following error when using the existing Spark context to build a distributed Spark context for Mahout.

Code:

implicit val sc: SparkContext = sparkSession.sparkContext

implicit val msc: SparkDistributedContext = sc2sdc(sc)

Error:

ERROR TaskSetManager: Task 7.0 in stage 10.0 (TID 58) had a not serializable result: org.apache.mahout.math.DenseVector
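
For reference, this is roughly how I would expect the context setup to look with Kryo serialization configured for Mahout. The property values and the MahoutKryoRegistrator class name are my assumptions based on the Mahout Spark bindings documentation, not something I have verified on this cluster:

// Sketch only -- my assumption of the setup needed so that Mahout math objects
// such as DenseVector are serialized with Kryo on this cluster.
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
import org.apache.mahout.sparkbindings._

val sparkSession = SparkSession.builder()
  .appName("CooccurrenceDriver")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.kryo.registrator",
    "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator")
  .getOrCreate()

implicit val sc: SparkContext = sparkSession.sparkContext
implicit val msc: SparkDistributedContext = sc2sdc(sc)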

 

And if I try to build the context using mahoutSparkContext() instead, it gives me an error that MAHOUT_HOME is not set.

Code:

implicit val msc = mahoutSparkContext(masterUrl = "local", appName = "CooccurrenceDriver")

Error:

MAHOUT_HOME is required to spawn mahout-based spark jobs
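
From reading the 0.13.0 sources, my understanding is that MAHOUT_HOME is only consulted when mahoutSparkContext() is asked to ship the Mahout jars itself. Below is a rough sketch of what I believe would avoid that lookup when the jars are already bundled in the fat JAR; the addMahoutJars and sparkConf parameter names are my assumption from those sources:

// Sketch only -- the addMahoutJars and sparkConf parameter names are my reading
// of the 0.13.0 sources; with addMahoutJars = false the MAHOUT_HOME lookup should
// be skipped because the Mahout jars are already bundled in the fat JAR.
import org.apache.spark.SparkConf
import org.apache.mahout.sparkbindings._

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator",
    "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator")

implicit val msc = mahoutSparkContext(
  masterUrl = "local",
  appName = "CooccurrenceDriver",
  sparkConf = conf,
  addMahoutJars = false)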

 

My question is: how do I proceed in this situation? Do I have to ask the administrators of the Spark environment to install the Mahout library, or is there any way I can keep packaging my application as a fat JAR?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)