Posted to user@spark.apache.org by Peter Aberline <pe...@gmail.com> on 2014/09/08 19:03:41 UTC

Spark-submit ClassNotFoundException with JAR!

Hi,

I'm having problems with a ClassNotFoundException using this simple example:


import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

import java.net.URLClassLoader

import scala.util.Marshal

class ClassToRoundTrip(val id: Int) extends scala.Serializable {
}

object RoundTripTester {

  def test(id : Int) : ClassToRoundTrip = {

    // Get the current classpath and output. Can we see simpleapp jar?
    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Executor classpath is:" + url.getFile))

    // Simply instantiating an instance of object and using it works fine.
    val testObj = new ClassToRoundTrip(id)
    println("testObj.id: " + testObj.id)

    val testObjBytes = Marshal.dump(testObj)
    val testObjRoundTrip =
      Marshal.load[ClassToRoundTrip](testObjBytes)  // <<-- ClassNotFoundException here
    testObjRoundTrip
  }
}

object SimpleApp {
  def main(args: Array[String]) {

    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)

    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Driver classpath is: " + url.getFile))

    val data = Array(1, 2, 3, 4, 5)
    val distData = sc.parallelize(data)
    distData.foreach(x=> RoundTripTester.test(x))
  }
}

In local mode, submitting as per the docs with the command below generates a
ClassNotFoundException on the Marshal.load line, where the ClassToRoundTrip
object is deserialized. Strangely, the earlier instantiation with
"new ClassToRoundTrip(id)" works fine:

spark-submit --class "SimpleApp" \
             --master local[4] \
             target/scala-2.10/simpleapp_2.10-1.0.jar


However, if I add the extra parameters "--driver-class-path" and
"--jars", it works fine in local mode:

spark-submit --class "SimpleApp" \
             --master local[4] \
             --driver-class-path /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
             --jars /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/SimpleApp.jar \
             target/scala-2.10/simpleapp_2.10-1.0.jar

However, submitting to a local dev master still generates the same issue:

spark-submit --class "SimpleApp" \
             --master spark://localhost.localdomain:7077 \
             --driver-class-path /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
             --jars /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
             target/scala-2.10/simpleapp_2.10-1.0.jar

I can see from the output that the JAR file is being fetched by the executor.
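The classpath printout in RoundTripTester.test only inspects the system class loader. A JAR fetched at runtime may instead be served by a separate loader that the executor installs as the thread's context class loader; that is an assumption about Spark 1.0.2 internals, not something confirmed by the logs above. A minimal diagnostic sketch, using a hypothetical helper named ClassVisibility, reports which of the two loaders can actually load a given class:

```scala
// Hypothetical diagnostic helper (not part of Spark): reports whether the
// system class loader and the current thread's context class loader can
// each load a class by name.
object ClassVisibility {
  def check(className: String): Seq[(String, Boolean)] = {
    val loaders: Seq[(String, ClassLoader)] = Seq(
      "system"  -> ClassLoader.getSystemClassLoader,
      "context" -> Thread.currentThread.getContextClassLoader)

    loaders.map { case (label, loader) =>
      // A null context loader counts as "not visible"; initialize = false
      // avoids running static initializers just to test visibility.
      val visible =
        loader != null &&
          (try { Class.forName(className, false, loader); true }
           catch { case _: ClassNotFoundException | _: LinkageError => false })
      label -> visible
    }
  }
}
```

Calling ClassVisibility.check("ClassToRoundTrip").foreach(println) inside RoundTripTester.test on an executor would show whether the context loader sees the class while the system loader does not.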

Logs for one of the executors are here:

stdout: http://pastebin.com/raw.php?i=DQvvGhKm

stderr: http://pastebin.com/raw.php?i=MPZZVa0Q

I'm using Spark 1.0.2. The ClassToRoundTrip class is included in the JAR.
I have a workaround: copying the JAR to each of the machines and setting
the "spark.executor.extraClassPath" parameter, but I would rather not
have to do that.
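A possible alternative to copying the JAR by hand, sketched under an unverified assumption: that the ObjectInputStream used inside scala.util.Marshal resolves classes against a loader that cannot see the application JAR, while the thread's context class loader can. The sketch swaps Marshal for plain Java serialization with an ObjectInputStream whose resolveClass looks classes up through the context class loader first. LoaderAwareObjectInputStream and ContextLoaderRoundTrip are hypothetical names, not part of Spark or the Scala library.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream,
  ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

// Hypothetical: resolves classes through a caller-supplied loader, falling
// back to ObjectInputStream's default lookup (which also handles primitives).
class LoaderAwareObjectInputStream(in: InputStream, loader: ClassLoader)
    extends ObjectInputStream(in) {
  override protected def resolveClass(desc: ObjectStreamClass): Class[_] =
    try Class.forName(desc.getName, false, loader)
    catch { case _: ClassNotFoundException => super.resolveClass(desc) }
}

// Hypothetical stand-in for Marshal.dump / Marshal.load.
object ContextLoaderRoundTrip {
  def dump(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.toByteArray
  }

  def load[T](bytes: Array[Byte]): T = {
    // Prefer the thread's context loader; use this object's own loader if unset.
    val loader = Option(Thread.currentThread.getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    val in = new LoaderAwareObjectInputStream(new ByteArrayInputStream(bytes), loader)
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

Inside RoundTripTester.test, the two Marshal calls would become ContextLoaderRoundTrip.dump(testObj) and ContextLoaderRoundTrip.load[ClassToRoundTrip](testObjBytes). The sketch does no type checking beyond the final cast, so a wrong type parameter surfaces as a ClassCastException at the call site.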

This is such a simple case, I must be doing something obviously wrong.
Can anyone help?


Thanks
Peter
