Posted to user@spark.apache.org by zhanif <zh...@gmail.com> on 2014/01/26 00:13:42 UTC

ClassNotFoundException with simple Spark job on cluster

Hi everyone,

I have recently encountered an issue with running a very simple Spark job on
a distributed cluster. I can run other, similar jobs on this cluster, but
for some reason, the job in question will simply not execute.

The error I get when executing the job is:

14/01/25 22:08:54 WARN cluster.ClusterTaskSetManager: Loss was due to
java.lang.ClassNotFoundException
java.lang.ClassNotFoundException: job$$anonfun$1
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:264)
	at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:36)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1593)
...

The error continues for some time. I can post the full context if that's
helpful. The code in the job that seems to cause this issue is: 

    val l = file.map(li => li.split(","))

When I remove this line, the job runs fine. This would be less of a concern
if the context of that line weren't:

    val file = spark.textFile("hdfs://<BZ2_FILE>", 10)
    val l = file.map(li => li.split(","))
    l.saveAsTextFile("hdfs://<OUTPUT_FILE>")

The file I am reading is a simple CSV file. As mentioned before, a more
complex job using some Spray JSON objects works just fine (and in fact can
be run immediately before and after the simple job fails, on the same
cluster), but the simple job refuses to run whenever that anonymous
function is included.

If it helps, both jobs are being compiled as 'fat JARs' using sbt-assembly.
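
For reference, the assembly setup looks roughly like this (a sketch; the
plugin version here is illustrative, and the exact settings vary across
sbt-assembly releases):

    // project/plugins.sbt
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.10.2")

    // build.sbt
    import AssemblyKeys._  // brings the assembly task's keys into scope

    assemblySettings

    // then build the fat JAR with: sbt assembly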

Thanks for your assistance.




Re: ClassNotFoundException with simple Spark job on cluster

Posted by Archit Thakur <ar...@gmail.com>.
The very first thing that comes to mind after reading your problem is that
you need to add your jar to the list passed as the 4th argument of the
SparkContext constructor, as Nhan suggested. Let me know if that doesn't
resolve your problem.
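
For example, a minimal sketch (the master URL, Spark home, application
name, and jar path below are all placeholders for your own setup):

    import org.apache.spark.SparkContext

    val sc = new SparkContext(
      "spark://<MASTER>:7077",          // cluster master URL
      "SimpleCsvJob",                   // application name
      "/path/to/spark",                 // sparkHome on the cluster nodes
      Seq("/path/to/job-assembly.jar")  // the sbt-assembly fat JAR to ship
    )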

Thanks and Regards,
Archit Thakur.



Re: ClassNotFoundException with simple Spark job on cluster

Posted by Nhan Vu Lam Chi <nh...@adatao.com>.
Maybe you need to add your code's jar file to the SparkContext constructor:

    new SparkContext(master, appName, [sparkHome], [jars])
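
If the context is already constructed, I believe you can also register the
jar on it afterwards (a sketch; the path is a placeholder for your assembly
JAR):

    sc.addJar("/path/to/job-assembly.jar")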

Refer to this for more information:
https://spark.incubator.apache.org/docs/0.7.3/scala-programming-guide.html
Hope it helps!

