Posted to user@spark.apache.org by doruchiulan <do...@gmail.com> on 2016/10/11 19:10:39 UTC

Spark Docker Container - Jars problem when deploying my app

Hi,

I have a problem that has been bothering me for a few days, and I'm pretty much
out of ideas.

I built a Spark Docker container where Spark runs in standalone mode; both the
master and a worker are started there.

Now I am trying to deploy my Spark Scala app in a separate container (on the
same machine), where I pass the Spark master URL and the other settings I need
to connect to Spark. The connection itself is seamless.
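
Roughly, the connection setup looks like this (the master hostname, port and
app name below are placeholders, not my exact values):

    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder values -- the master URL points at the standalone master
    // running inside the Spark container.
    val conf = new SparkConf()
      .setAppName("MyApp")
      .setMaster("spark://spark-master:7077")
    val sc = new SparkContext(conf)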

The first problem I encountered was:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure:
Lost task 0.3 in stage 0.0 (TID 3, 10.1.0.4):
java.lang.ClassNotFoundException:
org.apache.spark.streaming.kafka.KafkaRDDPartition

Then I gathered all of my dependencies except Spark into a folder alongside my
app JAR file and added them to the SparkConf using SparkConf.setJars.
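
Concretely, the setJars call is more or less this (the paths and file names
are placeholders for the folder that sits next to myApp.jar inside the app
container):

    // Same conf as before, now also listing the jars Spark should ship to
    // the executors. Paths and file names below are placeholders.
    val conf = new SparkConf()
      .setAppName("MyApp")
      .setMaster("spark://spark-master:7077")
      .setJars(Seq(
        "/opt/app/libs/spark-streaming-kafka-0-8_2.11-2.0.0.jar",
        "/opt/app/libs/some-other-dependency.jar"
      ))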

Now the strange thing happens:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure:
Lost task 0.3 in stage 0.0 (TID 3, 10.1.0.4): java.lang.ClassCastException:
cannot assign instance of scala.collection.immutable.List$SerializationProxy
to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of
type scala.collection.Seq in instance of
org.apache.spark.rdd.MapPartitionsRDD

More than this, if I just run the Scala app locally using java -cp
<dependencies (including the Spark jars)> myApp.jar, it works perfectly and
the jobs run fine.

I don't have SPARK_HOME set locally, and in that case setJars basically
receives an empty list, as if I weren't using it at all.

I guess it uses the jars provided on the classpath when I run my app, so I
don't need to provide anything else.
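
A quick sanity check is to print what the driver has actually registered for
shipping to the executors (a minimal sketch, using the standard Spark 2.0 API;
master URL and app name are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("MyApp")                      // placeholder
      .setMaster("spark://spark-master:7077")   // placeholder
    val sc = new SparkContext(conf)

    // If these come back empty, nothing gets shipped to the executors and
    // they can only load classes already present in the Spark container.
    println("spark.jars      = " + sc.getConf.get("spark.jars", "<empty>"))
    println("registered jars = " + sc.jars.mkString(", "))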

If any of you have any ideas I would be grateful. I really can't explain why
this doesn't work; I haven't done any Spark deployments until now and have
mostly run Spark in embedded mode.

Spark is the same version in my app dependencies (2.0.0) as the one running in
the Docker container.
I used Scala 2.11.7 for my app.
Java 1.8 is used on both containers (app, Spark).
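
For completeness, the relevant dependencies look roughly like this (a sketch
assuming an sbt build; the Kafka connector is the 0.8 one, which is where the
KafkaRDDPartition class from the first stack trace lives):

    scalaVersion := "2.11.7"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"                % "2.0.0",
      "org.apache.spark" %% "spark-streaming"           % "2.0.0",
      "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.0"
    )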



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Docker-Container-Jars-problem-when-deploying-my-app-tp27878.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Spark Docker Container - Jars problem when deploying my app

Posted by Denis Bolshakov <bo...@gmail.com>.
Try to build a fat (uber) jar which includes all dependencies.
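
For example, with sbt and the sbt-assembly plugin it is roughly this (a
minimal sketch; the plugin version is illustrative, and Spark itself should be
marked "provided" so it is not bundled, since the cluster already has it):

    // project/plugins.sbt -- plugin version is illustrative
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

    // build.sbt -- keep Spark out of the fat jar, bundle everything else
    // (e.g. the Kafka connector):
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"                % "2.0.0" % "provided",
      "org.apache.spark" %% "spark-streaming"           % "2.0.0" % "provided",
      "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.0"
    )

Running sbt assembly then produces a single jar that can go on the driver
classpath or be passed to the cluster via setJars or spark-submit --jars.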
