Posted to user@spark.apache.org by ryaminal <ta...@gmail.com> on 2014/12/11 23:01:51 UTC

Running spark-submit from a remote machine using a YARN application

We are trying to submit a Spark application from a Tomcat application running
our business logic. The Tomcat app lives in a separate non-Hadoop cluster.
We first were doing this by using the spark-yarn package to directly call
Client#runApp() but found that the API we were using in Spark is being made
private in future releases. 
 
Now our solution is to make a very simple YARN application which executes
as its command "spark-submit --master yarn-cluster s3n://application/jar.jar
...". This seemed so simple and elegant, but it has some weird issues: we
get "NoClassDefFoundErrors". When we ssh to the box and run the same
spark-submit command, it works, but doing this through YARN leads to the
NoClassDefFoundErrors mentioned.
 
Also, comparing the environment and Java properties between the working and
broken runs, we find that they have different Java classpaths. So weird...
 
Has anyone had this problem or know a solution? We would be happy to post
our very simple code for creating the YARN application.
 
Thanks!



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Running-spark-submit-from-a-remote-machine-using-a-YARN-application-tp20642.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Running spark-submit from a remote machine using a YARN application

Posted by Tobias Pfeiffer <tg...@preferred.jp>.
Hi,

On Fri, Dec 12, 2014 at 7:01 AM, ryaminal <ta...@gmail.com> wrote:
>
> Now our solution is to make a very simple YARN application which executes
> as its command "spark-submit --master yarn-cluster
> s3n://application/jar.jar
> ...". This seemed so simple and elegant, but it has some weird issues. We
> get "NoClassDefFoundErrors". When we ssh to the box and run the same
> spark-submit command, it works, but doing this through YARN leads to the
> NoClassDefFoundErrors mentioned.
>

I do something similar: I start Spark using spark-submit from a non-Spark
server application. Make sure that HADOOP_CONF_DIR is set correctly when
running spark-submit from your program, so that the YARN configuration can
be found.
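For reference, a minimal sketch of that setup (the config path is a common
default but may differ on your cluster, and the --class value is
hypothetical):

```shell
#!/bin/sh
# Point spark-submit at the YARN cluster configuration before launching.
# This directory must contain yarn-site.xml and core-site.xml for the
# target cluster; without it, the client cannot locate the
# ResourceManager and the submission fails.
export HADOOP_CONF_DIR=/etc/hadoop/conf

spark-submit \
  --master yarn-cluster \
  --class com.example.Main \
  s3n://application/jar.jar
```

The key point is that HADOOP_CONF_DIR must be set in the environment of
the process that invokes spark-submit, not just in your interactive shell.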

Also, keep in mind that some parameters to spark-submit behave differently
under a yarn-cluster master than under a local[*] master. For example,
system properties set using `--conf` will be available in your Spark
application only in local[*] mode; for YARN you need to wrap them in `--conf
"spark.executor.extraJavaOptions=..."`.
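Concretely, the distinction described above might look like this (the
property name my.prop and the jar name app.jar are hypothetical):

```shell
# local[*] mode: driver and executors share one JVM, so a system
# property set for the driver is visible to the whole application.
spark-submit --master 'local[*]' \
  --conf "spark.driver.extraJavaOptions=-Dmy.prop=value" \
  app.jar

# yarn-cluster mode: executors run in separate JVMs on the cluster, so
# the property must be passed to them explicitly as well.
spark-submit --master yarn-cluster \
  --conf "spark.driver.extraJavaOptions=-Dmy.prop=value" \
  --conf "spark.executor.extraJavaOptions=-Dmy.prop=value" \
  app.jar
```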

Tobias