Posted to user@spark.apache.org by John Zhao <jz...@alpinenow.com> on 2014/01/15 19:25:09 UTC

Anyone know how to submit spark job to yarn in java code?

Now I am working on a web application and I want to submit a Spark job to Hadoop YARN.
I have already built my own assembly and can run it from the command line with the following script:

export YARN_CONF_DIR=/home/gpadmin/clusterConfDir/yarn
export SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar
./spark-class org.apache.spark.deploy.yarn.Client  --jar ./examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar  --class org.apache.spark.examples.SparkPi --args yarn-standalone --num-workers 3 --master-memory 1g --worker-memory 512m --worker-cores 1    

It works fine.
Then I realized that it is hard to submit the job from a web application. It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar or spark-examples-assembly-0.8.1-incubating.jar is a really big jar; I believe it contains everything.
So my questions are:
1) When I run the above script, which jar is actually submitted to the YARN server?
2) It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays the role of the client side, while spark-examples-assembly-0.8.1-incubating.jar carries the Spark runtime and the examples that will run in YARN. Am I right?
3) Does anyone have similar experience? I did lots of Hadoop MR work and want to follow the same logic to submit Spark jobs. For now I can only find the command-line way to submit a Spark job to YARN. I believe there is an easier way to integrate Spark into a web application.


Thanks.
John.

Re: Anyone know how to submit spark job to yarn in java code?

Posted by Philip Ogren <ph...@oracle.com>.
My problem seems to be related to this:
https://issues.apache.org/jira/browse/MAPREDUCE-4052

So, I will try running my setup from a Linux client and see if I have 
better luck.

On 1/15/2014 11:38 AM, Philip Ogren wrote:


Re: Anyone know how to submit spark job to yarn in java code?

Posted by Philip Ogren <ph...@oracle.com>.
Great question!  I was writing up a similar question this morning and 
decided to investigate some more before sending.  Here's what I'm 
trying: I have created a new Scala project that has only 
spark-examples-assembly-0.8.1-incubating.jar and 
spark-assembly-0.8.1-incubating-hadoop2.2.0-cdh5.0.0-beta-1.jar on the 
classpath, and I am trying to create a yarn-client SparkContext with the 
following:

val spark = new SparkContext("yarn-client", "my-app")

My hope is to run this on my laptop and have it execute on / connect to 
the YARN application master.  The hope is that if I can get this to work, 
then I can do the same from a web application.  I'm trying to unpack 
run-example.sh, compute-classpath, SparkPi, and *.yarn.Client to figure 
out what environment variables I need to set up, etc.

I grabbed all the .xml files out of my cluster's conf directory (in my 
case /etc/hadoop/conf.cloudera.yarn), such as yarn-site.xml, and put 
them on my classpath.  I also set the environment variables SPARK_JAR, 
SPARK_YARN_APP_JAR, SPARK_YARN_USER_ENV, and SPARK_HOME.

When I run my simple Scala script, I get the following error:

Exception in thread "main" org.apache.spark.SparkException: Yarn 
application already ended,might be killed or not able to launch 
application master.
     at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApp(YarnClientSchedulerBackend.scala:95)
     at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:72)
     at 
org.apache.spark.scheduler.cluster.ClusterScheduler.start(ClusterScheduler.scala:119)
     at org.apache.spark.SparkContext.<init>(SparkContext.scala:273)
     at SparkYarnClientExperiment$.main(SparkYarnClientExperiment.scala:14)
     at SparkYarnClientExperiment.main(SparkYarnClientExperiment.scala)

I can look at my yarn UI and see that it registers a failed application, 
so I take this as incremental progress.  However, I'm not sure how to 
troubleshoot what I'm doing from here or if what I'm trying to do is 
even sensible/possible.  Any advice is appreciated.

Thanks,
Philip

On 1/15/2014 11:25 AM, John Zhao wrote:


Re: Anyone know how to submit spark job to yarn in java code?

Posted by Archit Thakur <ar...@gmail.com>.
Hi,

I am facing the same problem.
Did you find any solution or work around?

Thanks and Regards,
Archit Thakur.


On Thu, Jan 16, 2014 at 6:22 AM, Liu, Raymond <ra...@intel.com> wrote:


RE: Anyone know how to submit spark job to yarn in java code?

Posted by "Liu, Raymond" <ra...@intel.com>.
Hi,

Regarding your questions:

1) When I run the above script, which jar is actually submitted to the YARN server?

Both the jar that the SPARK_JAR env variable points to and the jar that --jar points to are submitted to the YARN server.

2) It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays the role of the client side, while spark-examples-assembly-0.8.1-incubating.jar carries the Spark runtime and the examples that will run in YARN. Am I right?

The spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar also goes to the YARN cluster, as the runtime for the app jar (spark-examples-assembly-0.8.1-incubating.jar).

3) Does anyone have similar experience? I did lots of Hadoop MR work and want to follow the same logic to submit Spark jobs. For now I can only find the command-line way to submit a Spark job to YARN. I believe there is an easier way to integrate Spark into a web application.

You can use yarn-client mode. You might want to take a look at docs/running-on-yarn.md, and perhaps try the master branch to check our latest updates to that part of the docs. In yarn-client mode, the SparkContext itself does much the same thing as the command line does to submit a YARN job.

Then, to use it from Java, you might want to try JavaSparkContext instead of SparkContext. I haven't personally run it with complicated applications, but a small example app did work.
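Roughly, such a small example could look like the sketch below. The class and application names are made up, and it assumes SPARK_JAR / SPARK_YARN_APP_JAR are already exported in the environment and the cluster's YARN *.xml config files are on the classpath:

```java
import java.util.Arrays;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function2;

// Hypothetical class name, not from this thread.
public class YarnClientSubmit {
    public static void main(String[] args) {
        // With "yarn-client" as the master string, the SparkContext itself
        // submits the application to YARN, much as the command line does.
        JavaSparkContext sc = new JavaSparkContext("yarn-client", "my-web-app-job");
        try {
            // A trivial job to confirm that executors actually launch.
            int sum = sc.parallelize(Arrays.asList(1, 2, 3, 4))
                        .reduce(new Function2<Integer, Integer, Integer>() {
                            public Integer call(Integer a, Integer b) {
                                return a + b;
                            }
                        });
            System.out.println("sum = " + sum);
        } finally {
            sc.stop(); // release the YARN application
        }
    }
}
```

Note this needs a reachable YARN cluster and the Spark assembly on the classpath, so it is not something you can run standalone on a laptop without the setup described earlier in the thread.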
	

Best Regards,
Raymond Liu

-----Original Message-----
From: John Zhao [mailto:jzhao@alpinenow.com] 
Sent: Thursday, January 16, 2014 2:25 AM
To: user@spark.incubator.apache.org
Subject: Anyone know how to submit spark job to yarn in java code?
