Posted to user@spark.apache.org by John Zhao <jz...@alpinenow.com> on 2014/01/15 19:25:09 UTC
Anyone know how to submit spark job to yarn in java code?
I am working on a web application and want to submit a Spark job to Hadoop yarn from it.
I have already built my own assembly and can run it from the command line with the following script:
export YARN_CONF_DIR=/home/gpadmin/clusterConfDir/yarn
export SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar
./spark-class org.apache.spark.deploy.yarn.Client --jar ./examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar --class org.apache.spark.examples.SparkPi --args yarn-standalone --num-workers 3 --master-memory 1g --worker-memory 512m --worker-cores 1
It works fine.
Then I realized that it is hard to submit the job from a web application. It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar or spark-examples-assembly-0.8.1-incubating.jar is a really big jar; I believe it contains everything.
So my questions are:
1) When I run the above script, which jar is being submitted to the yarn server?
2) It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays the role of the client side, and spark-examples-assembly-0.8.1-incubating.jar carries the Spark runtime and the examples that will run in yarn. Am I right?
3) Does anyone have similar experience? I have done a lot of Hadoop MR work and want to follow the same logic to submit a Spark job. For now I can only find the command-line way to submit a Spark job to yarn. I believe there is an easy way to integrate Spark into a web application.
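One way a web application could reuse the command-line path is to launch the same spark-class invocation through java.lang.ProcessBuilder. The following is only a minimal sketch under that assumption, not an official Spark submission API: the class name YarnSubmitCommand and the methods buildCommand and prepare are hypothetical, and the flags and environment variables are copied from the script above. It only prints the command; the launch itself is left as a comment. An absolute path to spark-class may be safer than the relative one, since how a relative executable is resolved against the working directory can vary by platform.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: wraps the spark-class command from the script above.
class YarnSubmitCommand {

    // Builds the argument list, one token per element, mirroring the shell script.
    static List<String> buildCommand(String appJar, String mainClass, int numWorkers) {
        List<String> cmd = new ArrayList<>();
        cmd.add("./spark-class");
        cmd.add("org.apache.spark.deploy.yarn.Client");
        cmd.add("--jar");
        cmd.add(appJar);
        cmd.add("--class");
        cmd.add(mainClass);
        cmd.add("--args");
        cmd.add("yarn-standalone");
        cmd.add("--num-workers");
        cmd.add(String.valueOf(numWorkers));
        cmd.add("--master-memory");
        cmd.add("1g");
        cmd.add("--worker-memory");
        cmd.add("512m");
        cmd.add("--worker-cores");
        cmd.add("1");
        return cmd;
    }

    // Sets the working directory and the two environment variables the script exports.
    static ProcessBuilder prepare(File sparkHome, String yarnConfDir,
                                  String sparkJar, List<String> cmd) {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.directory(sparkHome);
        Map<String, String> env = pb.environment();
        env.put("YARN_CONF_DIR", yarnConfDir);
        env.put("SPARK_JAR", sparkJar);
        pb.redirectErrorStream(true); // merge stderr into stdout for simpler logging
        return pb;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand(
            "./examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar",
            "org.apache.spark.examples.SparkPi",
            3);
        ProcessBuilder pb = prepare(
            new File("."),
            "/home/gpadmin/clusterConfDir/yarn",
            "./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar",
            cmd);
        System.out.println(String.join(" ", pb.command()));
        // To actually launch: Process p = pb.start(); int exitCode = p.waitFor();
    }
}
```

This keeps the web application decoupled from Spark's classes, at the cost of spawning a separate JVM per submission and having to read the job status from the process output.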
Thanks.
John.
Re: Anyone know how to submit spark job to yarn in java code?
Posted by Philip Ogren <ph...@oracle.com>.
My problem seems to be related to this:
https://issues.apache.org/jira/browse/MAPREDUCE-4052
So, I will try running my setup from a Linux client and see if I have
better luck.
Re: Anyone know how to submit spark job to yarn in java code?
Posted by Philip Ogren <ph...@oracle.com>.
Great question! I was writing up a similar question this morning and
decided to investigate some more before sending. Here's what I'm
trying. I have created a new scala project that contains only
spark-examples-assembly-0.8.1-incubating.jar and
spark-assembly-0.8.1-incubating-hadoop2.2.0-cdh5.0.0-beta-1.jar on the
classpath and I am trying to create a yarn-client SparkContext with the
following:
val spark = new SparkContext("yarn-client", "my-app")
My hope is to run this on my laptop and have it execute/connect on the
yarn application master. The hope is that if I can get this to work,
then I can do the same from a web application. I'm trying to unpack
run-example.sh, compute-classpath, SparkPi, *.yarn.Client to figure out
what environment variables I need to set up etc.
I grabbed all the .xml files (e.g. yarn-site.xml) out of my cluster's conf
directory (in my case /etc/hadoop/conf.cloudera.yarn) and put them on my
classpath. I also set up the environment variables SPARK_JAR,
SPARK_YARN_APP_JAR, SPARK_YARN_USER_ENV, SPARK_HOME.
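Since a yarn-client setup like this depends on several environment variables, a small preflight check before creating the context can rule out one class of failure. The sketch below is an assumption-laden illustration: the class YarnClientEnvCheck and its missing method are hypothetical, and the list simply mirrors the four variables named above. Whether every one of them is strictly required by this Spark version is not confirmed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical preflight check for the environment variables named in the message.
class YarnClientEnvCheck {

    // Names taken from the message above; not verified as strictly required.
    static final List<String> EXPECTED = Arrays.asList(
        "SPARK_JAR", "SPARK_YARN_APP_JAR", "SPARK_YARN_USER_ENV", "SPARK_HOME");

    // Returns the expected variables that are absent or blank in the given environment.
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String name : EXPECTED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> unset = missing(System.getenv());
        if (unset.isEmpty()) {
            System.out.println("All expected variables are set.");
        } else {
            System.out.println("Unset or blank: " + unset);
        }
    }
}
```

Passing the environment in as a map, rather than reading System.getenv() inside the method, keeps the check easy to exercise from a unit test or from a web application that assembles its own settings.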
When I run my simple scala script, I get the following error:
Exception in thread "main" org.apache.spark.SparkException: Yarn application already ended,might be killed or not able to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApp(YarnClientSchedulerBackend.scala:95)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:72)
    at org.apache.spark.scheduler.cluster.ClusterScheduler.start(ClusterScheduler.scala:119)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:273)
    at SparkYarnClientExperiment$.main(SparkYarnClientExperiment.scala:14)
    at SparkYarnClientExperiment.main(SparkYarnClientExperiment.scala)
I can look at my yarn UI and see that it registers a failed application,
so I take this as incremental progress. However, I'm not sure how to
troubleshoot what I'm doing from here or if what I'm trying to do is
even sensible/possible. Any advice is appreciated.
Thanks,
Philip
Re: Anyone know how to submit spark job to yarn in java code?
Posted by Archit Thakur <ar...@gmail.com>.
Hi,
I am facing the same problem.
Did you find any solution or workaround?
Thanks and Regards,
Archit Thakur.
RE: Anyone know how to submit spark job to yarn in java code?
Posted by "Liu, Raymond" <ra...@intel.com>.
Hi
Regarding your questions:
> 1) when I run the above script, which jar is being submitted to the yarn server?
Both the jar that the SPARK_JAR environment variable points to and the jar passed with --jar are submitted to the yarn server.
> 2) It looks like the spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays the role of client side and spark-examples-assembly-0.8.1-incubating.jar goes with spark runtime and examples which will be running in yarn, am I right?
The spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar also goes to the yarn cluster, as the runtime for the app jar (spark-examples-assembly-0.8.1-incubating.jar).
> 3) Does anyone have any similar experience? I did lots of hadoop MR stuff and want to follow the same logic to submit spark job. For now I can only find the command line way to submit spark job to yarn. I believe there is an easy way to integrate spark in a web application.
You can use yarn-client mode. Take a look at docs/running-on-yarn.md; you might also want to try the master branch to check our latest updates to that part of the docs. In yarn-client mode, the SparkContext itself does much the same thing as the command line does to submit a yarn job.
To use it from Java, you might want to try JavaSparkContext instead of SparkContext. I haven't personally run it with complicated applications, but a small example app did work.
Best Regards,
Raymond Liu