Posted to user@spark.apache.org by Eric Beabes <ma...@gmail.com> on 2020/09/02 20:58:12 UTC

Submitting Spark Job thru REST API?

Under Spark 2.4, is it possible to submit a Spark job thru a REST API - just
like a Flink job?

Here's the use case: We need to submit a Spark job to an EMR cluster, but
our security team won't allow us to submit a job from the Master node or
thru the UI. They want us to create a "Docker Container" to submit the job.

If it's possible to submit the Spark job thru REST, then we don't need to
install Spark/Hadoop JARs on the container. If it's not possible to use a
REST API, can we do something like this?

spark-2.4.6-bin-hadoop2.7/bin/spark-submit \
 --class myclass --master "yarn url" --deploy-mode cluster \

In other words, instead of --master yarn, can we specify a URL? Would this
still work the same way?
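
For what it's worth, the fallback we're considering is baking a copy of the
EMR cluster's Hadoop config files into the container, so that plain
--master yarn can locate the cluster (all paths below are placeholders):

export HADOOP_CONF_DIR=/path/to/emr-cluster-conf  # yarn-site.xml, core-site.xml, ...
spark-2.4.6-bin-hadoop2.7/bin/spark-submit \
 --class myclass --master yarn --deploy-mode cluster \
 myapp.jar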

Re: Submitting Spark Job thru REST API?

Posted by Breno Arosa <br...@edumobi.com.br>.
Maybe there are other ways, but I think the most common path is using
Apache Livy (https://livy.apache.org/).
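
For example, a minimal batch submission through Livy's REST API looks
roughly like this (the host, jar location, and class name below are
placeholders):

curl -H 'Content-Type: application/json' -X POST http://livy-host:8998/batches -d '{
  "file": "s3://my-bucket/jars/myapp.jar",
  "className": "myclass"
}'
# then poll the returned batch id for its state:
curl http://livy-host:8998/batches/<batch-id>/state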

On 02/09/2020 17:58, Eric Beabes wrote:
> Under Spark 2.4, is it possible to submit a Spark job thru a REST API -
> just like a Flink job?
>
> Here's the use case: We need to submit a Spark job to an EMR cluster,
> but our security team won't allow us to submit a job from the Master
> node or thru the UI. They want us to create a "Docker Container" to
> submit the job.
>
> If it's possible to submit the Spark job thru REST, then we don't need
> to install Spark/Hadoop JARs on the container. If it's not possible to
> use a REST API, can we do something like this?
>
> spark-2.4.6-bin-hadoop2.7/bin/spark-submit \
>  --class myclass --master "yarn url" --deploy-mode cluster \
>
> In other words, instead of --master yarn, can we specify a URL? Would
> this still work the same way?


Re: Submitting Spark Job thru REST API?

Posted by Amit Joshi <ma...@gmail.com>.
Hi,
There are other options, like Apache Livy, which lets you submit the job
using a REST API.
Another option is to use AWS Data Pipeline to configure your job as an EMR
activity. To activate the pipeline, you need the console or a program.
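
For instance, kicking off an already-defined pipeline from a container can
be a single AWS CLI call (the pipeline id below is a placeholder; the
pipeline definition itself would hold the EMR activity that runs
spark-submit):

aws datapipeline activate-pipeline --pipeline-id df-0123456789ABC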

Regards
Amit

On Thursday, September 3, 2020, Eric Beabes <ma...@gmail.com>
wrote:

> Under Spark 2.4, is it possible to submit a Spark job thru a REST API -
> just like a Flink job?
>
> Here's the use case: We need to submit a Spark job to an EMR cluster, but
> our security team won't allow us to submit a job from the Master node or
> thru the UI. They want us to create a "Docker Container" to submit the job.
>
> If it's possible to submit the Spark job thru REST, then we don't need to
> install Spark/Hadoop JARs on the container. If it's not possible to use a
> REST API, can we do something like this?
>
> spark-2.4.6-bin-hadoop2.7/bin/spark-submit \
>  --class myclass --master "yarn url" --deploy-mode cluster \
>
> In other words, instead of --master yarn, can we specify a URL? Would this
> still work the same way?
>

Re: Submitting Spark Job thru REST API?

Posted by Eric Beabes <ma...@gmail.com>.
Livy is working fairly well for submitting a job. One question... At
present I am using it like this (with the three Spark confs merged into a
single 'conf' map, since duplicate JSON keys would silently drop all but
the last one):

curl -H 'Content-Type: application/json' -X POST http://$LIVY_URL/batches -d "{
  \"name\": \"$JOB_NAME\",
  \"className\": \"$CLASS_NAME\",
  \"conf\": {
    \"spark.yarn.app.container.log.dir\": \"$LOG_DIR\",
    \"spark.executor.heartbeatInterval\": \"$HEART_BEAT_INTERVAL\",
    \"spark.driver.memoryOverhead\": \"$MEMORY_OVERHEAD\"
  },
  \"file\": \"$FILE_PATH\",
  \"proxyUser\": \"livy\",
  \"driverMemory\": \"$DRIVER_MEMORY\",
  \"driverCores\": $DRIVER_CORES,
  \"args\": $ARGS
}"


This is working well, except that the file has to be uploaded to S3 or
HDFS prior to running this command.

Is there a way to upload the JAR file first, get the ID of that file, and
then submit the Spark job with it? Kinda like how Flink does it.

I realize this is an Apache Livy question, so I will also ask on their
mailing list. Thanks.
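
For now I am copying the jar out-of-band before the curl call, along these
lines (the bucket and paths are placeholders):

aws s3 cp target/myapp.jar s3://my-bucket/jars/myapp.jar
# or, where the container has an HDFS client configured:
hdfs dfs -put -f target/myapp.jar /user/livy/jars/myapp.jar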



On Thu, Sep 3, 2020 at 11:47 AM Eric Beabes <ma...@gmail.com>
wrote:

> Thank you all for your responses. Will try them out.
>
> On Thu, Sep 3, 2020 at 12:06 AM tianlangstudio <ti...@aliyun.com>
> wrote:
>
>> Hello, Eric
>> Maybe you can use Spark JobServer 0.10.0:
>> https://github.com/spark-jobserver/spark-jobserver
>> We used it with Spark 1.6 and it was awesome, and the project is still
>> very active, so I highly recommend it.
>>
>> Fusion Zhu <https://www.tianlang.tech/>
>>
>> ------------------------------------------------------------------
>> From: Eric Beabes <ma...@gmail.com>
>> Sent: Thursday, September 3, 2020, 04:58
>> To: spark-user <us...@spark.apache.org>
>> Subject: Submitting Spark Job thru REST API?
>>
>> Under Spark 2.4, is it possible to submit a Spark job thru a REST API -
>> just like a Flink job?
>>
>> Here's the use case: We need to submit a Spark job to an EMR cluster, but
>> our security team won't allow us to submit a job from the Master node or
>> thru the UI. They want us to create a "Docker Container" to submit the job.
>>
>> If it's possible to submit the Spark job thru REST, then we don't need to
>> install Spark/Hadoop JARs on the container. If it's not possible to use a
>> REST API, can we do something like this?
>>
>> spark-2.4.6-bin-hadoop2.7/bin/spark-submit \
>>  --class myclass --master "yarn url" --deploy-mode cluster \
>>
>> In other words, instead of --master yarn, can we specify a URL? Would this
>> still work the same way?
>>
>>

Re: Submitting Spark Job thru REST API?

Posted by Eric Beabes <ma...@gmail.com>.
Thank you all for your responses. Will try them out.

On Thu, Sep 3, 2020 at 12:06 AM tianlangstudio <ti...@aliyun.com>
wrote:

> Hello, Eric
> Maybe you can use Spark JobServer 0.10.0:
> https://github.com/spark-jobserver/spark-jobserver
> We used it with Spark 1.6 and it was awesome, and the project is still
> very active, so I highly recommend it.
>
> Fusion Zhu <https://www.tianlang.tech/>
>
> ------------------------------------------------------------------
> From: Eric Beabes <ma...@gmail.com>
> Sent: Thursday, September 3, 2020, 04:58
> To: spark-user <us...@spark.apache.org>
> Subject: Submitting Spark Job thru REST API?
>
> Under Spark 2.4, is it possible to submit a Spark job thru a REST API -
> just like a Flink job?
>
> Here's the use case: We need to submit a Spark job to an EMR cluster, but
> our security team won't allow us to submit a job from the Master node or
> thru the UI. They want us to create a "Docker Container" to submit the job.
>
> If it's possible to submit the Spark job thru REST, then we don't need to
> install Spark/Hadoop JARs on the container. If it's not possible to use a
> REST API, can we do something like this?
>
> spark-2.4.6-bin-hadoop2.7/bin/spark-submit \
>  --class myclass --master "yarn url" --deploy-mode cluster \
>
> In other words, instead of --master yarn, can we specify a URL? Would this
> still work the same way?
>
>

Re: Submitting Spark Job thru REST API?

Posted by tianlangstudio <ti...@aliyun.com.INVALID>.
Hello, Eric
Maybe you can use Spark JobServer 0.10.0: https://github.com/spark-jobserver/spark-jobserver
We used it with Spark 1.6 and it was awesome, and the project is still very active, so I highly recommend it.
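
The basic flow of its REST API is to upload your jar once and then trigger
jobs against it by name, roughly like this (the host, port, app name, and
class are placeholders, and the job class must implement the job server's
SparkJob trait):

# upload the jar under an app name:
curl --data-binary @myapp.jar http://jobserver-host:8090/jars/myapp
# run a job from the uploaded jar:
curl -d '' 'http://jobserver-host:8090/jobs?appName=myapp&classPath=com.example.MyJob'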


 
Fusion Zhu


------------------------------------------------------------------
From: Eric Beabes <ma...@gmail.com>
Sent: Thursday, September 3, 2020, 04:58
To: spark-user <us...@spark.apache.org>
Subject: Submitting Spark Job thru REST API?

Under Spark 2.4, is it possible to submit a Spark job thru a REST API - just like a Flink job?

Here's the use case: We need to submit a Spark job to an EMR cluster, but our security team won't allow us to submit a job from the Master node or thru the UI. They want us to create a "Docker Container" to submit the job.

If it's possible to submit the Spark job thru REST, then we don't need to install Spark/Hadoop JARs on the container. If it's not possible to use a REST API, can we do something like this?

spark-2.4.6-bin-hadoop2.7/bin/spark-submit \
 --class myclass --master "yarn url" --deploy-mode cluster \

In other words, instead of --master yarn, can we specify a URL? Would this still work the same way?