Posted to user@livy.apache.org by Shubham Gupta <y2...@gmail.com> on 2018/10/01 01:30:16 UTC
Use existing SparkSession in POST/batches request
I'm trying to use Livy to remotely submit several Spark *jobs*. Let's say I
want to perform the following *spark-submit task remotely* (with all the
options as such):
spark-submit \
--class com.company.drivers.JumboBatchPipelineDriver \
--conf spark.driver.cores=1 \
--conf spark.driver.memory=1g \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.serializer='org.apache.spark.serializer.KryoSerializer' \
--conf "spark.executor.extraJavaOptions= -XX:+UseG1GC" \
--master yarn \
--deploy-mode cluster \
/home/hadoop/y2k-shubham/jars/jumbo-batch.jar \
\
--start=2012-12-21 \
--end=2012-12-21 \
--pipeline=db-importer \
--run-spiders
*NOTE: The options after the JAR (--start, --end, etc.) are specific to
my Spark application. I'm using scopt <https://github.com/scopt/scopt> for
this.*
------------------------------
- I'm aware that I can supply all the options of the above spark-submit
command using the Livy POST /batches request
<https://livy.incubator.apache.org/docs/latest/rest-api.html#post-batches>.
- But since I have to make over 250 spark-submits remotely, I'd like to
exploit Livy's *session-management capabilities*; i.e., I want Livy to
create a SparkSession once and then reuse it for all my spark-submit
requests.
- The POST /sessions request
<https://livy.incubator.apache.org/docs/latest/rest-api.html#post-sessions>
allows me to specify quite a few options for instantiating a SparkSession
remotely. However, I see no *session argument* in the POST /batches request
<https://livy.incubator.apache.org/docs/latest/rest-api.html#post-batches>.
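To make the mapping concrete, here is a minimal sketch of what the
spark-submit command above could look like as a POST /batches payload. The
Livy host/port is hypothetical, and note that --master / --deploy-mode are
typically fixed server-side in livy.conf (livy.spark.master,
livy.spark.deploy-mode) rather than sent per request:

```python
import json

# Hypothetical Livy endpoint; adjust host/port for your cluster.
LIVY_URL = "http://livy-host:8998"

def build_batch_payload():
    """Map the spark-submit options above onto Livy's POST /batches fields."""
    return {
        "file": "/home/hadoop/y2k-shubham/jars/jumbo-batch.jar",
        "className": "com.company.drivers.JumboBatchPipelineDriver",
        "driverCores": 1,
        "driverMemory": "1g",
        "conf": {
            "spark.dynamicAllocation.enabled": "true",
            "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
            "spark.executor.extraJavaOptions": "-XX:+UseG1GC",
        },
        # Application-specific arguments after the JAR go into "args".
        "args": ["--start=2012-12-21", "--end=2012-12-21",
                 "--pipeline=db-importer", "--run-spiders"],
    }

payload = build_batch_payload()
print(json.dumps(payload, indent=2))
# To actually submit (needs the `requests` package and a reachable server):
# requests.post(f"{LIVY_URL}/batches", json=payload,
#               headers={"Content-Type": "application/json"})
```

Each such request, however, spawns its own Spark application, which is
exactly what raises the questions below.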
------------------------------
My questions are:
1. How can I make use of the SparkSession that I created using the
POST /sessions request for submitting my Spark job using the POST /batches
request?
2. In case it's not possible, why is that the case?
3. Any workarounds?
------------------------------
I've referred to the following examples, but they only demonstrate supplying
(Python) *code* for the Spark job within Livy's POST request:
- pi_app
<https://github.com/apache/incubator-livy/blob/master/examples/src/main/python/pi_app.py>
- rssanders3/airflow-spark-operator-plugin
<https://github.com/rssanders3/airflow-spark-operator-plugin/blob/master/example_dags/livy_spark_operator_python_example.py>
- livy/examples <https://livy.incubator.apache.org/examples/>
------------------------------
Here's the link <https://stackoverflow.com/questions/51746286/> to my
original question on StackOverflow
*Shubham Gupta*
Software Engineer
zomato
Re: Use existing SparkSession in POST/batches request
Posted by Shubham Gupta <y2...@gmail.com>.
Correct me if I'm wrong, but won't Interactive Mode require me to rewrite
my application code into statements that would then be submitted via the
POST /sessions/{sessionId}/statements request
<https://livy.incubator.apache.org/docs/latest/rest-api.html#post-sessionssessionidstatements>
as the code property?
The thing is that I don't want to take the application logic out of the JAR
file containing my Spark application, because I'll be using Livy's HTTP
REST API to submit remote Spark jobs via Apache Airflow.
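For contrast, a minimal sketch (hypothetical endpoint and session id) of
what the interactive route entails: the application logic itself travels as
a code string in each statement, which is exactly the rewrite in question:

```python
import json

# Hypothetical Livy endpoint and an already-created interactive session id.
LIVY_URL = "http://livy-host:8998"
SESSION_ID = 0

def build_statement_payload(code: str):
    """Body for POST /sessions/{sessionId}/statements: the code string is
    shipped in the request, instead of living inside a pre-built JAR."""
    return {"code": code}

stmt = build_statement_payload(
    'val df = spark.read.parquet("/data/input"); println(df.count())'
)
print(json.dumps(stmt))
# requests.post(f"{LIVY_URL}/sessions/{SESSION_ID}/statements", json=stmt)
```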
*Shubham Gupta*
Software Engineer
zomato
On Mon, Oct 1, 2018 at 7:30 AM Jeff Zhang <zj...@gmail.com> wrote:
> BTW, zeppelin has integrated livy's interactive mode to run spark code.
> You may try this as well.
>
> https://zeppelin.apache.org/docs/0.8.0/interpreter/livy.html
Re: Use existing SparkSession in POST/batches request
Posted by Jeff Zhang <zj...@gmail.com>.
BTW, Zeppelin has integrated Livy's interactive mode to run Spark code. You
may try this as well.
https://zeppelin.apache.org/docs/0.8.0/interpreter/livy.html
Re: Use existing SparkSession in POST/batches request
Posted by Jeff Zhang <zj...@gmail.com>.
Have you tried the interactive mode?