Posted to user@spark.apache.org by 陈宇航 <yu...@foxmail.com> on 2015/10/22 08:43:19 UTC

Request for submitting Spark jobs in code purely, without jar

Hi developers, I've run into a problem with Spark, and before opening an issue, I'd like to hear your thoughts.


Currently, if you want to submit a Spark job, you need to write the code, package it into a jar, and then submit it with spark-submit or org.apache.spark.launcher.SparkLauncher.
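
For reference, the launcher route looks roughly like the sketch below (the jar path and main class are placeholders, not real artifacts):

import org.apache.spark.launcher.SparkLauncher

// Launches a pre-packaged job by spawning spark-submit in a child process.
object LaunchDemo {
  def main(args: Array[String]): Unit = {
    val process = new SparkLauncher()
      .setAppResource("/path/to/wordcount.jar") // placeholder jar path
      .setMainClass("com.example.WordCount")    // placeholder main class
      .setMaster("yarn-client")
      .setAppName("Demo")
      .launch()         // returns the child java.lang.Process running spark-submit
    process.waitFor()   // block until the application exits
  }
}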


But sometimes the RDD operation chain is generated dynamically in code, from SQL or even from a GUI, so it seems inconvenient or outright impossible to build a separate jar. So I tried something like the code below:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("Demo").setMaster("yarn-client")
val sc = new SparkContext(conf)
sc.textFile("README.md").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).foreach(println) // A simple word count

When this is executed, a Spark job is submitted. However, some problems remain:
1. It doesn't support all deploy modes, e.g. yarn-cluster.
2. Because of the "only one SparkContext per JVM" limit, I cannot run this twice (see the sketch after this list).
3. It runs in the same process as my code; no child process is created.
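
To make problem 2 concrete, here is a minimal sketch (a local master is used only to keep it self-contained): in Spark 1.x, with spark.driver.allowMultipleContexts left at its default of false, the second constructor call fails unless the first context is stopped.

import org.apache.spark.{SparkConf, SparkContext}

object ContextLimitDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Demo").setMaster("local[*]")
    val first = new SparkContext(conf)
    first.stop() // without this stop(), the next line throws a SparkException
    val second = new SparkContext(conf) // OK only because `first` was stopped
    second.stop()
  }
}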



Thus, what I wish for is that these problems could be handled by Spark itself. My request boils down to adding a submit() method to SparkContext / StreamingContext / SQLContext, so that if I added a line after the code above like this:
sc.submit()

then Spark would take care of all the background submission work for me.

I opened an issue for this request before, but I couldn't make myself clear back then, so I'm writing this email to talk it over with you directly. Please reply if you need further details, and I'll open a new issue if you understand my request and believe it's worth doing.


Thanks a lot.


Yuhang Chen.

yuhang.chen@foxmail.com

Re: Request for submitting Spark jobs in code purely, without jar

Posted by Ali Tajeldin EDU <al...@gmail.com>.
The Spark job-server project may help (https://github.com/spark-jobserver/spark-jobserver).
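
For context, here is a rough sketch of what the word count from the original mail might look like as a job-server job. It is based on the SparkJob trait from the project's README around this time, so names may differ across versions. The server owns a long-lived, reusable SparkContext and passes it in; jobs are uploaded once as a jar and then triggered over REST:

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}

object WordCountJob extends SparkJob {
  // Nothing to check in this toy example, so every request is valid.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    SparkJobValid

  // The server supplies the shared SparkContext; the result is returned
  // to the HTTP caller instead of being printed on the executors.
  override def runJob(sc: SparkContext, config: Config): Any =
    sc.textFile("README.md")
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collect()
}
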
--
Ali
