Posted to user@spark.apache.org by Jim Carroll <ji...@gmail.com> on 2014/04/17 20:02:54 UTC

Continuously running non-streaming jobs

Is there a way to create continuously-running, or at least
continuously-loaded, jobs that can be 'invoked' rather than 'sent', to
avoid the job creation overhead of a couple of seconds?

I read through the following:
http://apache-spark-user-list.1001560.n3.nabble.com/Job-initialization-performance-of-Spark-standalone-mode-vs-YARN-td2016.html

Thanks.
Jim




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Continuously-running-non-streaming-jobs-tp4391.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Continuously running non-streaming jobs

Posted by Daniel Darabos <da...@lynxanalytics.com>.
I'm quite new myself (just subscribed to the mailing list today :)), but
this happens to be something we've had success with. So let me know if you
hit any problems with this sort of usage.


On Thu, Apr 17, 2014 at 9:11 PM, Jim Carroll <ji...@gmail.com> wrote:

> Daniel,
>
> I'm new to Spark but I thought that thread hinted at the right answer.
>
> Thanks,
> Jim

Re: Continuously running non-streaming jobs

Posted by Jim Carroll <ji...@gmail.com>.
Daniel,

I'm new to Spark but I thought that thread hinted at the right answer. 

Thanks,
Jim

Re: Continuously running non-streaming jobs

Posted by Daniel Darabos <da...@lynxanalytics.com>.
The linked thread does a good job of answering your question. You should
create a SparkContext at startup and re-use it for all of your queries. For
example, we create a SparkContext in a web server at startup, and are then
able to use the Spark cluster for serving Ajax queries with a latency of a
second or less. The executors keep running during this time, so there is
minimal overhead to starting a job.
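
A minimal sketch of that pattern in PySpark, not the actual server code. The
names SharedContext, make_spark_context and handle_query, the app name and the
"local[2]" master URL are all assumptions made up for illustration; a real
deployment would point at its own cluster.

```python
import threading


class SharedContext:
    """Creates one long-lived context on first use and hands out the same one after."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._ctx = None

    def get(self):
        # The lock keeps concurrent request threads from creating two contexts.
        with self._lock:
            if self._ctx is None:
                self._ctx = self._factory()
            return self._ctx

    def stop(self):
        # Call once at server shutdown to release the executors.
        with self._lock:
            if self._ctx is not None:
                self._ctx.stop()
                self._ctx = None


def make_spark_context():
    # Imported here so this module loads even where pyspark is not installed.
    from pyspark import SparkConf, SparkContext

    # Hypothetical settings: replace the app name and master URL with your own.
    conf = SparkConf().setAppName("query-server").setMaster("local[2]")
    return SparkContext(conf=conf)


# Built once when the server process starts; every request handler shares it.
shared = SharedContext(make_spark_context)


def handle_query(n):
    # Each call runs a new job on the already-running context,
    # so no new application or executors are started per request.
    sc = shared.get()
    return sc.parallelize(range(n)).map(lambda x: x * x).sum()
```

The first call to handle_query pays the context startup cost; later calls
only pay for job scheduling. Calling shared.stop() at shutdown releases the
executors.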


On Thu, Apr 17, 2014 at 8:02 PM, Jim Carroll <ji...@gmail.com> wrote:

> Is there a way to create continuously-running, or at least
> continuously-loaded, jobs that can be 'invoked' rather than 'sent', to
> avoid the job creation overhead of a couple of seconds?
>
> I read through the following:
>
> http://apache-spark-user-list.1001560.n3.nabble.com/Job-initialization-performance-of-Spark-standalone-mode-vs-YARN-td2016.html
>
> Thanks.
> Jim