Posted to user@spark.apache.org by ishaaq <is...@gmail.com> on 2014/04/29 01:39:54 UTC

launching concurrent jobs programmatically

Hi all,

I have a central app that currently kicks off old-style Hadoop M/R jobs
either on-demand or via a scheduling mechanism.

My intention is to gradually port this app over to using a Spark standalone
cluster. The data will remain on HDFS.

Couple of questions:

1. Is there a way to get Spark jobs to load from jars that have been
pre-distributed to HDFS? I need to run these jobs programmatically from said
application.

2. Is SparkContext meant to be used in multi-threaded use cases? I.e., can
multiple independent jobs run concurrently using the same SparkContext, or
should I create a new one each time my app needs to run a job?

Thanks,
Ishaaq




Re: launching concurrent jobs programmatically

Posted by ishaaq <is...@gmail.com>.
Very interesting.

One of Spark's attractive features is being able to work interactively
via spark-shell. Is something like that still available via Ooyala's job
server?

Or do you use spark-shell independently of that? If the latter, how do you
manage custom jars for spark-shell? Our app has a number of jars that I don't
particularly want to upload each time I want to run a small ad-hoc spark-shell
session.

Thanks,
Ishaaq




Re: launching concurrent jobs programmatically

Posted by Patrick Wendell <pw...@gmail.com>.
In general, as Andrew points out, it's possible to submit jobs from
multiple threads, and many Spark applications do this.

One thing to check out is the job server from Ooyala; it is an
application built on top of Spark that exposes an automated submission API:
https://github.com/ooyala/spark-jobserver

You can also accomplish this by having a separate service that submits
multiple jobs to the cluster, where those jobs use different jars, for example.
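
For the pre-distributed-jars part of the question, such a submission service
could give each job its own SparkContext pointed at a jar that already lives
on HDFS. A rough sketch in Scala follows (the master URL, HDFS path, and job
body are placeholders, and it's worth verifying that jar distribution from an
hdfs:// URL behaves as expected in your setup):

import org.apache.spark.{SparkConf, SparkContext}

object JobRunner {
  // One submission: a dedicated SparkContext wired to a jar already on HDFS.
  def runJob(jarPath: String, input: String): Long = {
    val conf = new SparkConf()
      .setMaster("spark://master-host:7077") // standalone cluster URL (placeholder)
      .setAppName("scheduled-job")
      .setJars(Seq(jarPath))                 // e.g. "hdfs:///apps/jobs/etl-assembly.jar"
    val sc = new SparkContext(conf)
    try {
      sc.textFile(input).count()             // stand-in for the real job logic
    } finally {
      sc.stop()                              // free the cluster's resources for the next job
    }
  }
}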

- Patrick


On Mon, Apr 28, 2014 at 4:44 PM, Andrew Ash <an...@andrewash.com> wrote:

> For the second question, you can submit multiple jobs through the same
> SparkContext via different threads and this is a supported way of
> interacting with Spark.
>
> From the documentation:
>
> Second, *within* each Spark application, multiple "jobs" (Spark actions)
> may be running concurrently if they were submitted by different threads.
> This is common if your application is serving requests over the network;
> for example, the Shark <http://shark.cs.berkeley.edu/> server works this
> way. Spark includes a fair scheduler
> <https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application>
> to schedule resources within each SparkContext.
>
>  https://spark.apache.org/docs/latest/job-scheduling.html
>
>
> On Tue, Apr 29, 2014 at 1:39 AM, ishaaq <is...@gmail.com> wrote:
>
>> Hi all,
>>
>> I have a central app that currently kicks off old-style Hadoop M/R jobs
>> either on-demand or via a scheduling mechanism.
>>
>> My intention is to gradually port this app over to using a Spark
>> standalone
>> cluster. The data will remain on HDFS.
>>
>> Couple of questions:
>>
>> 1. Is there a way to get Spark jobs to load from jars that have been
>> pre-distributed to HDFS? I need to run these jobs programmatically from
>> said
>> application.
>>
>> 2. Is SparkContext meant to be used in multi-threaded use-cases? i.e. can
>> multiple independent jobs run concurrently using the same SparkContext or
>> should I create a new one each time my app needs to run a job?
>>
>> Thanks,
>> Ishaaq
>>
>>
>>
>>
>
>

Re: launching concurrent jobs programmatically

Posted by Andrew Ash <an...@andrewash.com>.
For the second question, you can submit multiple jobs through the same
SparkContext via different threads and this is a supported way of
interacting with Spark.

From the documentation:

Second, *within* each Spark application, multiple “jobs” (Spark actions)
may be running concurrently if they were submitted by different threads.
This is common if your application is serving requests over the network;
for example, the Shark <http://shark.cs.berkeley.edu/> server works this
way. Spark includes a fair scheduler
<https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application>
to schedule resources within each SparkContext.

 https://spark.apache.org/docs/latest/job-scheduling.html
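
To make that concrete, here is a rough sketch of two independent actions
submitted from separate threads against a single SparkContext (the master URL,
paths, and names are only illustrative):

import java.util.concurrent.{Executors, TimeUnit}

import org.apache.spark.{SparkConf, SparkContext}

object ConcurrentActions {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("spark://master-host:7077") // placeholder standalone master URL
      .setAppName("concurrent-actions")
    val sc = new SparkContext(conf)
    val pool = Executors.newFixedThreadPool(2)

    // Each Runnable kicks off an independent action; both run concurrently
    // within the one SparkContext (FIFO by default, FAIR if
    // spark.scheduler.mode=FAIR is configured).
    def countOf(path: String) = new Runnable {
      def run(): Unit = println(path + " -> " + sc.textFile(path).count())
    }
    pool.submit(countOf("hdfs:///data/setA"))
    pool.submit(countOf("hdfs:///data/setB"))

    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.HOURS) // let both actions finish
    sc.stop()
  }
}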


On Tue, Apr 29, 2014 at 1:39 AM, ishaaq <is...@gmail.com> wrote:

> Hi all,
>
> I have a central app that currently kicks off old-style Hadoop M/R jobs
> either on-demand or via a scheduling mechanism.
>
> My intention is to gradually port this app over to using a Spark standalone
> cluster. The data will remain on HDFS.
>
> Couple of questions:
>
> 1. Is there a way to get Spark jobs to load from jars that have been
> pre-distributed to HDFS? I need to run these jobs programmatically from
> said
> application.
>
> 2. Is SparkContext meant to be used in multi-threaded use-cases? i.e. can
> multiple independent jobs run concurrently using the same SparkContext or
> should I create a new one each time my app needs to run a job?
>
> Thanks,
> Ishaaq
>
>
>
>