Posted to user@spark.apache.org by yael aharon <ya...@gmail.com> on 2016/02/12 20:00:08 UTC

Allowing parallelism in spark local mode

Hello,
I have an application that receives requests over HTTP and uses spark in
local mode to process the requests. Each request is running in its own
thread.
It seems that spark is queueing the jobs, processing them one at a time.
When 2 requests arrive simultaneously, the processing time for each of them
is almost doubled.
I tried setting spark.default.parallelism, spark.executor.cores,
spark.driver.cores but that did not change the time in a meaningful way.
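For reference, the setup looks roughly like this (simplified; the app name and
handler are made up):

  import org.apache.spark.{SparkConf, SparkContext}

  // one SparkContext shared by the whole web server, running in local mode
  val sc = new SparkContext(new SparkConf()
    .setMaster("local[*]")
    .setAppName("http-frontend")              // illustrative name
    .set("spark.default.parallelism", "8"))   // one of the settings I tried

  // each incoming HTTP request is handled on its own thread and
  // submits its own Spark job against the shared context
  def handleRequest(n: Int): Double =
    sc.parallelize(1 to n).map(math.sqrt(_)).sum()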

Am I missing something obvious?
thanks, Yael

Re: Allowing parallelism in spark local mode

Posted by Chris Fregly <ch...@fregly.com>.
sounds like the first job is occupying all resources.  you should limit the
resources that a single job can acquire.

fair scheduler is one way to do that.

a possibly simpler way is to configure spark.deploy.defaultCores or
spark.cores.max.

the defaults for these values - for the Spark default cluster resource
manager (aka Spark Standalone) - are unlimited.  every job will try to
acquire every resource.

https://spark.apache.org/docs/latest/spark-standalone.html
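
for example, something like this in spark-defaults.conf (the numbers are just
an illustration - tune them for your own hardware; these caps apply when
running against a standalone cluster):

  # cap the cores a single application can grab
  spark.cores.max            4
  # cluster-wide default cap for apps that don't set spark.cores.max themselves
  spark.deploy.defaultCores  4
  # leave memory headroom per executor as well
  spark.executor.memory      2g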

here's an example config that i use for my reference data pipeline project:

https://github.com/fluxcapacitor/pipeline/blob/master/config/spark/spark-defaults.conf

i'm always playing with these values to simulate different conditions, but
that's the current snapshot, which might be helpful.

also, don't forget about executor memory...


On Fri, Feb 12, 2016 at 1:40 PM, Silvio Fiorito <silvio.fiorito@granturing.com> wrote:

> You’ll want to setup the FAIR scheduler as described here:
> https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
>
> From: yael aharon <ya...@gmail.com>
> Date: Friday, February 12, 2016 at 2:00 PM
> To: "user@spark.apache.org" <us...@spark.apache.org>
> Subject: Allowing parallelism in spark local mode
>
> Hello,
> I have an application that receives requests over HTTP and uses spark in
> local mode to process the requests. Each request is running in its own
> thread.
> It seems that spark is queueing the jobs, processing them one at a time.
> When 2 requests arrive simultaneously, the processing time for each of them
> is almost doubled.
> I tried setting spark.default.parallelism, spark.executor.cores,
> spark.driver.cores but that did not change the time in a meaningful way.
>
> Am I missing something obvious?
> thanks, Yael
>
>


-- 

*Chris Fregly*
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com

Re: Allowing parallelism in spark local mode

Posted by Silvio Fiorito <si...@granturing.com>.
You’ll want to setup the FAIR scheduler as described here: https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
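
In local mode that boils down to something like the sketch below (the pool name
and allocation file path are illustrative). With FAIR scheduling, jobs submitted
from different threads get their tasks interleaved instead of running strictly FIFO:

  import org.apache.spark.{SparkConf, SparkContext}

  // enable the FAIR scheduler so concurrent jobs share the local cores
  val conf = new SparkConf()
    .setMaster("local[*]")
    .setAppName("http-request-processor")                                  // illustrative
    .set("spark.scheduler.mode", "FAIR")
    .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")  // optional pools

  val sc = new SparkContext(conf)

  // inside each request-handling thread: pick a pool, run the job, clear it
  sc.setLocalProperty("spark.scheduler.pool", "requests")
  val result = sc.parallelize(1 to 1000000).map(_ * 2).count()
  sc.setLocalProperty("spark.scheduler.pool", null)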

From: yael aharon <ya...@gmail.com>
Date: Friday, February 12, 2016 at 2:00 PM
To: "user@spark.apache.org<ma...@spark.apache.org>" <us...@spark.apache.org>>
Subject: Allowing parallelism in spark local mode

Hello,
I have an application that receives requests over HTTP and uses spark in local mode to process the requests. Each request is running in its own thread.
It seems that spark is queueing the jobs, processing them one at a time. When 2 requests arrive simultaneously, the processing time for each of them is almost doubled.
I tried setting spark.default.parallelism, spark.executor.cores, spark.driver.cores but that did not change the time in a meaningful way.

Am I missing something obvious?
thanks, Yael