Posted to user@spark.apache.org by yael aharon <ya...@gmail.com> on 2016/02/12 20:00:08 UTC
Allowing parallelism in spark local mode
Hello,
I have an application that receives requests over HTTP and uses spark in
local mode to process the requests. Each request is running in its own
thread.
It seems that spark is queueing the jobs, processing them one at a time.
When 2 requests arrive simultaneously, the processing time for each of them
is almost doubled.
I tried setting spark.default.parallelism, spark.executor.cores,
spark.driver.cores but that did not change the time in a meaningful way.
Am I missing something obvious?
thanks, Yael
Re: Allowing parallelism in spark local mode
Posted by Chris Fregly <ch...@fregly.com>.
sounds like the first job is occupying all resources. you should limit the
resources that a single job can acquire.
fair scheduler is one way to do that.
a possibly simpler way is to configure spark.deploy.defaultCores or
spark.cores.max?
the defaults for these values - for the Spark default cluster resource
manager (aka Spark Standalone) - are infinite. every job will try to
acquire every resource.
https://spark.apache.org/docs/latest/spark-standalone.html
here's an example config that i use for my reference data pipeline project:
https://github.com/fluxcapacitor/pipeline/blob/master/config/spark/spark-defaults.conf
i'm always playing with these values to simulate different conditions, but
that's the current snapshot that might be helpful.
also, don't forget about executor memory...
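
For illustration, here is a minimal Python sketch of the kind of cap described above, written out as spark-defaults.conf lines. The helper name render_spark_defaults and the values 4 and 2g are made up for the example and are not taken from the linked config. Note that spark.cores.max is documented for standalone deployments; in local mode the number of worker threads comes from the master URL instead (for example local[4]), so this may not be the setting that matters for the original question.

```python
def render_spark_defaults(settings):
    """Format settings as spark-defaults.conf lines: key, a space, value."""
    return "\n".join(f"{key} {value}" for key, value in settings.items())


# Hypothetical values, for illustration only.
capped = {
    "spark.cores.max": 4,            # most cores one application may claim (standalone)
    "spark.executor.memory": "2g",   # heap size per executor
}

print(render_spark_defaults(capped))
```

The same keys can also be passed on the command line with spark-submit --conf key=value.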
On Fri, Feb 12, 2016 at 1:40 PM, Silvio Fiorito <
silvio.fiorito@granturing.com> wrote:
> You’ll want to set up the FAIR scheduler as described here:
> https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
--
*Chris Fregly*
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com
Re: Allowing parallelism in spark local mode
Posted by Silvio Fiorito <si...@granturing.com>.
You’ll want to set up the FAIR scheduler as described here: https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
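
A rough sketch of what that page describes, in Python and under a few assumptions: the pool name "requests", the allocation file path, and the helpers build_allocation_xml and run_in_pool are all hypothetical. The property names spark.scheduler.mode, spark.scheduler.allocation.file and spark.scheduler.pool, and the setLocalProperty call, come from the Spark documentation. The original application may well be on the JVM, where the same calls exist on SparkContext.

```python
import xml.etree.ElementTree as ET

# Settings that switch the in-application scheduler to FAIR.
# The file path is a placeholder; the allocation file itself is optional.
FAIR_CONF = {
    "spark.scheduler.mode": "FAIR",
    "spark.scheduler.allocation.file": "/path/to/fairscheduler.xml",
}


def build_allocation_xml(pool_names):
    """Return fairscheduler.xml content with one FAIR pool per name."""
    root = ET.Element("allocations")
    for name in pool_names:
        pool = ET.SubElement(root, "pool", name=name)
        ET.SubElement(pool, "schedulingMode").text = "FAIR"
        ET.SubElement(pool, "weight").text = "1"
        ET.SubElement(pool, "minShare").text = "0"
    return ET.tostring(root, encoding="unicode")


def run_in_pool(sc, pool_name, job):
    """Tag jobs submitted by the calling thread with a pool, then run job(sc)."""
    sc.setLocalProperty("spark.scheduler.pool", pool_name)
    try:
        return job(sc)
    finally:
        # Clear the property so a reused thread falls back to the default pool.
        sc.setLocalProperty("spark.scheduler.pool", None)
```

With a real context, each request thread would call something like run_in_pool(sc, "requests", lambda sc: sc.parallelize(range(100)).count()), and the FAIR_CONF entries would be set on the SparkConf before the context is created. The pool property is per thread, so it has to be set in the thread that submits the job; with PySpark, how Python threads map onto JVM threads depends on the Spark version, so treat this as a starting point rather than a verified setup.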