Posted to user@spark.apache.org by map reduced <k3...@gmail.com> on 2016/11/03 04:48:33 UTC

Increasing Executor threadpool

Hi,

I am noticing that when there are N cores per executor, each executor only
starts N threads to process the data (so one thread per core). Is there a way
to run more than N threads, i.e. say N+m threads per executor?

I assigned 7 cores/executor, so I see 7 Active Tasks at all times.

[image: Inline image 1]

And only 7 threads doing all the work:

[image: Inline image 2]

Is there any way to make it at least 2 threads/core?

P.S.: Long-running streaming job on a standalone Spark 2.0.0 cluster.

Thanks,
KP

Re: Increasing Executor threadpool

Posted by map reduced <k3...@gmail.com>.
The reason I am asking is that I am sending (a fraction of) these processed
messages to an HTTP endpoint with an average latency of around 70-90 ms. The
calls are blocking as of now, and with only 7-10 threads (for 7-10 cores)
that slows everything down. What would you suggest? Going async?

Re: Increasing Executor threadpool

Posted by map reduced <k3...@gmail.com>.
Right, I understand that. I was just hoping to somehow increase the number
of threads in this thread pool:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L85
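From what I can tell from the source, that pool is a cached (unbounded)
pool: it grows a thread for every task the scheduler hands it, so its size
is an effect, not a cause, and the cap on concurrent tasks comes from the
advertised core count. A small plain-Scala illustration of that behaviour
(not Spark code; the counts are illustrative):

import java.util.concurrent.Executors

// A cached pool (like the executor's task-launch pool) has no fixed size:
// it creates a new thread for every concurrently submitted task. So the
// lever is how many tasks are submitted at once, not a pool-size setting.
val pool = Executors.newCachedThreadPool()
(1 to 14).foreach { _ =>
  pool.submit(new Runnable {
    def run(): Unit = Thread.sleep(1000) // 14 concurrent tasks -> 14 threads
  })
}
pool.shutdown()

So the practical lever in standalone mode is the advertised core count, as
the reply below describes.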

Re: Increasing Executor threadpool

Posted by Mich Talebzadeh <mi...@gmail.com>.
The core count is equivalent to the number of logical processors:

cat /proc/cpuinfo | grep processor | wc -l

That will tell you how many logical processors you have.

I gave an explanation for this a while back. As you are running in
Standalone mode, this is my take:

Standalone mode
Resources are managed by Spark's own resource manager. You start your
master and slave/worker processes. As far as I have worked it out, the
following applies:

num-executors         --> Standalone mode does not use this. The number of
executors will be the number of workers on each node.
executor-memory       --> If you have set SPARK_WORKER_MEMORY in
spark-env.sh, this will be the memory used by the executor.
executor-cores        --> If you have set SPARK_WORKER_CORES in
spark-env.sh, this will be the number of cores used by each executor.
SPARK_WORKER_CORES=n  ## total number of cores each worker can give to executors
SPARK_WORKER_MEMORY=m ## total memory each worker has to give executors (e.g. 1000m, 2g)


HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


