Posted to user@spark.apache.org by Amit Sharma <re...@gmail.com> on 2019/07/25 12:23:51 UTC

Core allocation is scattered

I have a cluster with 26 nodes, each with 16 cores. I am running a Spark
job with 20 cores, but I do not understand why my application gets only
1-2 cores on each of several machines. Why does it not just run on two
nodes, e.g. node1=16 cores and node2=4 cores? Instead the cores are
allocated like node1=2, node2=1, ..., node14=1. Is there any conf property
I need to change? I know that with dynamic allocation we can use the
setting below, but is there anything without dynamic allocation?
--conf "spark.dynamicAllocation.maxExecutors=2"


Thanks
Amit

Re: Core allocation is scattered

Posted by 15313776907 <15...@163.com>.
This may be within your YARN constraints; you can look at the configuration parameters of your YARN setup.
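
If the job is indeed running on YARN, the per-node and per-container core
limits are the usual suspects. As a rough illustration (these property names
are standard YARN settings found in yarn-site.xml; the values shown are only
examples for a 16-core node, not taken from this thread):

# yarn-site.xml on each NodeManager (example values for 16-core nodes)
yarn.nodemanager.resource.cpu-vcores        16   # vcores YARN may hand out per node
yarn.scheduler.maximum-allocation-vcores    16   # max vcores a single container may request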


On 7/25/2019 20:23, Amit Sharma <re...@gmail.com> wrote:
I have a cluster with 26 nodes, each with 16 cores. I am running a Spark job with 20 cores, but I do not understand why my application gets only 1-2 cores on each of several machines. Why does it not just run on two nodes, e.g. node1=16 cores and node2=4 cores? Instead the cores are allocated like node1=2, node2=1, ..., node14=1. Is there any conf property I need to change? I know that with dynamic allocation we can use the setting below, but is there anything without dynamic allocation?
--conf "spark.dynamicAllocation.maxExecutors=2"





Thanks
Amit

Re: Core allocation is scattered

Posted by Muthu Jayakumar <ba...@gmail.com>.
> I am running a Spark job with 20 cores, but I do not understand why my
> application gets only 1-2 cores on each of several machines. Why does it
> not just run on two nodes, e.g. node1=16 cores and node2=4 cores? Instead
> the cores are allocated like node1=2, node2=1, ..., node14=1.

I believe that's the intended behavior for Spark. Please refer to
https://spark.apache.org/docs/latest/spark-standalone.html#cluster-launch-scripts
and the section on 'spark.deploy.spreadOut'. If I understand correctly, you
may want "spark.deploy.spreadOut false".

Hope it helps!

Happy Spark(ing).

On Thu, Jul 25, 2019 at 7:22 PM Srikanth Sriram <sriramsrikanth1985@gmail.com> wrote:

> Hello,
>
> Below is my understanding.
>
> Below are the default configuration parameters a Spark job falls back to
> when they are not set to the required values at submission time.
>
> # - SPARK_EXECUTOR_INSTANCES, Number of workers to start (Default: 2)
> # - SPARK_EXECUTOR_CORES, Number of cores for the workers (Default: 1).
> # - SPARK_EXECUTOR_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
>
> SPARK_EXECUTOR_INSTANCES -> the number of executors to start; i.e., the
> maximum number of executors a job can ask for from the cluster resource
> manager.
>
> SPARK_EXECUTOR_CORES -> the number of cores in each executor; i.e., the
> Spark TaskScheduler will ask for this many cores to be allocated/blocked
> on each executor machine.
>
> SPARK_EXECUTOR_MEMORY -> the maximum amount of RAM/memory required by each
> executor.
>
> All these resources are requested by the TaskScheduler from the cluster
> manager (which may be Spark standalone, YARN, Mesos or, since Spark 2.3,
> Kubernetes) before the job execution actually starts.
>
> Also, please note that the initial number of executor instances depends on
> "--num-executors", but when there is more data to process and
> "spark.dynamicAllocation.enabled" is set to true, Spark will dynamically
> add more executors based on "spark.dynamicAllocation.initialExecutors".
>
> Note: "spark.dynamicAllocation.initialExecutors" should always be
> configured greater than "--num-executors".
> From the Spark configuration docs (property, default, meaning):
>
> spark.dynamicAllocation.initialExecutors
>   (default: spark.dynamicAllocation.minExecutors) Initial number of
>   executors to run if dynamic allocation is enabled. If `--num-executors`
>   (or `spark.executor.instances`) is set and larger than this value, it
>   will be used as the initial number of executors.
>
> spark.executor.memory
>   (default: 1g) Amount of memory to use per executor process, in the same
>   format as JVM memory strings with a size unit suffix ("k", "m", "g" or
>   "t") (e.g. 512m, 2g).
>
> spark.executor.cores
>   (default: 1 in YARN mode, all the available cores on the worker in
>   standalone and Mesos coarse-grained modes) The number of cores to use on
>   each executor. For more detail on standalone and Mesos coarse-grained
>   modes, see this description:
>   <http://spark.apache.org/docs/latest/spark-standalone.html#Executors%20Scheduling>
>
> On Thu, Jul 25, 2019 at 5:54 PM Amit Sharma <re...@gmail.com> wrote:
>
>> I have a cluster with 26 nodes, each with 16 cores. I am running a Spark
>> job with 20 cores, but I do not understand why my application gets only
>> 1-2 cores on each of several machines. Why does it not just run on two
>> nodes, e.g. node1=16 cores and node2=4 cores? Instead the cores are
>> allocated like node1=2, node2=1, ..., node14=1. Is there any conf property
>> I need to change? I know that with dynamic allocation we can use the
>> setting below, but is there anything without dynamic allocation?
>> --conf "spark.dynamicAllocation.maxExecutors=2"
>>
>>
>> Thanks
>> Amit
>>
>
>
> --
> Regards,
> Srikanth Sriram
>

Re: Core allocation is scattered

Posted by Srikanth Sriram <sr...@gmail.com>.
Hello,

Below is my understanding.

Below are the default configuration parameters a Spark job falls back to
when they are not set to the required values at submission time.

# - SPARK_EXECUTOR_INSTANCES, Number of workers to start (Default: 2)
# - SPARK_EXECUTOR_CORES, Number of cores for the workers (Default: 1).
# - SPARK_EXECUTOR_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)

SPARK_EXECUTOR_INSTANCES -> the number of executors to start; i.e., the
maximum number of executors a job can ask for from the cluster resource
manager.

SPARK_EXECUTOR_CORES -> the number of cores in each executor; i.e., the
Spark TaskScheduler will ask for this many cores to be allocated/blocked
on each executor machine.

SPARK_EXECUTOR_MEMORY -> the maximum amount of RAM/memory required by each
executor.

All these resources are requested by the TaskScheduler from the cluster
manager (which may be Spark standalone, YARN, Mesos or, since Spark 2.3,
Kubernetes) before the job execution actually starts.
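
To make that concrete, a hypothetical submission for the 20-core case from
the original question (the values and the application jar are illustrative
only; --num-executors is honored on YARN):

# 5 executors x 4 cores = 20 cores total, 2 GB of memory per executor
spark-submit \
  --num-executors 5 \
  --executor-cores 4 \
  --executor-memory 2g \
  your-app.jar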

Also, please note that the initial number of executor instances depends on
"--num-executors", but when there is more data to process and
"spark.dynamicAllocation.enabled" is set to true, Spark will dynamically add
more executors based on "spark.dynamicAllocation.initialExecutors".

Note: "spark.dynamicAllocation.initialExecutors" should always be
configured greater than "--num-executors".
From the Spark configuration docs (property, default, meaning):

spark.dynamicAllocation.initialExecutors
  (default: spark.dynamicAllocation.minExecutors) Initial number of
  executors to run if dynamic allocation is enabled. If `--num-executors`
  (or `spark.executor.instances`) is set and larger than this value, it
  will be used as the initial number of executors.

spark.executor.memory
  (default: 1g) Amount of memory to use per executor process, in the same
  format as JVM memory strings with a size unit suffix ("k", "m", "g" or
  "t") (e.g. 512m, 2g).

spark.executor.cores
  (default: 1 in YARN mode, all the available cores on the worker in
  standalone and Mesos coarse-grained modes) The number of cores to use on
  each executor. For more detail on standalone and Mesos coarse-grained
  modes, see this description:
  <http://spark.apache.org/docs/latest/spark-standalone.html#Executors%20Scheduling>
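
For completeness, a minimal sketch of turning on dynamic allocation at
submit time (property names are real Spark settings, values are examples;
the external shuffle service also has to be running on the workers, and the
application jar is a placeholder):

spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.initialExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  your-app.jar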

On Thu, Jul 25, 2019 at 5:54 PM Amit Sharma <re...@gmail.com> wrote:

> I have a cluster with 26 nodes, each with 16 cores. I am running a Spark
> job with 20 cores, but I do not understand why my application gets only
> 1-2 cores on each of several machines. Why does it not just run on two
> nodes, e.g. node1=16 cores and node2=4 cores? Instead the cores are
> allocated like node1=2, node2=1, ..., node14=1. Is there any conf property
> I need to change? I know that with dynamic allocation we can use the
> setting below, but is there anything without dynamic allocation?
> --conf "spark.dynamicAllocation.maxExecutors=2"
>
>
> Thanks
> Amit
>


-- 
Regards,
Srikanth Sriram