Posted to user@spark.apache.org by Saiph Kappa <sa...@gmail.com> on 2016/02/21 18:31:03 UTC

Specify number of executors in standalone cluster mode

Hi,

I'm running a Spark Streaming application on a Spark cluster that spans 6
machines/workers, using Spark standalone cluster mode. Each machine has 8
cores. Is there any way to specify that I want to run my application on
all 6 machines and use just 2 cores on each machine?

Thanks

Re: Specify number of executors in standalone cluster mode

Posted by Hemant Bhanawat <he...@gmail.com>.
The maximum number of cores per executor can be controlled with
spark.executor.cores, and the number of worker instances running on a
single machine can be set with the environment variable SPARK_WORKER_INSTANCES.
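
For the original question (2 cores on each of the 6 workers), one way to
express this in standalone mode is to cap the application at 12 cores in
total while limiting each executor to 2 cores; with the default spreadOut
scheduling, the executors should then land on different workers. A minimal
sketch, assuming a placeholder app name and master URL:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // 6 workers x 2 cores = 12 cores in total for this application.
    // spark.executor.cores caps cores per executor; spark.cores.max caps
    // the application total, so the master allocates 6 two-core executors.
    val conf = new SparkConf()
      .setAppName("MyStreamingApp")              // placeholder app name
      .setMaster("spark://master-host:7077")     // placeholder master URL
      .set("spark.executor.cores", "2")
      .set("spark.cores.max", "12")

    val ssc = new StreamingContext(conf, Seconds(2))

The same settings can also be passed on the spark-submit command line
(--executor-cores and --total-executor-cores) instead of being hard-coded
in the application.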

However, to ensure that all available cores are actually used, you will also
have to take care of how the stream is partitioned. Copying the relevant
help text from the Spark documentation:

The number of tasks per receiver per batch will be approximately (batch
interval / block interval). For example, a block interval of 200 ms will
create 10 tasks per 2-second batch. If the number of tasks is too low
(that is, less than the number of cores per machine), then it will be
inefficient as all available cores will not be used to process the data. To
increase the number of tasks for a given batch interval, reduce the block
interval. However, the recommended minimum value of block interval is about
50 ms, below which the task launching overheads may be a problem.

An alternative to receiving data with multiple input streams / receivers is
to explicitly repartition the input data stream (using
inputStream.repartition(<number of partitions>)). This distributes the
received batches of data across the specified number of machines in the
cluster before further processing.
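
As a rough illustration of both knobs described above, here is a sketch that
lowers the block interval and, as an alternative, repartitions the stream
explicitly; the app name, master URL, and socket source are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Batch interval 2 s / block interval 200 ms ~= 10 tasks per receiver
    // per batch (the ratio described above).
    val conf = new SparkConf()
      .setAppName("MyStreamingApp")                  // placeholder
      .setMaster("spark://master-host:7077")         // placeholder
      .set("spark.streaming.blockInterval", "200ms")

    val ssc = new StreamingContext(conf, Seconds(2))
    val lines = ssc.socketTextStream("source-host", 9999)  // placeholder source

    // Alternative: explicitly repartition the received batches so the work
    // spreads across all 12 cores regardless of the block interval.
    val repartitioned = lines.repartition(12)
    repartitioned.count().print()

    ssc.start()
    ssc.awaitTermination()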

Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811>
www.snappydata.io

On Sun, Feb 21, 2016 at 11:01 PM, Saiph Kappa <sa...@gmail.com> wrote:

> Hi,
>
> I'm running a Spark Streaming application on a Spark cluster that spans
> 6 machines/workers, using Spark standalone cluster mode. Each machine
> has 8 cores. Is there any way to specify that I want to run my application
> on all 6 machines and use just 2 cores on each machine?
>
> Thanks
>