Posted to user@spark.apache.org by Guillermo Ortiz <ko...@gmail.com> on 2016/01/21 12:35:17 UTC

Number of executors in Spark - Kafka

I'm using Spark Streaming and Kafka with the direct approach. I have
created a topic with 6 partitions, so when I execute Spark there are six
RDDs. I understand that ideally there should be six executors, one to
process each RDD. To do that, when I execute spark-submit (I use YARN) I
set the number of executors to six, with a command along the lines of the
sketch below.
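
For example (a minimal sketch; the application class, jar name, and
resource sizes are placeholders):

    spark-submit \
      --master yarn \
      --num-executors 6 \
      --executor-cores 1 \
      --executor-memory 2g \
      --class com.example.MyStreamingApp \
      my-streaming-app.jar
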
If I don't specify anything, it just creates one executor. Looking for
information, I have read:

"The --num-executors command-line flag or spark.executor.instances
configuration
property control the number of executors requested. Starting in CDH
5.4/Spark 1.3, you will be able to avoid setting this property by turning
on dynamic allocation
<https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation>
with
thespark.dynamicAllocation.enabled property. Dynamic allocation enables a
Spark application to request executors when there is a backlog of pending
tasks and free up executors when idle."
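
For what it's worth, enabling it looks something like the sketch below
(the min/max values are just illustrative, and the external shuffle
service must also be enabled on the cluster):

    spark-submit \
      --master yarn \
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.shuffle.service.enabled=true \
      --conf spark.dynamicAllocation.minExecutors=1 \
      --conf spark.dynamicAllocation.maxExecutors=6 \
      --class com.example.MyStreamingApp \
      my-streaming-app.jar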

I have this parameter enabled. My understanding is that if I don't set
--num-executors, it should create six executors. Or am I wrong?

Re: Number of executors in Spark - Kafka

Posted by Cody Koeninger <co...@koeninger.org>.
6 Kafka partitions will result in 6 Spark partitions, not 6 Spark RDDs.
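
You can check this yourself with something like the sketch below (Spark
1.x direct API; the broker address and topic name are placeholders):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("DirectKafkaPartitions")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Direct approach: no receivers, one stream reading all partitions.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))

    stream.foreachRDD { rdd =>
      // One RDD per batch; its partition count matches the number of
      // Kafka partitions in the topic, however many executors you run.
      println("Partitions in this batch's RDD: " + rdd.partitions.size)
    }

    ssc.start()
    ssc.awaitTermination()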

The question of whether you will have a backlog isn't just a matter of
having 1 executor per partition.  If a single executor can process all of
the partitions fast enough to complete a batch in under the required time,
you won't have a backlog.
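
If batches do start falling behind, the direct stream also has rate
controls you can set while you tune resources (the per-partition rate
here is just illustrative; backpressure needs Spark 1.5+):

    --conf spark.streaming.kafka.maxRatePerPartition=1000
    --conf spark.streaming.backpressure.enabled=true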
