Posted to user@spark.apache.org by satishl <sa...@gmail.com> on 2017/02/22 05:37:27 UTC

Spark executors in streaming app always uses 2 executors

I am reading from a Kafka topic which has 8 partitions. My Spark app is given
40 executors (1 core per executor). After reading the data, I repartition
the DStream by 500, map it, and save it to Cassandra.
However, I see that only 2 executors are being used per batch. Even though I
see 500 tasks for the stage, all of them are scheduled sequentially on the 2
executors that were picked. My Spark concepts are still forming and I am
missing something obvious.
I expected that 8 executors would be picked for reading data from the 8
partitions in Kafka, and that with the repartition this data would be
distributed among 40 executors and then saved to Cassandra.
How should I think about this?
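
For reference, a minimal sketch of the kind of pipeline described above might
look like this in Scala, assuming the receiver-based KafkaUtils.createStream
from spark-streaming-kafka-0-8 and saveToCassandra from the DataStax
spark-cassandra-connector; the ZooKeeper address, topic, keyspace, table, and
record layout are illustrative placeholders, not details from the original post:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import com.datastax.spark.connector._               // SomeColumns
import com.datastax.spark.connector.streaming._     // saveToCassandra on DStreams

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToCassandra")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Receiver-based read: the topic map gives consumer threads per topic.
    val stream = KafkaUtils.createStream(
      ssc, "zk-host:2181", "my-consumer-group", Map("my-topic" -> 8))

    stream
      .repartition(500)                              // 500 tasks per batch
      .map { case (_, value) => (value, value.length) }
      .saveToCassandra("my_keyspace", "my_table", SomeColumns("body", "length"))

    ssc.start()
    ssc.awaitTermination()
  }
}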



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-executors-in-streaming-app-always-uses-2-executors-tp28413.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Spark executors in streaming app always uses 2 executors

Posted by Jon Gregg <co...@gmail.com>.
Spark offers a receiver-based approach and a direct approach with Kafka (
https://spark.apache.org/docs/2.1.0/streaming-kafka-0-8-integration.html),
and a note on the receiver-based approach says "topic partitions in Kafka
does not correlate to partitions of RDDs generated in Spark Streaming."

A fix might be as simple as switching to the direct approach
<https://spark.apache.org/docs/2.1.0/streaming-kafka-0-8-integration.html#approach-2-direct-approach-no-receivers>,
which gives a one-to-one mapping between Kafka partitions and Spark
partitions, so your 8 topic partitions would be read in parallel.
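
For concreteness, a minimal sketch of the direct approach (again with the 0-8
integration) might look like the following; the broker address, topic name,
and downstream processing are placeholders, not code from this thread:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectKafkaExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DirectKafkaExample")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map("metadata.broker.list" -> "broker-host:9092")

    // Direct approach: no receiver; each of the topic's 8 Kafka partitions
    // becomes one partition of the batch RDD, so the read itself can run
    // on up to 8 executors.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("my-topic"))

    // A later repartition(500) still shuffles the data across the cluster
    // before the map/save stage, as in the original pipeline.
    stream.map(_._2).repartition(500).print()

    ssc.start()
    ssc.awaitTermination()
  }
}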

Jon Gregg
