Posted to user@spark.apache.org by karthikjay <as...@gmail.com> on 2018/04/20 17:56:14 UTC
[Structured Streaming][Kafka] For a Kafka topic with 3 partitions,
how does the parallelism work?
I have the following code to read data from a Kafka topic using Structured
Streaming. The topic has 3 partitions:
val spark = SparkSession
  .builder
  .appName("TestPartition")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

val dataFrame = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers",
    "1.2.3.184:9092,1.2.3.185:9092,1.2.3.186:9092")
  .option("subscribe", "partition_test")
  .option("failOnDataLoss", "false")
  .load()
  .selectExpr("CAST(value AS STRING)")
My understanding is that Spark will launch 3 Kafka consumers (one per
partition) and that these consumers will run on the worker nodes. Is my
understanding right?
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: [Structured Streaming][Kafka] For a Kafka topic with 3
partitions, how does the parallelism work?
Posted by Raghavendra Pandey <ra...@gmail.com>.
Yes, as long as there are 3 cores available on your local machine.
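For intuition, the mapping the answer describes can be sketched in plain Scala. This is a simplified model of the behavior, not Spark's actual scheduler, and the object and method names below are illustrative only: each Kafka partition becomes one Spark task, and the number of tasks running at once is capped by the cores available to `local[*]`.

```scala
// Simplified model of the partition-to-task mapping described above.
// Not Spark's scheduler; names here are hypothetical for illustration.
object KafkaParallelismSketch {
  // Tasks that can run concurrently: one per Kafka partition,
  // capped by the number of available cores.
  def concurrentTasks(kafkaPartitions: Int, cores: Int): Int =
    math.min(kafkaPartitions, cores)

  // Number of "waves" needed to process all partition-tasks
  // when there are fewer cores than partitions.
  def taskWaves(kafkaPartitions: Int, cores: Int): Int =
    math.ceil(kafkaPartitions.toDouble / cores).toInt

  def main(args: Array[String]): Unit = {
    println(concurrentTasks(3, 8)) // 3 partitions, 8 cores -> 3 tasks at once
    println(taskWaves(3, 2))       // 3 partitions, 2 cores -> 2 waves
  }
}
```

So with 3 partitions and at least 3 cores, all three partition-reads run in parallel; with fewer cores, the tasks are processed in successive waves.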
On Fri, Apr 20, 2018 at 10:56 AM karthikjay <as...@gmail.com> wrote:
> [quoted message trimmed]